Accurate segmentation of buildings in remote sensing images is crucial for applications such as urban planning, disaster management, and environmental monitoring. Traditional methods often struggle with the complexity of building structures and appearances. In this work, we incorporate a multi-level, multiple-attention approach into the DeepLabv3+ model, capturing global context and local information through a dual attention mechanism and a convolutional block attention module. Rather than relying on shallow convolution layers, we use EfficientNetB7 as the encoder. Dual attention, comprising a position attention module and a channel attention module, is applied to the output of the atrous spatial pyramid pooling module to capture the interdependencies along the spatial and channel dimensions. The position attention module captures the interdependencies of similar features irrespective of their distance by computing a weighted sum of the features at all positions in the image, whereas the channel attention module enhances correlated channel information by aggregating relevant features across all channel maps. In addition, a convolutional block attention module is added on top of the pre-trained residual network backbone to improve the representation of low-level features. Combining the two attention modules yields better segmentation results. The proposed model was evaluated on the Massachusetts Building Dataset. Experimental results show that it improves mIoU by 0.47% on this dataset compared to current state-of-the-art models.
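The dual attention described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work. It assumes a DANet-style formulation in which the learned query, key, and value projections are replaced by identity mappings, a plain softmax is used in both branches, and the learnable residual scale is passed in as a fixed number. The function names (`position_attention`, `channel_attention`, `dual_attention`) and the `gamma` parameters are hypothetical and introduced here only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat, gamma=1.0):
    """Sketch of position attention on a (C, H, W) feature map.

    Each position is updated with a weighted sum of the features at all
    positions, weighted by feature similarity (assumption: no learned
    projections, fixed residual scale `gamma`).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)        # (C, N), with N = H * W
    energy = flat.T @ flat               # (N, N) pairwise position similarity
    attn = softmax(energy, axis=-1)      # each row sums to 1
    out = flat @ attn.T                  # out[:, i] = sum_j attn[i, j] * flat[:, j]
    return gamma * out.reshape(c, h, w) + feat


def channel_attention(feat, gamma=1.0):
    """Sketch of channel attention on a (C, H, W) feature map.

    Each channel map is updated with a weighted sum of all channel maps,
    weighted by channel similarity (same simplifying assumptions as above).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)        # (C, N)
    energy = flat @ flat.T               # (C, C) pairwise channel similarity
    attn = softmax(energy, axis=-1)      # each row sums to 1
    out = attn @ flat                    # (C, N)
    return gamma * out.reshape(c, h, w) + feat


def dual_attention(feat, gamma_pos=1.0, gamma_ch=1.0):
    """Fuse the two branches by element-wise summation (an assumption)."""
    return position_attention(feat, gamma_pos) + channel_attention(feat, gamma_ch)
```

In this sketch the position branch builds an N x N attention map over spatial locations, so its memory cost grows quadratically with the feature-map size, while the channel branch builds only a C x C map.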