This paper describes a deep learning approach to the semantic segmentation of very high resolution remote sensing images. We introduce RLFCN, a fully convolutional architecture based on residual logic blocks, to model the ambiguous mapping between remote sensing images and classification maps. To restore the output to the original input resolution, we adopt a dedicated scheme that learns feature-map up-sampling efficiently within the network. For optimization, we employ an equally-weighted focal loss, which suits the task because it reduces the impact of class imbalance. Our framework is a single architecture that is trained end-to-end, relies on no post-processing techniques, and needs no data other than the images themselves. We evaluated the framework on the ISPRS Vaihingen dataset. The results indicate that it outperforms the current state of the art while containing fewer parameters and requiring less training data.
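The focal loss named above scales each pixel's cross-entropy by (1 − p_t)^γ, so confidently classified pixels, which mostly belong to the dominant classes, contribute little. The sketch below is a minimal plain-Python version of FL = −α(1 − p_t)^γ log p_t averaged over pixels. It assumes that "equally weighted" means one α shared by all classes, and it uses γ = 2 only as an illustrative default; the abstract specifies neither value.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(pixel_logits: Sequence[Sequence[float]],
               labels: Sequence[int],
               gamma: float = 2.0,
               alpha: float = 1.0) -> float:
    """Mean multi-class focal loss over pixels.

    Assumption: 'equally weighted' is modeled as a single alpha shared by all
    classes; gamma = 2.0 is an illustrative default, not a value from the paper.
    """
    total = 0.0
    for logits, y in zip(pixel_logits, labels):
        p_t = softmax(logits)[y]  # predicted probability of the true class
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)
```

Setting gamma = 0 with alpha = 1 recovers ordinary cross-entropy, which is a convenient sanity check; larger gamma shrinks the loss of easy pixels much faster than that of hard ones.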
We propose the Constrained Convolutional Neural Network (CCNN), a novel approach to estimating the directions of numerous target objects. CCNN adds a constrained layer at the output of existing object detection networks; because it works with filtered data, it outperforms previous neural networks in both accuracy and speed and obtains more precise results. Beyond object direction estimation, CCNN can be applied to 3D pose estimation by means of constraint structures and forward and backward propagation algorithms redesigned for the quaternions that describe an object's 3D pose. Experiments show that CCNN is feasible for object direction detection and 3D pose estimation, and that it outperforms conventional neural networks without the unitized constrained layer.
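The abstract does not define the unitized constrained layer. One plausible reading, assumed here, is an L2-normalization layer that projects the raw network output onto the unit sphere, so that a 2-vector becomes a unit direction and a 4-vector becomes a unit quaternion. The sketch shows a forward pass and the hand-derived backward pass ∂L/∂x = (g − y(y·g))/‖x‖, where g = ∂L/∂y; the function names `unit_forward` and `unit_backward` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def unit_forward(x: Sequence[float]) -> Tuple[List[float], float]:
    """Project x onto the unit sphere; also return ||x|| for the backward pass.

    Assumption: the 'unitized constrained layer' is modeled as L2 normalization.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x], norm


def unit_backward(y: Sequence[float], norm: float,
                  grad_y: Sequence[float]) -> List[float]:
    """Gradient of y = x / ||x|| with respect to x.

    dL/dx = (g - y * (y . g)) / ||x||: the component of g along y is removed,
    because scaling x along its own direction leaves the normalized output unchanged.
    """
    dot = sum(a * b for a, b in zip(y, grad_y))
    return [(g - yi * dot) / norm for yi, g in zip(y, grad_y)]
```

For example, `unit_forward([1.0, 2.0, 2.0, 4.0])` returns `([0.2, 0.4, 0.4, 0.8], 5.0)`, a valid unit quaternion. Deriving the backward pass by hand, rather than relying on autograd, mirrors the abstract's point that propagation is redesigned for quaternions, although the authors' exact formulation may differ.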
This paper proposes a new detection algorithm for irregular objects in remote sensing images that, unlike traditional methods, does not output an RoI or a rotated box. The architecture jointly learns four bounding-box corner points and their association through two branches of the same sequential prediction process. Using a convolutional neural network (CNN), the algorithm predicts the four key points of each object and the connections between them, called Bounding Box Fields (BBF), and thereby obtains the detailed spatial distribution of the objects.
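The abstract does not give the exact form of the Bounding Box Fields. Assuming they behave like part-affinity-style vector fields, in which pixels along the edge between two corners store a unit vector pointing from one corner toward the other, a candidate corner pair can be scored by averaging the field's alignment with the segment that joins them. In this sketch the field is passed in as a plain function, and the names `association_score` and `Field` are hypothetical.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
# Hypothetical field type: maps an (x, y) location to a 2D vector (fx, fy).
Field = Callable[[float, float], Tuple[float, float]]


def association_score(a: Point, b: Point, field: Field, samples: int = 10) -> float:
    """Average alignment between the field and the unit vector from a to b.

    With a unit-vector field the result lies in [-1, 1]: 1 means the field points
    from a toward b along the whole segment, -1 means it points the other way.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # coincident corners carry no direction
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # sample at segment-cell midpoints
        fx, fy = field(a[0] + t * dx, a[1] + t * dy)
        total += fx * ux + fy * uy  # dot product with the edge direction
    return total / samples
```

Because the score in this sketch is signed, it distinguishes the corner order a→b from b→a, which is one way a corner-plus-field representation can carry direction rather than just extent.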
To improve the localization accuracy of the key points, the network reduces the receptive field stage by stage, from large to small, so that the final method is RoI-free. The method frames object detection as CNN-based key point detection plus bounding box field detection, which achieves one-stage object detection with high precision and high speed.
Experiments verified the effectiveness and efficiency of the algorithm, showing that the new data structure locates object attitude and spatial direction more accurately, in real time, and with strong practicability.
This paper proposes an Object-based Loss Function for segmentation neural networks. Traditional Segmented Neural Networks (SNNs) are trained with Pixel-based Back Propagation (PBP). Because objects of different sizes occupy different fractions of an image's pixels, small objects carry little weight in the segmentation loss, so PBP can greatly reduce detection accuracy when an image contains many small objects. To address this defect, we propose an Object-based Back Propagation (OBP) loss weighting scheme: the back propagation weights of different objects are not equal but are inversely proportional to the area each object occupies. The method is tested on a segmentation data set.
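The weighting rule above, with each object's back propagation weight inversely proportional to its area, can be sketched on an instance-ID mask. This is a minimal illustration under stated assumptions: instance IDs are given per pixel, the background is treated as one more region, and the weights are rescaled to a mean of 1 so the loss stays on the same scale as plain PBP. The rescaling and the helper names are our choices, not details from the abstract.

```python
from collections import Counter
from typing import List


def object_weights(instance_mask: List[List[int]]) -> List[List[float]]:
    """Per-pixel loss weights inversely proportional to object area.

    Every pixel of region k gets weight 1 / area(k), so each region (each object,
    and, by assumption here, the background as one region) contributes the same
    total weight regardless of its size. Weights are then rescaled so that their
    mean over the image is 1.
    """
    areas = Counter(v for row in instance_mask for v in row)
    raw = [[1.0 / areas[v] for v in row] for row in instance_mask]
    n_pixels = sum(len(row) for row in raw)
    scale = n_pixels / sum(sum(row) for row in raw)
    return [[w * scale for w in row] for row in raw]


def weighted_mean_loss(pixel_losses: List[List[float]],
                       weights: List[List[float]]) -> float:
    """OBP-style loss: per-pixel losses (e.g. cross-entropy) times object weights."""
    total = sum(loss * w
                for loss_row, w_row in zip(pixel_losses, weights)
                for loss, w in zip(loss_row, w_row))
    return total / sum(len(row) for row in weights)
```

In a 4×4 mask with a 4-pixel object and a 1-pixel object, each pixel of the small object receives four times the weight of a pixel of the larger one, so both objects contribute equally to the gradient.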
We propose a fast and efficient method for pedestrian video segmentation. Previous methods can use only the first frame, the previous frame, or a combination of the two; our framework uses a memory network so that all past frames can be exploited. The past frames and their corresponding masks form the memory, and the current (target) frame is segmented using information read from the memory rather than from the current frame alone. This better handles problems such as movement and appearance changes in the video. ResUnet is used as the segmentation network to improve time efficiency. Because no dataset for pedestrian video segmentation is publicly available yet, we internally labeled a large dataset containing 216 sequences in the training set and 24 sequences in the test set, which will be made public in the future. On the test set, our method achieves a mean IU of 92.6, better than previous methods, while running in real time (90 FPS for 160×96 inputs on a TITAN V).
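The memory read described above can be pictured as attention: a feature of the current frame (the query) is compared with features stored for past frames (the keys), and the associated stored values, which encode the past frames and masks, are averaged with softmax weights. The sketch assumes dot-product similarity over flat feature vectors, in the style of space-time memory networks; the abstract does not state the exact read operation, and `memory_read` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Read from memory for one query location of the current frame.

    keys[i] and values[i] come from one location in one past frame (with its mask).
    Similarity is a dot product; softmax turns similarities into weights that sum
    to 1, and the result is the weighted average of the stored values.
    """
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(sims)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
```

Adding a past frame simply appends its keys and values to the lists, which is how every past frame, not just the first or previous one, can inform the current segmentation; the cost of a read grows with the size of the memory.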
With the development of remote sensing technology, more and more target information can be obtained from remote sensing images. In particular, the 6D pose comprises the position and attitude of a target relative to the camera in a three-dimensional coordinate system. Traditional algorithms compute a target's 6D pose from a predicted RoI or inclined box. However, IoU, the detection standard used by these methods, cannot reflect the target's direction, and the inclination of an inclined box is ambiguous, for example between 0° and 180°, or 0° and 360°. In this paper, we present Anchor Points Prediction (APP), a new algorithm for predicting a target's 6D pose in remote sensing images. Unlike previous methods, APP produces final outputs that carry the target's direction. Instead of predicting a box, a neural network predicts multiple feature points of the target, from which we obtain the homography between the object plane and the ground. The resulting 6D pose accurately describes the target's three-dimensional position and attitude. We tested our algorithm on the HRSC2016 and DOTA datasets, obtaining accuracy rates of 0.863 and 0.701, respectively. The experimental results show that APP significantly improves target detection accuracy. In addition, the algorithm performs one-stage prediction, which makes the computation simpler and more efficient.
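Recovering the homography between the object plane and the ground from predicted feature points is a standard linear estimation step. The minimal sketch below assumes exactly four point correspondences, fixes h33 = 1, and solves the resulting 8×8 linear system (the direct linear transform) with Gaussian elimination in plain Python. The abstract does not say how many anchor points APP predicts or which solver it uses, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 homography H with H[2][2] = 1 such that dst ~ H @ src (DLT, 4 points)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y]); b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y]); b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map a point through H, including the perspective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Because the correspondences are ordered (a specific point of the object maps to a specific point on the ground), the recovered mapping preserves which end of the object is which, unlike an inclined box angle. With more than four noisy points one would instead solve the over-determined system in a least-squares or RANSAC fashion, for example with OpenCV's `cv2.findHomography`.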