Unlike in optical images, target detection performance in synthetic aperture radar (SAR) images is poor. To address this issue, an improved ship target detection method is proposed in this paper. Firstly, wavelet decomposition is introduced: the advantages of using wavelet pooling in the down-sampling of image feature maps are analyzed, and the standard pooling in Faster R-CNN is replaced with wavelet pooling, improving the detection performance for large ship targets. Secondly, the wavelet decomposition is iterated to obtain a hierarchical wavelet decomposition, forming a wavelet convolutional neural network (WCNN). The resulting feature maps at different scales are fused with the feature maps of the backbone to improve the accuracy of small-object detection. Finally, the two methods are combined to obtain the algorithm proposed in this paper. Ablation experiments comparing the detection performance of the different methods show that, relative to the standard Faster R-CNN algorithm, the proposed algorithm improves the mean average precision by 2.2%.
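The wavelet-pooling idea above can be sketched as a single-level Haar decomposition that halves a feature map while keeping its low-frequency structure. This is a minimal illustration, not the paper's implementation: real wavelet pooling is applied per channel to 4-D tensors inside the network, and the detail bands may also be used.

```python
import numpy as np

def haar_wavelet_pool(fmap):
    """Downsample a 2-D feature map by one level of the Haar DWT.

    The low-frequency (LL) band replaces the max-pooled map, halving
    the spatial size while retaining more structure than max pooling.
    Assumes even side lengths.
    """
    h, w = fmap.shape
    a = fmap[0:h:2, 0:w:2]  # top-left of each 2x2 block
    b = fmap[0:h:2, 1:w:2]  # top-right
    c = fmap[1:h:2, 0:w:2]  # bottom-left
    d = fmap[1:h:2, 1:w:2]  # bottom-right
    ll = (a + b + c + d) / 2.0   # approximation (LL) band
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

fmap = np.arange(16, dtype=float).reshape(4, 4)
ll, details = haar_wavelet_pool(fmap)
print(ll.shape)  # (2, 2)
```

Iterating `haar_wavelet_pool` on successive LL bands yields the hierarchical decomposition whose multi-scale maps are fused with the backbone features.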
Weight sharing across different locations makes Convolutional Neural Networks (CNNs) shift invariant in space: the weights learned at one location can be applied to recognize objects at other locations. However, such a weight-sharing mechanism has been lacking in Rotated Pattern Recognition (RPR) tasks, and CNNs have to learn training samples in different orientations by rote. As this rote-learning strategy greatly increases the difficulty of training, a new solution for RPR tasks, Pre-Rotation Only At Inference time (PROAI), is proposed to provide CNNs with rotation invariance. The core idea of PROAI is to share CNN weights across multiple rotated versions of the test sample. At training time, a CNN is trained with samples at only one angle; at inference time, test samples are pre-rotated at different angles and fed into the CNN to calculate classification confidences; finally, both the category and the orientation are predicted from the position of the maximum of these confidences. By adopting PROAI, the recognition ability learned at one orientation can be generalized to patterns at any other orientation, and both the number of parameters and the training time of a CNN on RPR tasks can be greatly reduced. Experiments show that PROAI enables CNNs with fewer parameters and less training time to achieve state-of-the-art classification and orientation performance on both the rotated MNIST and rotated Fashion-MNIST datasets.
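The PROAI inference step can be sketched in a few lines: rotate the test sample, score every rotated copy with the single-orientation classifier, and take the argmax over the resulting (rotation, class) score matrix. The toy template-matching classifier below is an assumption for illustration, not the paper's CNN, and rotations are restricted to 90-degree multiples; the method generalizes to finer angle grids.

```python
import numpy as np

def proai_predict(classify, image, num_rotations=4):
    """PROAI inference sketch: pre-rotate the test image, score each
    rotated copy with a network trained at one orientation, and take
    the argmax over the (rotation, class) confidence matrix."""
    scores = np.stack([classify(np.rot90(image, k))
                       for k in range(num_rotations)])   # shape (R, C)
    rot_idx, cls_idx = np.unravel_index(scores.argmax(), scores.shape)
    angle = rot_idx * (360 // num_rotations)  # estimated orientation
    return cls_idx, angle

# Toy classifier (assumption): recognizes an upright "L" pattern by
# correlating the input with a fixed template.
template = np.array([[1, 0], [1, 1]], dtype=float)
def toy_classify(img):
    match = float((img * template).sum())
    return np.array([match, 1.0 - match])  # [class "L", class "other"]

rotated_l = np.rot90(template, -1)  # "L" rotated 90 degrees clockwise
cls, angle = proai_predict(toy_classify, rotated_l)
print(cls, angle)  # 0 90
```

The position of the maximum confidence yields the category and the pre-rotation angle that uprights the sample, i.e., its orientation estimate.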
Rotated target recognition is a challenge for Convolutional Neural Networks (CNNs), and the current solution is to make CNNs rotation invariant through data augmentation. However, data augmentation makes a CNN prone to overfitting small-scale sonar image datasets, and increases its number of parameters and training time. This paper proposes to recognize rotated targets in sonar images using a novel CNN with Rotated Inputs (RICNN), which does not need data augmentation. During training, RICNN is trained with sonar images of targets at only one orientation, which avoids learning multiple rotated versions of the same targets and reduces both the number of parameters and the training time of the CNN. During testing, RICNN calculates classification scores for each test image and all of its possible rotated versions. The maximum of these classification scores is used to simultaneously estimate the category and orientation of each target. In addition, to improve the generalization of RICNN on imbalanced sonar datasets, this paper also designs an imbalanced-data sampler. Experiments on a self-made small, imbalanced sonar image rotated target recognition dataset show that the improved RICNN achieves 4.25% higher classification accuracy than data augmentation, and reduces the number of parameters and the training time to 2.25% and 19.2% of those of the data augmentation method, respectively. Moreover, RICNN achieves orientation estimation accuracy comparable to a CNN orientation regressor trained with data augmentation. The code and dataset are publicly available.
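The imbalanced-data sampler mentioned above can be sketched as inverse-frequency sample weighting, so minority sonar classes are drawn as often as majority ones during an epoch. The weighting scheme here is an assumption for illustration; the paper's exact sampler may differ.

```python
import numpy as np

def balanced_sample_weights(labels):
    """Sketch of an imbalanced-data sampler: each sample is drawn with
    probability inversely proportional to its class frequency, so every
    class contributes equal probability mass to a training epoch."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts))
    weights = np.array([1.0 / freq[y] for y in labels])
    return weights / weights.sum()

labels = [0, 0, 0, 0, 1, 1, 2]       # heavily imbalanced toy labels
w = balanced_sample_weights(labels)
rng = np.random.default_rng(0)
batch = rng.choice(len(labels), size=9, p=w)  # class-balanced draws
```

With these weights each of the three classes receives probability mass 1/3, regardless of how many samples it has.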
CycleGAN has been demonstrated to outperform recent approaches to semi-supervised semantic segmentation on public segmentation benchmarks when only a small number of labelled samples are available. However, CycleGAN tends to generate identical semantic segmentation results on acoustic image datasets and cannot retain target details. To solve this problem, a spectral-normalized CycleGAN network (SNCycleGAN) is presented, which applies spectral normalization to both generators and discriminators to stabilize the training of GANs. The experimental results demonstrate that semi-supervised training of SNCycleGAN achieves reasonably accurate sonar target segmentation from limited labelled data without using transfer learning, and surpasses supervised training in detail preservation.
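The spectral normalization used in SNCycleGAN can be sketched as estimating a weight matrix's largest singular value by power iteration and dividing the weights by it, bounding the layer's Lipschitz constant near 1. This is a one-off numpy illustration; in practice frameworks apply it per layer during training with a persistent power-iteration vector (e.g., `torch.nn.utils.spectral_norm`).

```python
import numpy as np

def spectral_normalize(W, n_iters=30):
    """Estimate the leading singular value of W by power iteration and
    rescale W so its spectral norm is ~1 (the core of spectral
    normalization for GAN training stability)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # leading singular value estimate
    return W / sigma

W = np.array([[3.0, 0.0], [0.0, 1.0]])
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))  # ~1.0
```

Applying this rescaling to every generator and discriminator layer keeps gradients bounded, which is what stabilizes the adversarial training described above.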
Convolutional neural networks (CNNs) have achieved good performance in object classification owing to their inherent translation equivariance, but their scale equivariance is poor. A Scale-Aware Network (SA Net) with scale equivariance is proposed, which can estimate the scale, i.e., the size of the pattern in the image, while classifying. In the training stage, only one scale of the pattern is learned. In the testing stage, the test sample with an unseen scale is first zoomed in and out into a set of images at different scales, which forms an image pyramid. The zoom-in channels are up-sampled by bilinear interpolation; the zoom-out channels are down-sampled using a combination of the dyadic discrete wavelet transform (DWT) and bilinear interpolation to avoid spectrum aliasing. The image pyramid is then fed into Siamese CNNs with shared weights for inference, producing a two-dimensional classification score matrix. From the position of the maximum of this score matrix, classification and scale estimation are carried out at the same time. Experiments are conducted on the MNIST Large Scale test set. In the scale estimation experiments, the relative root mean square error (RMSE) is obtained by scaling the test images in a geometric series with a common ratio of ⁴√2 over the range [1/2, 2]. The classification experiments show that when the scale is greater than 1.0, the classification accuracy surpasses 90%. SA Net can estimate the scale while improving classification accuracy, and mis-estimated samples always lie near the ground truths (GTs), so the correct scale of an unseen-scale sample can always be obtained approximately.
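The pyramid-and-argmax procedure above can be sketched with a three-level pyramid and a weight-shared scorer. Everything here is a simplification for illustration: zoom-out uses 2x2 averaging as a crude proxy for the paper's DWT-based anti-aliased down-sampling, zoom-in uses nearest-neighbour instead of bilinear interpolation, and the size-matching classifier is a stand-in for the trained CNN.

```python
import numpy as np

def scale_pyramid(img):
    """Build a minimal {0.5x, 1x, 2x} image pyramid as a stand-in for
    SA Net's zoom channels. Assumes even side lengths."""
    h, w = img.shape
    down = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # zoom-out
    up = np.kron(img, np.ones((2, 2)))                          # zoom-in
    return {0.5: down, 1.0: img, 2.0: up}

def sa_net_predict(classify, img):
    """Score every pyramid level with a weight-shared classifier and
    take the argmax of the (scale, class) score matrix; the winning
    zoom factor implies the sample's scale (roughly its reciprocal)."""
    best_score, best = -np.inf, None
    for zoom, x in scale_pyramid(img).items():
        conf = classify(x)
        if conf.max() > best_score:
            best_score, best = conf.max(), (int(conf.argmax()), zoom)
    return best  # (predicted class, best zoom factor)

# Toy classifier (assumption, not the paper's CNN): class-0 confidence
# peaks when the input matches the 4x4 "trained" size.
def toy_classify(x):
    score = 1.0 / (1.0 + abs(x.shape[0] - 4))
    return np.array([score, 1.0 - score])

cls, zoom = sa_net_predict(toy_classify, np.ones((8, 8)))
print(cls, zoom)  # 0 0.5 -> the 8x8 sample matches after 0.5x zoom
```

The 8x8 test sample scores highest after a 0.5x zoom, i.e., it is recognized as the trained pattern at twice the trained scale, mirroring the joint class-and-scale readout from the score matrix.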