Previous convolutional neural networks (CNNs) for semantic segmentation, particularly in road scenes, suffer from overfitting, insensitivity to positional information, and limited robustness caused by convolution and pooling operations. In this paper, we propose a multi-scale multi-feature fusion self-attention network (MSMA-Net) based on the U-Net architecture. The decoder stage of the U-Net is removed, retaining only the first four downsampling layers of the encoder. The final features from each layer are fed in parallel into a multi-scale pyramid pooling structure, where pooling at different scales merges the features into a unified dimension. The pooled features are then passed through a Transformer encoder and an MLP head to produce the final classification results. The proposed method is trained on the Cityscapes and CamVid datasets, with half of the data randomly selected for training. It achieves mean intersection over union (mIoU) scores of 77.9% on Cityscapes and 72.2% on CamVid, demonstrating clear advantages over comparable networks.
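A minimal PyTorch sketch of the fusion path this abstract describes: each encoder stage is pooled to a shared size, projected to a common dimension, and the merged tokens pass through a Transformer encoder and MLP head. The channel counts, pooled size, head count, and depth are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PyramidFusion(nn.Module):
    def __init__(self, in_channels=(64, 128, 256, 512), dim=256,
                 pooled=8, num_classes=19):
        super().__init__()
        # One branch per encoder stage: pool every scale to the same
        # spatial size, then project to a unified dimension `dim`.
        self.pools = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(pooled),
                          nn.Conv2d(c, dim, kernel_size=1))
            for c in in_channels)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.mlp_head = nn.Linear(dim, num_classes)

    def forward(self, feats):  # feats: the four encoder feature maps
        tokens = [p(f).flatten(2).transpose(1, 2)      # (B, N, dim) each
                  for p, f in zip(self.pools, feats)]
        x = torch.cat(tokens, dim=1)                   # merge all scales
        x = self.transformer(x)
        return self.mlp_head(x.mean(dim=1))            # classification logits
```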
Splice forgery adds objects to an original image, changing its semantics, and the distribution of such spliced images can have harmful effects. To address this, many forgery detection methods based on convolutional neural networks have been proposed. However, they tend to extract deep features while ignoring the importance of shallow semantics. In addition, the complexity of the forgery localization task leads to insufficient accuracy when detecting smaller forged regions. To address these problems, an end-to-end image splicing localization network based on multi-scale features and a residual refinement module (RRM) is proposed in this work. Our approach is roughly divided into two modules: the detection module and the RRM. First, shallow and deep features are extracted by a backbone network, and the deep features are then processed by a deeper atrous spatial pyramid pooling (ASPP) module to extract multi-scale features. The deeper ASPP module uses smaller dilation rates and lightweight convolutions, making it better suited to detecting complex counterfeit images. Second, the shallow features are fused with the multi-scale features to supplement shallow semantic information, such as texture and edges, which further improves the robustness of the model. Finally, the detection network generates coarse prediction maps that are fed to the RRM, which refines these masks by smoothing boundaries, filling small gaps, and enhancing edge details, improving the pixel-level segmentation results for forgery detection. Extensive experiments on several public datasets show that this method outperforms other state-of-the-art methods in image forgery localization.
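A hedged sketch of what a "deeper ASPP with smaller dilation rates and lightweight convolutions" could look like in PyTorch: depthwise-separable 3x3 branches at small dilations, concatenated and projected. The specific rates, channel counts, and depth are assumptions for illustration.

```python
import torch
import torch.nn as nn

def light_branch(channels, dilation):
    # Lightweight convolution: depthwise 3x3 with a small dilation,
    # followed by a pointwise 1x1 projection.
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=dilation,
                  dilation=dilation, groups=channels, bias=False),
        nn.Conv2d(channels, channels, 1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True))

class DeeperASPP(nn.Module):
    def __init__(self, channels=256, rates=(1, 2, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(light_branch(channels, r)
                                      for r in rates)
        self.project = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        # Concatenate the multi-scale responses and fuse them back
        # to the original channel width.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```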
Selective encryption algorithms have become a popular technique for protecting the privacy of images during real-time transmission. Selectively encrypted images must have their security and usability evaluated with visual security indices (VSIs), and a series of studies has addressed this problem. However, existing VSIs are often ineffective. We present a multi-directional structure and content-aware features-based visual security index (MCVSI) for the objective assessment of selectively encrypted images. Because selective encryption aims to prevent the main content from being easily identified, stable local features are extracted from the images to indicate the degree of content leakage. Meanwhile, we extract spatial structure information that closely aligns with human visual perception to indicate the level of variation in the overall image skeleton. Next, these features are subjected to similarity measurements to produce two types of similarity: content-perception feature similarity and structure feature similarity. Finally, our visual security index is built by relating the feature similarities to their corresponding visual security scores through a regression module. Experimental results and analyses indicate that the proposed MCVSI outperforms many existing mainstream VSIs, with higher performance and stronger robustness, particularly on low- and medium-quality images.
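A minimal sketch of the final regression step: mapping the two similarity features to a visual security score. Support vector regression is one plausible choice of regression module; the abstract does not specify the regressor, and the feature arrays, hyperparameters, and training scores below are placeholder assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def fit_vsi_regressor(content_sim, structure_sim, subjective_scores):
    # Each row pairs one image's content-perception and structure
    # similarities with its human-rated visual security score.
    X = np.column_stack([content_sim, structure_sim])
    reg = SVR(kernel="rbf", C=10.0, epsilon=0.01)
    reg.fit(X, subjective_scores)
    return reg

# Usage: predict the security score of a new encrypted image from its
# two similarity values (hypothetical numbers).
# reg = fit_vsi_regressor(content_sim, structure_sim, scores)
# vsi = reg.predict([[0.42, 0.77]])
```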
The purpose of image steganalysis is to detect whether secret information is hidden in an image. Current advanced adaptive steganography algorithms hide secret information in areas of complex image texture, where it is difficult for a steganalysis model to capture enough noise residual, leaving the model's detection ability insufficient. To further improve the detection ability of spatial image steganalysis models, a U-Net-based auxiliary information generation network is first constructed to enlarge the noise residual in the image so that the model can capture more favorable information. In addition, spatial and channel attention mechanisms are fused to guide the model to pay more attention to the regions of the image that are globally favorable for steganalysis. To verify the effectiveness of the proposed model, experiments are conducted on the BOSSbase-v1.01 dataset with the advanced spatial adaptive steganography algorithms S-UNIWARD and WOW. The experimental results show that the proposed model improves detection accuracy by up to 4.5% compared with the current best deep learning-based spatial image steganalysis models, SR-Net and Siamese-Net.
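A hedged sketch of fusing channel and spatial attention in PyTorch, in the spirit of CBAM-style blocks; the reduction ratio, kernel size, and sequential ordering are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class FusedAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: reweight feature maps by global statistics.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3))))
        x = x * w.view(b, c, 1, 1)
        # Spatial attention: emphasize image regions that are globally
        # favorable for steganalysis.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(stats))
```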
As an effective method of information hiding, steganography embeds secret information into images in a way that is imperceptible to humans. Recent interest in combining image steganography with generative adversarial networks (GANs) has yielded rapid progress. However, existing steganography frameworks still suffer from low-quality steganographic images and weak resistance to detection by steganalysis algorithms. To overcome these limitations, we propose an effective GAN-based image steganography framework with multiscale feature integration. Specifically, we construct a secret image feature extraction network (SfeNet), driven by a spatial attention mechanism, to extract multiscale features of secret images, and an encoder combined with an efficient channel attention mechanism to embed these multiscale features into the cover image. Subsequently, a steganalyzer is incorporated as the discriminator against the encoder in the GAN to strengthen the model's resistance to steganalysis. In addition, a mixed loss function combining perceptual loss, MS-SSIM, and L1 loss is proposed to preserve the structural similarity of the images. Experimental results on ImageNet, Pascal VOC2012, and LFW show that the proposed method achieves higher-quality steganographic images and better resistance to steganalysis than several state-of-the-art steganography algorithms.
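A minimal sketch of such a mixed loss in PyTorch. The loss weights, the choice of a frozen VGG-16 feature layer for the perceptual term, and the use of the third-party pytorch_msssim package for MS-SSIM are all assumptions; the paper's exact weighting is not given here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

class MixedLoss(nn.Module):
    def __init__(self, alpha=1.0, beta=0.5, gamma=0.5):
        super().__init__()
        # Frozen VGG-16 features as a stand-in perceptual extractor.
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16]
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg.eval()
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def forward(self, stego, cover):
        l1 = F.l1_loss(stego, cover)
        perceptual = F.mse_loss(self.vgg(stego), self.vgg(cover))
        # MS-SSIM is a similarity in [0, 1]; use 1 - MS-SSIM as a loss.
        ssim_loss = 1 - ms_ssim(stego, cover, data_range=1.0)
        return self.alpha * l1 + self.beta * ssim_loss + self.gamma * perceptual
```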
We propose SWDGAN, a sampling and whole-image denoising network based on a generative adversarial network (GAN), to reconstruct natural images from compressed sensing (CS) measurements. This work aims to balance reconstruction quality, practicality, and running efficiency. Unlike recent deep learning reconstruction networks, we further enhance the feature representation and remove blocking artifacts by introducing a whole-image dense residual denoising module that does not affect running efficiency. To improve the flexibility of the sampling process and the practicality of our algorithm, a fully connected network without bias is used for sampling; its weight can be extracted separately and used as a measurement matrix. In this way, measurements can be obtained by matrix multiplication in many running environments, not only within a deep learning framework. The sampling network also improves reconstruction quality even at low sampling ratios. Moreover, we remove the batch normalization (BN) layers of the reconstruction network to avoid BN artifacts in the reconstructed image. Experimental results show that our method outperforms the most advanced traditional and deep learning-based methods in terms of reconstruction quality and running time.
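A short sketch of the bias-free sampling idea: a linear layer with no bias computes exactly y = Phi x, so its trained weight can be exported and applied as an ordinary measurement matrix outside any deep learning framework. The block size and 10% sampling ratio below are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

block = 32 * 32                  # flattened image-block length n (assumed)
m = int(0.10 * block)            # measurements at an assumed 10% ratio

# No bias term, so the layer is a pure linear map y = Phi x.
sampler = nn.Linear(block, m, bias=False)

# After training, extract the weight as a standalone measurement matrix.
phi = sampler.weight.detach().numpy()         # shape (m, n)

# Sampling outside the framework: a plain matrix multiplication.
x = np.random.rand(block).astype(np.float32)  # a flattened image block
y = phi @ x                                   # CS measurements, length m
```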