Image classification is a core task in computer vision, and advances in deep learning have significantly improved its performance. Because deep learning models learn their own representations, they can automatically extract features from large amounts of data and handle the complexity of image classification by modeling the nonlinear relationships inherent in high-dimensional data. This paper presents a novel approach to image classification based on an enhanced VGG network. The GELU activation function is incorporated into the VGG network to mitigate the vanishing-gradient problem in deep networks. Furthermore, the Efficient Multi-scale Attention (EMA) module is introduced so that the network better retains the information in each channel. This is achieved by dividing the channels into subgroups and balancing the spatial semantic features within each feature group. Through cross-dimensional interaction, the network encodes global information and enhances the extracted features. Finally, experiments on the CIFAR-100 classification dataset demonstrate the superiority of the proposed method in image classification.
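As a rough illustration of two ideas mentioned above, the following minimal sketch compares the gradient of GELU with that of ReLU for a negative input and shows one way to divide channels into equal subgroups. It is not the paper's implementation: the exact (erf-based) GELU form, the function names `gelu`, `gelu_grad`, `relu_grad`, and `split_channels`, and the choice of 64 channels in 8 groups are all assumptions made for illustration.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of GELU: Phi(x) + x * phi(x), with phi the normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x: float) -> float:
    """Derivative of ReLU: exactly zero for non-positive inputs."""
    return 1.0 if x > 0 else 0.0


def split_channels(num_channels: int, groups: int) -> list:
    """Partition channel indices into equal-sized subgroups (hypothetical helper)."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    size = num_channels // groups
    return [range(g * size, (g + 1) * size) for g in range(groups)]


if __name__ == "__main__":
    # For a negative pre-activation, ReLU passes no gradient; GELU passes a small one.
    print("ReLU grad at -1:", relu_grad(-1.0))
    print("GELU grad at -1:", gelu_grad(-1.0))
    # Assumed example: 64 channels divided into 8 subgroups of 8 channels each.
    print("Subgroups:", split_channels(64, 8))
```

The comparison only shows that GELU keeps a nonzero gradient where ReLU's is exactly zero; how much this helps in the enhanced VGG network is a matter for the experiments, not this sketch.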