|
1. INTRODUCTION
Low back pain (LBP) has become one of the most common problems in spinal surgery in today's era of overwork. According to a survey [1], approximately 70% to 85% of adults are affected by lumbar spine disorders at some point in their lives. In China, patients with lumbar disc herniation (LDH) account for the majority of patients with lumbar disease. Conventional medical image analysis still relies on manual reading, but with the explosive growth of medical imaging data, the disadvantages of manual reading are becoming increasingly apparent. Computer-aided diagnosis (CAD) combines imaging, medical image processing technology, and other possible biological means with computer analysis and detection to help find lesions and improve diagnostic accuracy. The first CAD systems were attempted in the 1960s [2]. With the evolution of computer vision and image processing techniques, using deep learning to process medical images and assist physicians in clinical diagnosis has become popular [3]. Many studies have emerged that analyze medical images more accurately; some focus on specific deformations or injuries in spine images [4, 5], while others aim to detect vertebrae automatically [6, 7]. Zhao et al. [8] detected vertebrae in spine MRI with a category-consistent self-calibration detection framework; Han et al. [9] applied a generative adversarial network (GAN) to semantic segmentation of the spine; Wang et al. [10] achieved automatic vertebra localization and identification in CT by training a key point localization model and introducing an anatomically constrained optimization module; Zhang et al. [11] proposed a sequential conditional reinforcement learning network that models the top-down spatial correlation between vertebrae as a continuous dynamic interaction process, and thereby performs globally focused detection and segmentation of each vertebra.
2. OUR APPROACH
2.1 Structure of network
Among current one-stage object detection networks, one of the more advanced is FCOS, proposed by Tian et al. [12]. It detects faster than two-stage detection networks while reaching similar accuracy, and it is the basis of this paper. FCOS is both proposal-free and anchor-free thanks to several design choices (FPN, center-ness, a per-level scale limit on feature point regression, etc.). It also avoids the complex IoU calculation and the matching between anchors and ground-truth (GT) bounding boxes during training. The network architecture of this study is shown in Figure 1; it adopts a feature pyramid network (FPN) and a three-branch detection head.
2.2 Self-calibrated convolutions
To further optimize the network parameters on the basis of FCOS and improve accuracy, this paper draws on the SC-Net proposed by Liu et al. [13]. That network mainly uses a self-calibrated convolution, SCConv, whose structure is shown in Figure 2. The design of SCConv is simple and versatile, and it can enhance the performance of standard convolution layers without introducing additional parameters or complexity. The architecture of the SC module is shown in Figure 3.
2.3 Squeeze-and-excitation blocks
To better exploit the dynamic relationship between feature channels, this paper introduces the SE-Net proposed by Hu et al. [14]. SE-Net is simple in construction and easy to use; it requires no new functions or layers, and it performs well in terms of parameter complexity and network structure. The network structure is shown in Figure 4. Figure 5 shows how the SE module is embedded into the Res-Net module. In this paper, we use SE-ResNet to model the channel relationships of the network: the Squeeze operation establishes the dependencies between channels, and the Excitation operation recalibrates the features.
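The squeeze and excitation steps can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this paper: the function name `se_recalibrate`, the weight arguments `w_reduce` and `w_expand`, and the omission of biases are assumptions made for illustration, following the general form of the SE block in [14] (global average pooling, a bottleneck of two fully connected layers, and a sigmoid gate).

```python
import math


def se_recalibrate(feature_map, w_reduce, w_expand):
    """Apply a squeeze-and-excitation step to a feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C // r) x C weights of the first fully connected layer.
    w_expand:    C x (C // r) weights of the second fully connected layer.
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expanding fully connected layer and sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Recalibration: rescale every channel by its gate value.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

Each gate lies between 0 and 1, so a channel is either passed through almost unchanged or attenuated, depending on the pooled statistics of all channels.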
The combination of the two emphasizes useful features and suppresses less useful ones, which can effectively improve model performance and accuracy.
2.4 Soft-NMS
In our study, we use Soft-NMS [15], which mainly addresses the excessive deletion of boxes by NMS. During execution, instead of simply deleting a detection box whose IoU is above the threshold, Soft-NMS lowers its score. The procedure is otherwise the same as NMS, except that a function is applied to the original confidence score in order to reduce it. Soft-NMS is expressed as follows.
2.5 Improved res module
Building on the traditional Res-Net, the SC module and the SE module are introduced to modify its residual module. The improved module architecture is shown in Figure 6.
2.6 Loss function
The loss function of this network consists of three parts: focal loss for classification, IoU loss for regression, and binary cross-entropy (BCE) loss for center-ness. SoftMax is discarded in the classification loss; instead, a sigmoid function is applied to each channel of the classification output of the head (each channel represents a category), and focal loss is then computed. IoU loss is computed only for meaningful feature points. The specific loss function is as below.
3. EXPERIMENTS
3.1 Implementation details
The experiments were run on Ubuntu 20.04.3 with an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz processor and an Nvidia Quadro M4000 8GB graphics card, using Python 3.6.13. We use PyTorch as the deep learning framework. We use a non-public human lumbar disc MRI-T2 image dataset collected from the Internet with 470 images, which is divided into a training set (376 images), a validation set (47 images), and a test set (47 images) at a ratio of 8:1:1.
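The 8:1:1 split can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the helper name `split_dataset`, the fixed seed, and the placeholder image identifiers are assumptions, and how the images were actually shuffled or assigned is not stated here.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Placeholder identifiers standing in for the 470 MRI images.
image_ids = [f"img_{i:03d}" for i in range(470)]
train_ids, val_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))  # 376 47 47
```

With 470 images, integer division gives 376 training and 47 validation images, and the remaining 47 form the test set, matching the counts above.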
The target categories in the dataset are normal and herniated. Training is optimized using stochastic gradient descent (SGD) with an initial learning rate of 0.005 and a momentum of 0.9.
3.2 Results on MRI
The experiment follows five-fold cross-validation, and the mean of the five results is taken as the final result. Some detection results of the model on the MRI dataset are shown in Figure 7. Experimental results of several models on the MRI test set are listed in Table 1. The network model proposed in this paper has few parameters, detects faster, and achieves the highest detection accuracy among all compared models. Compared with the generic Faster R-CNN and Retina-Net, it shows a large performance improvement, which reflects the practical value of the model. Moreover, our network has no complex structure, which is a great advantage for deployment.
Table 1. Comparisons on spine MRI dataset.
3.3 Ablation study
To verify the effectiveness of the modules introduced in our network, we conducted ablation experiments on the main modules to examine their role in the network. The results are shown in Table 2. We can see that the modules we introduced improve the performance of the network.
Table 2. Ablation study on spine MRI dataset.
Furthermore, we performed an ablation experiment on the SE module to evaluate the effect of its placement when integrating it into existing frameworks. Figure 8 shows the structure of these variants, and Table 3 shows their performance. As the experimental results show, the SE-PRE block, SE-Identity block, and Standard SE block perform similarly, while the SE-POST block leads to a drop in performance. This indicates that the performance improvement brought by the SE block is robust to its position, provided it is applied before branch aggregation.
Table 3. Effect of different SE block.
4. CONCLUSIONS AND FUTURE WORK
We propose an anchor-free and proposal-free one-stage detector that incorporates the SCConv module and the SE attention mechanism, which greatly improves network performance. Experimental results on spine MRI show that it outperforms currently popular anchor-based single-stage detectors, including Retina-Net, YOLO, and SSD, with much lower design complexity. In future work, we also envision combining the FCOS network with Faster R-CNN to further increase the detection speed of our network without adding parameters or operations, so as to meet doctors' demands for detection speed and to design a more complete clinical auxiliary diagnosis system.
REFERENCES
Ohtori, S., Inoue, G., Orita, S., et al.,
“No acceleration of intervertebral disc degeneration after a single injection of bupivacaine in young age group with follow-up of 5 years,” Asian Spine Journal, 7(3), 212 (2013). https://doi.org/10.4184/asj.2013.7.3.212
Lodwick, G. S., Keats, T. E. and Dorst, J. P., “The coding of roentgen images for computer analysis as applied to lung cancer,” Radiology, 81(2), 185–200 (1963). https://doi.org/10.1148/81.2.185
Zhang, X., “New concept of the development of modern medicine: Make full use of the internet, large data, and artificial intelligence,” Chinese Journal of Lung Cancer, 21(3), 141–2 (2018).
Anitha, H. and Prabhu, G. K., “Identification of apical vertebra for grading of idiopathic scoliosis using image processing,” Journal of Digital Imaging, 25(1), 155–61 (2012). https://doi.org/10.1007/s10278-011-9394-x
Kumar, S., Nayak, K. P. and Hareesha, K. S., “Improving visibility of stereo-radiographic spine reconstruction with geometric inferences,” Journal of Digital Imaging, 29(2), 226–34 (2016). https://doi.org/10.1007/s10278-015-9841-1
Kumar, V. P. D. and Thomas, T., “Automatic estimation of orientation and position of spine in digitized X-rays using mathematical morphology,” Journal of Digital Imaging, 18(3), 234–41 (2005). https://doi.org/10.1007/s10278-005-5150-4
Benjelloun, M. and Mahmoudi, S., “Spine localization in X-ray images using interest point detection,” Journal of Digital Imaging, 22(3), 309–18 (2009). https://doi.org/10.1007/s10278-007-9099-3
Zhao, S., Wu, X., Chen, B., et al., “Automatic vertebrae recognition from arbitrary spine MRI images by a category-consistent self-calibration detection framework,” Medical Image Analysis, 67, 101826 (2021). https://doi.org/10.1016/j.media.2020.101826
Han, Z., Wei, B., Mercado, A., et al., “Spine-GAN: Semantic segmentation of multiple spinal structures,” Medical Image Analysis, 50, 23–35 (2018). https://doi.org/10.1016/j.media.2018.08.005
Wang, F., Zheng, K., Lu, L., et al., “Automatic vertebra localization and identification in CT by spine rectification and anatomically-constrained optimization,” in Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 5280–8 (2021).
Zhang, D., Chen, B. and Li, S., “Sequential conditional reinforcement learning for simultaneous vertebral body detection and segmentation with modeling the spine anatomy,” Medical Image Analysis, 67, 101861 (2021). https://doi.org/10.1016/j.media.2020.101861
Tian, Z., Shen, C., Chen, H., et al., “FCOS: Fully convolutional one-stage object detection,” in Proc. of the IEEE/CVF Inter. Conf. on Computer Vision, –36 (2019).
Liu, J., Hou, Q., Cheng, M., et al., “Improving convolutional networks with self-calibrated convolutions,” in Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 10096–105 (2020).
Hu, J., Shen, L. and Sun, G., “Squeeze-and-excitation networks,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 7132–41 (2018).
Bodla, N., Singh, B., Chellappa, R., et al., “Soft-NMS--improving object detection with one line of code,” in Proc. of the IEEE Inter. Conf. on Computer Vision, 5561–9 (2017).
Redmon, J. and Farhadi, A., “YOLO9000: Better, faster, stronger,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 7263–71 (2017).
|