1. INTRODUCTION

Metallic implants carried by patients can produce radiological artifacts in computed tomography (CT) images,1 which in turn can compromise the accuracy of subsequent medical diagnoses.2 Consequently, metal artifact reduction (MAR) has attracted growing interest in medical image analysis, and numerous deep learning-based MAR algorithms have been proposed in recent years. These methods can be broadly classified into two categories: image-domain-only methods and sinogram-involved methods. The sinogram-involved methods can be further divided into two schemes: methods that use sinogram data alone2–4 and methods that employ both sinogram data and image data.5–8 However, slight disturbances in the sinogram map can lead to serious secondary artifacts in the image domain,1 and collecting sufficient sinogram information in clinical procedures is challenging9 (Fig. 1(a)). Image-domain-only methods1, 9–11 apply noise reduction directly to the artifact-affected CT images without sinogram data. However, these methods rely on a single artifact-affected image during training, which provides insufficient reference for reconstruction (Fig. 1(a)).

To address these issues, we propose a novel Discrete Clinical Convergence Generation Network (DCCGN) for MAR (Fig. 1(b)). The DCCGN eliminates the need for sinogram-domain data and introduces clinically clean CT prior information through a novel quantization scheme, relying exclusively on image-domain data to generate high-quality CT reconstructions. Specifically, the DCCGN first pre-trains a VQGAN model on clinical clean CT data and retains its codebook and decoder as the Pre-trained Clinical Module (PCM). To make the feature matching in the VQGAN more robust, we replace the nearest-neighbour matching step with a Robust Transformation Module (RTM) built on the Transformer encoder structure. Finally, to reduce the influence of the feature conversion process on the original image information, we introduce a Fidelity Guarantee Module (FGM) that gradually fuses the degraded image features with the pre-trained decoder features, improving the consistency of the model before and after the feature conversion.

Our main contributions are summarised as follows:
I. DCCGN achieves high-quality CT reconstruction using only image-domain data.
II. DCCGN introduces artifact-free CT prior information into the metal artifact removal process in a novel way.
III. DCCGN proposes a novel feature residual module to ensure the fidelity of the reconstruction.

2. METHOD

The proposed Discrete Clinical Convergence Generation Network, DCCGN (Fig. 2), produces a high-fidelity reconstruction that incorporates prior clinical information. It consists of three main modules, combined through a staged training process: the Pre-trained Clinical Module (PCM), the Robust Transformation Module (RTM), and the Fidelity Guarantee Module (FGM).

2.1 Pre-trained Clinical Module (PCM)

The PCM allows the reconstruction process to converge toward clinically clean CT. Training the PCM involves only clinically clean CT, and the parameters of the module are kept frozen in the subsequent stages. As shown in Fig. 2(a), clinical clean CT data I_c ∈ ℝ^(H×W×1) are first embedded by an encoder E to obtain the corresponding compressed features x_c ∈ ℝ^(m×n×d). Following the VQGAN,12 each pixel of x_c is matched by nearest-neighbour search against the code items c_k ∈ ℝ^d in the codebook C to obtain a code sequence S ∈ {0, 1, …, N−1}^(m·n), where N is the codebook size. Based on the sequence S, the features are re-extracted from the codebook to form the quantized feature X_cb ∈ ℝ^(m×n×d), i.e., the (i, j)-th entry of X_cb equals c_k when S_(i·n+j) = k. The decoder reconstructs the clean CT data with X_cb as input. We adopt three image-level losses between the input I_c and the reconstruction I_rec: an L1 loss L_1, a perceptual loss L_per and an adversarial loss L_adv. To better optimise the codebook, we also adopt a code-level loss L_code:

L_code = ‖sg(x_c) − X_cb‖₂² + β ‖x_c − sg(X_cb)‖₂²,

where sg(·) is the stop-gradient operator as in VQGAN,12 and β = 0.25 is a trade-off weight. The complete optimisation objective for this stage is

L_PCM = α_l1 L_1 + L_per + α_adv L_adv + L_code,

where α_l1 and α_adv are set to 1.5 and 0.3, respectively. After adequate training, the frozen codebook and decoder constitute the PCM that participates in the subsequent reconstruction process.
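As a concrete illustration of the codebook lookup described above, the following self-contained PyTorch sketch performs the nearest-neighbour quantization. Tensor names follow the notation above, but the spatial size of the latent grid and the straight-through gradient trick are assumptions carried over from standard VQGAN practice, not details confirmed by the paper.

```python
# Minimal sketch of the nearest-neighbour codebook quantization in the PCM
# (illustrative only; not the released DCCGN code).
import torch


def quantize(x_c: torch.Tensor, codebook: torch.Tensor):
    """x_c: encoder features of shape (B, d, m, n); codebook: (N, d) code items.
    Returns the quantized feature X_cb (same shape as x_c) and the code sequence S."""
    B, d, m, n = x_c.shape
    flat = x_c.permute(0, 2, 3, 1).reshape(-1, d)              # (B*m*n, d)
    # Squared Euclidean distance from every latent pixel to every code item c_k.
    dist = (flat.pow(2).sum(1, keepdim=True)
            - 2 * flat @ codebook.t()
            + codebook.pow(2).sum(1))                           # (B*m*n, N)
    S = dist.argmin(dim=1)                                      # code sequence
    x_cb = codebook[S].view(B, m, n, d).permute(0, 3, 1, 2)     # re-extracted features
    # Straight-through estimator (standard VQGAN practice) so gradients reach E.
    x_q = x_c + (x_cb - x_c).detach()
    return x_q, S.view(B, m, n)


# Usage with the sizes reported in Sec. 3.1 (codebook size 1200, code dimension 256);
# the 16x16 latent grid is an assumed downsampling of the 256x256 input.
codebook = torch.randn(1200, 256)
features = torch.randn(2, 256, 16, 16)
x_q, S = quantize(features, codebook)
```

In the full pipeline, this hard lookup is exactly the step that the RTM below replaces with a learned Transformer prediction of the code sequence.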
2.2 Robust Transformation Module (RTM)

The RTM makes the intermediate feature conversion more robust. The nearest-neighbour algorithm can mismatch features because metal artifacts corrupt the CT image pixels. Following the idea of CodeFormer,13 we replace the nearest-neighbour matching of the VQGAN with the RTM to match features in a more robust way. Based on the pre-trained VQGAN in Sec. 2.1, as shown in Fig. 2(b), we attach a Transformer encoder14 E_tr containing nine self-attention blocks after the encoder E and add an extra linear layer that generates code sequences; together these form the RTM, as shown in Fig. 3(b). Specifically, a CT image containing metal artifacts is processed by the encoder E_m (fine-tuned from E) to generate the corresponding artifact-affected feature x_m. The RTM takes x_m as input and predicts the code sequence S_m (analogous to S in Sec. 2.1). The frozen PCM takes S_m as input to generate the quantized clean CT feature X_mcb and the reconstructed clean CT image. We train the RTM with code-level losses only: a cross-entropy loss on the code sequence and an L2 loss on the feature, where the ground truth of the code sequence S and of the feature X_cb is obtained from the pre-trained VQGAN in Sec. 2.1. While training the RTM, E is fine-tuned into E_m; the complete loss of this stage combines the two terms, with the trade-off weight λ set to 0.5.

2.3 Fidelity Guarantee Module (FGM)

The FGM ensures the fidelity of the result by fusing the encoder-stage features with the decoder features in a scaled manner, as shown in Fig. 2(c). The RTM replaces artifact-affected features directly by matching them against codebook entries; the whole process is discrete and yields an entirely new feature map, so it is difficult to guarantee the fidelity of the final result. We therefore design the FGM for the specific setting of metal artifact reduction, as shown in Fig. 3(a). Specifically, we feed the intermediate-stage features of the encoder into two convolution stacks f_1 and f_2 with the same structure to obtain x_f1 and x_f2. Inspired by the conditional feature modulation in,15 we rescale the corresponding stage features of the decoder with a scale α and a shift β derived from x_f1 and x_f2, and fuse the modulated result back into the decoder features with a blending weight ω. We add an FGM at each stage where the encoder and decoder produce features of spatial size s ∈ {32, 64, 128, 256}. To ensure that enough coded feature information is involved in the reconstruction process while no additional noise is introduced, we set ω to 0.5. To train the FGM and fine-tune E_m and the RTM from Sec. 2.2, we continue to use the metal-affected data as input and combine the losses L_PCM and L_RTM as the complete loss of this stage, keeping the relative weight of each loss term unchanged.
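The exact fusion formula is not reproduced above; as a hedged illustration, the sketch below assumes a CodeFormer-style controllable fusion, F̂_d = F_d + ω(α ⊙ F_d + β), where α and β are predicted from the encoder-stage feature by the two convolution stacks f_1 and f_2. The layer widths and two-layer stack structure are assumptions, not the authors' released implementation.

```python
# Hedged sketch of a possible FGM: scale-shift modulation of decoder features by
# encoder features, blended with weight omega (illustrative, not the authors' code).
import torch
import torch.nn as nn


class FidelityGuaranteeModule(nn.Module):
    """Assumed FGM: predict a scale (alpha) and shift (beta) from the encoder-stage
    feature, then blend the modulated decoder feature back with weight omega."""

    def __init__(self, channels: int, omega: float = 0.5):
        super().__init__()

        def conv_stack() -> nn.Sequential:
            # Two convolution stacks with the same structure (f_1 and f_2).
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )

        self.f1 = conv_stack()  # produces alpha (x_f1)
        self.f2 = conv_stack()  # produces beta  (x_f2)
        self.omega = omega

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        alpha = self.f1(enc_feat)
        beta = self.f2(enc_feat)
        # Assumed CodeFormer-style fusion: F_d + omega * (alpha * F_d + beta).
        return dec_feat + self.omega * (alpha * dec_feat + beta)


# Example at one of the fusion resolutions (s = 64), with an illustrative channel width.
fgm = FidelityGuaranteeModule(channels=128, omega=0.5)
fused = fgm(torch.randn(1, 128, 64, 64), torch.randn(1, 128, 64, 64))
```

Under this assumed form, ω = 0 would ignore the encoder-derived modulation entirely, while larger ω lets the degraded-input information dominate; ω = 0.5 matches the balance described above between involving enough coded feature information and avoiding additional noise.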
3. EXPERIMENTS

3.1 Experimental configuration

Dataset and pre-processing. We used the publicly available DeepLesion17 dataset and simulated metal artifacts in CT images following the simulation protocol of Yu et al.7 The 100 metal masks from Zhang et al.3 were divided into 90 masks for training and 10 for testing. We randomly selected 130,000 CT slices from the DeepLesion dataset as clinical CT samples for stage (a), and extracted 1,500 CT slices from them to synthesise metal-affected samples with the 90 training masks for stages (b) and (c). For testing, we additionally and randomly extracted 200 CT slices and used the 10 test metal masks to synthesise the test dataset with the same process.

Implementation details and evaluation metrics. The input image size is fixed at 256 × 256, the codebook size is set to 1200, and each code item has dimension 256. All modules are trained with the Adam18 optimizer and a batch size of 10. The learning rate is set to 1 × 10−4 for PCM training, and to 1 × 10−4 and 5 × 10−5 for RTM and FGM training, respectively. The numbers of iterations for PCM, RTM and FGM are 1.6M, 1M and 800K, respectively. The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are adopted for MAR evaluation.

3.2 Experimental results

To validate the effectiveness of DCCGN for image-domain-only MAR, we compare it with one conventional MAR method16 and several representative state-of-the-art (SOTA) image-domain-only methods3, 9–11 on synthetic and clinical datasets.

Comparison with SOTAs on synthetic datasets. As shown in Tab. 1, DCCGN outperforms the SOTA image-domain-only MAR methods on the synthetic dataset in terms of both PSNR and SSIM. Qualitative comparisons in Fig. 4 show that DCCGN reproduces the true appearance and structural details more faithfully than the other methods and better avoids residual noise.

Table 1. Quantitative comparisons (PSNR/SSIM) with SOTA image-domain-only methods, demonstrating the superior performance of DCCGN.
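For reference, the following is a minimal sketch of how PSNR/SSIM scores such as those in Table 1 are typically computed with scikit-image; the array names, the normalisation to [0, 1], and the placeholder data are assumptions rather than the evaluation script used in the paper.

```python
# Illustrative PSNR/SSIM evaluation for a reconstructed CT slice against its
# artifact-free ground truth (both assumed normalised to [0, 1]).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_mar(pred: np.ndarray, gt: np.ndarray, data_range: float = 1.0):
    """Return (PSNR, SSIM) between a MAR output and the ground-truth clean slice."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=data_range)
    ssim = structural_similarity(gt, pred, data_range=data_range)
    return psnr, ssim


# Random placeholders stand in for real 256x256 CT slices.
gt = np.random.rand(256, 256).astype(np.float32)
pred = np.clip(gt + 0.01 * np.random.randn(256, 256).astype(np.float32), 0.0, 1.0)
print(evaluate_mar(pred, gt))
```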
Comparison with SOTAs on clinical datasets. The publicly available CLINIC-metal19 dataset was used for clinical testing. Because ground truth is unavailable, only qualitative analyses were performed. As shown in Fig. 5, the proposed DCCGN removes artifacts from the images more effectively and recovers details more accurately, which supports the clinical validity of the approach.

4. CONCLUSION

In this study, we propose a novel Discrete Clinical Convergence Generation Network (DCCGN) that incorporates clinical clean CT data into the reconstruction process through quantization via the pre-trained PCM module. The robustness and fidelity of the reconstruction are further improved by the RTM and FGM. The whole pipeline uses only image-domain data while producing high-quality results, overcoming the limitations of sinogram-domain information. The experimental results demonstrate the effectiveness of our method on the MAR task.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China under Grant 62272135, Grant 62372135 and Grant 62202092.

REFERENCES
“Adn: artifact disentanglement network for unsupervised metal artifact reduction,”
IEEE Transactions on Medical Imaging, 39
(3), 634
–643
(2019). https://doi.org/10.1109/TMI.42 Google Scholar
Wellenberg, R., Hakvoort, E., Slump, C., Boomsma, M., Maas, M., and Streekstra, G.,
“Metal artifact reduction techniques in musculoskeletal ct-imaging,”
European journal of radiology, 107 60
–69
(2018). https://doi.org/10.1016/j.ejrad.2018.08.010 Google Scholar
Zhang, Y. and Yu, H.,
“Convolutional neural network based metal artifact reduction in x-ray computed tomography,”
IEEE transactions on medical imaging, 37
(6), 1370
–1381
(2018). https://doi.org/10.1109/TMI.2018.2823083 Google Scholar
Ghani, M. U. and Karl, W. C.,
“Fast enhanced ct metal artifact reduction using data domain deep learning,”
IEEE Transactions on Computational Imaging, 6 181
–193
(2019). https://doi.org/10.1109/TCI.2019.2937221 Google Scholar
Lin, W.-A., Liao, H., Peng, C., Sun, X., Zhang, J., Luo, J., Chellappa, R., and Zhou, S. K.,
“Dudonet: Dual domain network for ct metal artifact reduction,”
(2019). https://doi.org/10.1109/CVPR41558.2019 Google Scholar
Lyu, Y., Lin, W.-A., Liao, H., Lu, J., and Zhou, S. K.,
“Encoding metal mask projection for metal artifact reduction in computed tomography,”
in Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II 23,
147
–157
(2020). Google Scholar
Yu, L., Zhang, Z., Li, X., and Xing, L.,
“Deep sinogram completion with image prior for metal artifact reduction in ct images,”
IEEE transactions on medical imaging, 40
(1), 228
–238
(2020). https://doi.org/10.1109/TMI.42 Google Scholar
Wang, H., Li, Y., Zhang, H., Chen, J., Ma, K., Meng, D., and Zheng, Y., Indudonet: An interpretable dual domain network for ct metal artifact reduction,
(2021). Google Scholar
Wang, H., Xie, Q., Zeng, D., Ma, J., Meng, D., and Zheng, Y.,
“Oscnet: Orientation-shared convolutional network for ct metal artifact learning,”
IEEE Transactions on Medical Imaging,
(2023). Google Scholar
Wang, J., Zhao, Y., Noble, J. H., and Dawant, B. M.,
“Conditional generative adversarial networks for metal artifact reduction in ct images of the ear,”
in Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I,
3
–11
(2018). Google Scholar
Wang, H., Li, Y., He, N., Ma, K., Meng, D., and Zheng, Y.,
“Dicdnet: deep interpretable convolutional dictionary network for metal artifact reduction in ct images,”
IEEE Transactions on Medical Imaging, 41
(4), 869
–880
(2021). https://doi.org/10.1109/TMI.2021.3127074 Google Scholar
Esser, P., Rombach, R., and Ommer, B.,
in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
12873
–12883
(2021). Google Scholar
Zhou, S., Chan, K., Li, C., and Loy, C. C.,
“Towards robust blind face restoration with codebook lookup transformer,”
Advances in Neural Information Processing Systems, 35 30599
–30611
(2022). Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.,
“Attention is all you need,”
Advances in neural information processing systems, 30
(2017). Google Scholar
Wang, X., Yu, K., Dong, C., and Loy, C. C.,
in Proceedings of the IEEE conference on computer vision and pattern recognition,
606
–615
(2018). Google Scholar
Kalender, W. A., Hebel, R., and Ebersberger, J.,
“Reduction of ct artifacts caused by metallic implants,”
Radiology, 164
(2), 576
–577
(1987). https://doi.org/10.1148/radiology.164.2.3602406 Google Scholar
Yan, K., Wang, X., Lu, L., and Summers, R. M.,
“Deeplesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning,”
Journal of medical imaging, 5
(3), 036501
–036501
(2018). https://doi.org/10.1117/1.JMI.5.3.036501 Google Scholar
Kingma, D. P. and Ba, J.,
“Adam: A method for stochastic optimization,”
arXiv preprint arXiv:1412.6980,
(2014). Google Scholar
Liu, P., Han, H., Du, Y., Zhu, H., Li, Y., Gu, F., Xiao, H., Li, J., Zhao, C., Xiao, L., et al,
“Deep learning to segment pelvic bones: large-scale ct datasets and baseline models,”
International Journal of Computer Assisted Radiology and Surgery, 16 749
–756
(2021). https://doi.org/10.1007/s11548-021-02363-8 Google Scholar