Open Access Paper
Conversion-based reconstruction: a discretized clinical convergence generative network for CT metal artifact reduction
11 September 2024
Mingye Zou, Xinghua Ma, Wei Wang, Kuanquan Wang
Proceedings Volume 13270, International Conference on Future of Medicine and Biological Information Engineering (MBIE 2024); 1327005 (2024) https://doi.org/10.1117/12.3039971
Event: 2024 International Conference on Future of Medicine and Biological Information Engineering (MBIE 2024), 2024, Shenyang, China
Abstract
In computed tomography (CT) systems, the presence of metal artifacts can significantly impair image quality, making subsequent diagnosis and treatment more challenging. For the metal artifact reduction (MAR) task, existing deep-learning methods have achieved satisfactory reconstruction results. However, most methods incorporate only limited artifact-free prior knowledge, which cannot guarantee the fidelity of the final result. Additionally, the sinogram-domain information that many methods rely on introduces secondary noise, exacerbating the negative effects. To address these issues, we propose a novel Discretized Clinical Convergence Generative Network (DCCGN) that relies solely on image-domain information and robustly introduces clinical prior information in a quantitative manner, converting noisy features into clean ones to complete the reconstruction. Extensive experiments and evaluations show that DCCGN offers superior generation quality and fidelity compared with several SOTA algorithms on both synthetic and clinical datasets.

1. INTRODUCTION

Metal implants carried by patients can produce severe artifacts in computed tomography (CT) images,1 which impede the accuracy of subsequent medical diagnoses.2 Consequently, metal artifact reduction has attracted a surge of interest in medical image analysis in recent years.

In recent years, a plethora of deep-learning-based MAR algorithms have been proposed. These methods can be broadly classified into two categories: image-domain-only methods and sinogram-involved methods. The sinogram-involved methods can be further divided into two schemes: methods that use sinogram data alone2–4 and methods that employ both sinogram and image data.5–8 However, slight disturbances in the sinogram can lead to serious secondary artifacts in the image domain,1 and collecting sufficient sinogram information in clinical procedures is challenging9 (Fig. 1(a)). Image-domain-only methods1, 9–11 perform noise reduction directly on the artifact-affected CT images without sinogram data. However, these methods use only a single artifact-affected image during training, lacking sufficient reference for reconstruction (Fig. 1(a)).

Figure 1. (a) Sinogram-involved methods suffer from secondary artifacts and their data sources are hard to harvest, while image-only methods involve only artifact-affected CT data. (b) DCCGN uses only image-domain data and introduces clinical prior information in a discrete manner, transforming degraded features to complete the reconstruction.


To address these issues, we propose a novel Discretized Clinical Convergence Generative Network (DCCGN) for MAR (Fig. 1(b)). DCCGN eliminates the need for sinogram-domain data and employs a novel quantitative methodology to introduce clinically pristine CT prior information, relying exclusively on image-domain data to generate high-quality CT reconstructions. Specifically, DCCGN first pre-trains a VQGAN model on clinically clean CT data and retains its codebook and decoder as the Pre-trained Clinical Module (PCM). To enhance the robustness of the feature-matching process in the VQGAN, we substitute the nearest-neighbour matching with a Robust Transformation Module (RTM) built on the Transformer encoder structure. Finally, to reduce the influence of the feature-conversion process on the original image information, we introduce a Fidelity Guarantee Module (FGM) that gradually fuses the degraded image features with the pre-trained decoder features, improving the consistency of the model before and after feature conversion.

Our main contributions are summarised as follows: I. DCCGN achieves high-quality CT reconstruction using only image-domain data. II. DCCGN introduces artifact-free CT prior information into the metal artifact removal process in a novel way. III. DCCGN proposes a novel feature-residual module to ensure the fidelity of the reconstruction.

2. METHOD

The proposed Discretized Clinical Convergence Generative Network (DCCGN, Fig. 2) produces high-fidelity reconstructions that incorporate prior clinical information. It consists of three main modules combined through a recursive training process: the Pre-trained Clinical Module (PCM), the Robust Transformation Module (RTM), and the Fidelity Guarantee Module (FGM).

Figure 2. Discretized Clinical Convergence Generative Network (DCCGN). The modules of DCCGN are fused recursively: the PCM is first trained with the help of clean clinical CT and then frozen to assist the training of the RTM, which takes artifact-affected CT as input; the pre-trained PCM and RTM are then involved in the FGM training process.


2.1 Pre-trained Clinical Module (PCM)

PCM allows the reconstruction process to converge towards clinically clean CT. The training of the PCM involves only clinically clean CT, and the parameters of the module are kept frozen in the subsequent stages. As shown in Fig. 2(a), clean clinical CT data Ic ∈ ℝH×W×1 are first embedded by an encoder E to obtain the corresponding compressed features xc ∈ ℝm×n×d. Following VQGAN,12 each pixel of xc is matched by nearest neighbour against the feature items ck ∈ ℝd in the codebook C to obtain the code sequence S ∈ {0, 1, …, N−1}m×n, where N is the codebook size. Based on the sequence S, the features are re-extracted from the codebook to form a new quantized feature Xcb ∈ ℝm×n×d, i.e., (Xcb)i,j = ck when Si·n+j = k. The decoder reconstructs the clean CT data with Xcb as input. We adopt three image-level losses between the input Ic and the reconstruction result Irec: the L1 loss L1, the perceptual loss Lper, and the adversarial loss Ladv. To better optimise the codebook, we also adopt the code-level loss Lcode:

$$\mathcal{L}_{code} = \left\|\mathrm{sg}(x_c) - X_{cb}\right\|_2^2 + \beta\,\left\|x_c - \mathrm{sg}(X_{cb})\right\|_2^2$$

where sg denotes the stop-gradient operator as in VQGAN,12 and β = 0.25 is a trade-off weight. The complete optimisation objective for this stage is as follows:

$$\mathcal{L}_{PCM} = \alpha_{l1}\,\mathcal{L}_1 + \mathcal{L}_{per} + \alpha_{adv}\,\mathcal{L}_{adv} + \mathcal{L}_{code}$$

where αl1 and αadv are set to 1.5 and 0.3, respectively. After adequate training, the frozen codebook and decoder comprise the PCM, which participates in the subsequent reconstruction process.
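For concreteness, the following is a minimal PyTorch sketch of this quantization step and the code-level loss, assuming the codebook is stored as an (N, d) tensor; the function name, tensor layout, and the straight-through gradient trick are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the PCM quantization step and code-level loss,
# assuming the codebook is stored as an (N, d) tensor.
def quantize(x_c, codebook, beta=0.25):
    """x_c: (B, d, m, n) encoder features; codebook: (N, d) entries."""
    B, d, m, n = x_c.shape
    flat = x_c.permute(0, 2, 3, 1).reshape(-1, d)            # (B*m*n, d)
    dist = torch.cdist(flat, codebook)                       # pairwise distances
    S = dist.argmin(dim=1)                                   # code sequence S
    X_cb = codebook[S].view(B, m, n, d).permute(0, 3, 1, 2)  # quantized feature
    # Code-level loss with stop-gradient (detach), as in VQGAN; beta = 0.25.
    L_code = F.mse_loss(X_cb, x_c.detach()) + beta * F.mse_loss(x_c, X_cb.detach())
    # Straight-through estimator so gradients still reach the encoder.
    X_cb = x_c + (X_cb - x_c).detach()
    return X_cb, S.view(B, m, n), L_code
```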

2.2 Robust Transformation Module (RTM)

RTM makes the intermediate feature conversion more robust. The nearest-neighbour algorithm can mismatch features when metal artifacts corrupt CT image pixels. Following the idea of CodeFormer,13 we replace the nearest-neighbour matching of the VQGAN with the RTM module, which matches features in a more robust way.

Building on the pre-trained VQGAN of Sec. 2.1, as shown in Fig. 2(b), we attach a Transformer encoder14 Etr containing nine self-attention blocks after the encoder E and add an extra linear layer for generating code sequences, forming the RTM, as shown in Fig. 3(b). Specifically, a CT image containing metal artifacts is processed by the encoder Em (fine-tuned from E) to generate the corresponding artifact-affected feature xm. The RTM takes xm as input and converts it into the code sequence Sm (analogous to S in Sec. 2.1). The frozen PCM takes Sm as input to generate the quantized clean CT feature Xmcb and the reconstructed clean CT image. We use only code-level losses to train the RTM: a cross-entropy loss Lce and an L2 loss Ll2:

$$\mathcal{L}_{ce} = \sum_{i=0}^{mn-1} -\log p\!\left(S_i \mid x_m\right), \qquad \mathcal{L}_{l2} = \left\|x_m - \mathrm{sg}(X_{cb})\right\|_2^2$$

Figure 3. The structure and calculation process of the FGM (a) and the RTM (b).


where the ground truth of the code sequence S and the feature Xcb is obtained from the pre-trained VQGAN of Sec. 2.1. While training the RTM, E is fine-tuned to obtain Em. The complete loss of this stage is as follows:

$$\mathcal{L}_{RTM} = \mathcal{L}_{ce} + \lambda\,\mathcal{L}_{l2}$$

where λ is set to 0.5.
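A minimal PyTorch sketch of such a module follows, keeping the paper's nine self-attention blocks and λ = 0.5; the head count, token layout, and layer widths are assumptions, not the authors' configuration.

```python
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the RTM: a Transformer encoder predicts codebook
# indices from artifact-affected features.
class RTM(nn.Module):
    def __init__(self, d=256, n_codes=1200, n_blocks=9, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_blocks)
        self.to_code = nn.Linear(d, n_codes)    # extra linear layer for codes

    def forward(self, x_m):                     # x_m: (B, d, m, n)
        tokens = x_m.flatten(2).transpose(1, 2) # (B, m*n, d) token sequence
        return self.to_code(self.transformer(tokens))  # logits over codes

def rtm_loss(logits, S_gt, x_m, X_cb_gt, lam=0.5):
    """Cross-entropy on code indices plus L2 between features."""
    L_ce = F.cross_entropy(logits.flatten(0, 1), S_gt.flatten())
    L_l2 = F.mse_loss(x_m, X_cb_gt.detach())    # sg(.) realised via detach
    return L_ce + lam * L_l2
```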

2.3 Fidelity Guarantee Module (FGM)

FGM ensures the fidelity of the result by fusing encoder-stage features with decoder features in a scaled manner, as shown in Fig. 2(c). Because the RTM replaces artifact-affected features outright by matching them to codebook entries, the whole process is discrete and yields an entirely new feature map, making the fidelity of the final result difficult to guarantee. We therefore design the module around the practical requirements of metal artifact reduction, as shown in Fig. 3(a). Specifically, we feed the intermediate-stage features of the encoder into two convolution stacks f1, f2 with the same structure to obtain xf1, xf2. Following cGAN,15 we reshape the corresponding stage features of the decoder with α and β:

$$\alpha = x_{f1}, \quad \beta = x_{f2}, \qquad \hat{F}_{dec} = F_{dec} + \omega\left(\alpha \odot F_{dec} + \beta\right)$$

where Fdec denotes the decoder feature at the corresponding stage and ⊙ is element-wise multiplication.

We add an FGM at each stage where the encoder and decoder generate features of size s ∈ {32, 64, 128, 256}. To ensure that enough coded feature information is involved in the reconstruction process while no additional noise is introduced, we set ω to 0.5. To train the FGM and fine-tune Em and the RTM of Sec. 2.2, we continue to use the metal-affected data as input and combine the losses LPCM and LRTM as the complete loss of this stage; the relative weight of each loss term remains unchanged.
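A minimal sketch of one FGM block under the above definitions; the depth, kernel size, and activation of the convolution stacks f1 and f2 are assumptions, while ω = 0.5 and the paired-stack design follow the paper.

```python
import torch.nn as nn

# Minimal sketch of one FGM block: two convolution stacks with the same
# structure map an encoder feature to (alpha, beta), which modulate the
# decoder feature at the matching scale, scaled by omega.
class FGM(nn.Module):
    def __init__(self, channels, omega=0.5):
        super().__init__()
        def stack():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1))
        self.f1, self.f2 = stack(), stack()  # same structure, separate weights
        self.omega = omega

    def forward(self, f_enc, f_dec):
        alpha, beta = self.f1(f_enc), self.f2(f_enc)
        # Affine modulation of the decoder feature, weighted by omega.
        return f_dec + self.omega * (alpha * f_dec + beta)
```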

3. EXPERIMENTS

3.1 Experimental configuration

Dataset and pre-processing.

We used the publicly available DeepLesion17 dataset to simulate metal artifacts in CT images, following the simulation protocol of Yu et al.7 The 100 metal masks from Zhang et al.3 were divided into 90 for training and 10 for testing. We randomly selected 130,000 CT slices from the DeepLesion dataset as clinical CT samples for stage (a), and extracted 1,500 of these slices to synthesise metal-affected samples with the 90 training masks for stages (b) and (c). For testing, we additionally randomly extracted 200 CT slices and used the 10 test masks to synthesise the test dataset following the same process.

Implementation details and evaluation metrics.

The input image size is fixed at 256 × 256, the codebook size is set to 1200, and each code item has dimension 256. For all training stages, we use the Adam18 optimizer with a batch size of 10. The learning rate is set to 1 × 10−4 for PCM training, and to 1 × 10−4 and 5 × 10−5 for the RTM and FGM, respectively. The iterations for the PCM, RTM, and FGM are set to 1.6M, 1M, and 800K, respectively. The peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are adopted for MAR evaluation.
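As an illustration, the stage-wise optimizers and the evaluation metrics could be set up as follows; pcm, rtm, and fgm are placeholders for the actual network modules, and scikit-image is one possible metric implementation rather than the one used in the paper.

```python
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Sketch of the per-stage Adam optimizers described above.
def make_optimizers(pcm, rtm, fgm):
    return {
        "pcm": torch.optim.Adam(pcm.parameters(), lr=1e-4),  # 1.6M iterations
        "rtm": torch.optim.Adam(rtm.parameters(), lr=1e-4),  # 1M iterations
        "fgm": torch.optim.Adam(fgm.parameters(), lr=5e-5),  # 800K iterations
    }

def evaluate(pred, gt, data_range=1.0):
    """PSNR/SSIM for one reconstructed slice (2-D numpy arrays)."""
    return (peak_signal_noise_ratio(gt, pred, data_range=data_range),
            structural_similarity(gt, pred, data_range=data_range))
```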

3.2 Experimental results

To validate the effectiveness of DCCGN for image-only MAR, we compared it with one conventional MAR method16 and several representative state-of-the-art (SOTA) image-only methods3, 9–11 on synthetic and clinical datasets, respectively.

Comparison with SOTAs on synthetic datasets.

As shown in Tab. 1, DCCGN attains the highest SSIM and a PSNR competitive with the best SOTA method for image-domain-only MAR on the synthetic dataset. Qualitative comparisons (Fig. 4) indicate that DCCGN reproduces the actual appearance and structural details more faithfully than the other methods and avoids residual noise.

Figure 4. Qualitative results demonstrate that DCCGN's outputs are closer to artifact-free images and free of secondary artifacts. The red area indicates the metal implant; the border colour indicates the correspondence between the magnified view and the overview.


Figure 5. Comparison on clinical datasets.


Table 1. Quantitative comparisons (PSNR/SSIM) demonstrate the superior performance of DCCGN compared to SOTA image-domain-only methods.

Method      Domain usage    PSNR↑    SSIM↑
Linear16    Conventional    26.01    0.8741
CNNMAR10    Image-only      30.38    0.9644
cGAN3       Image-only      34.10    0.9341
DICDNet11   Image-only      37.86    0.9788
OSCNet9     Image-only      42.19    0.9931
DCCGN       Image-only      41.83    0.9986

Comparison with SOTAs on clinical datasets.

The publicly available CLINIC-metal19 dataset was used for clinical testing. Owing to the lack of ground truth, only qualitative analyses were performed. As shown in Fig. 5, DCCGN removes artifacts from the images more effectively and recovers details more accurately, demonstrating the clinical applicability of the approach.

4. CONCLUSION

In this study, we propose a novel Discretized Clinical Convergence Generative Network (DCCGN) that quantitatively incorporates clean clinical CT data into the reconstruction process via the pre-trained PCM. The robustness and fidelity of the reconstruction are improved by the RTM and FGM. The whole pipeline relies only on image-domain data, overcoming the limitations of sinogram-domain information while delivering high-quality results. The experimental results demonstrate the effectiveness of our method on the MAR task.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China under Grant 62272135, Grant 62372135 and Grant 62202092.

REFERENCES

[1] Liao, H., Lin, W.-A., Zhou, S. K., and Luo, J., "ADN: artifact disentanglement network for unsupervised metal artifact reduction," IEEE Transactions on Medical Imaging, 39(3), 634–643 (2019).
[2] Wellenberg, R., Hakvoort, E., Slump, C., Boomsma, M., Maas, M., and Streekstra, G., "Metal artifact reduction techniques in musculoskeletal CT imaging," European Journal of Radiology, 107, 60–69 (2018). https://doi.org/10.1016/j.ejrad.2018.08.010
[3] Zhang, Y. and Yu, H., "Convolutional neural network based metal artifact reduction in x-ray computed tomography," IEEE Transactions on Medical Imaging, 37(6), 1370–1381 (2018). https://doi.org/10.1109/TMI.2018.2823083
[4] Ghani, M. U. and Karl, W. C., "Fast enhanced CT metal artifact reduction using data domain deep learning," IEEE Transactions on Computational Imaging, 6, 181–193 (2019). https://doi.org/10.1109/TCI.2019.2937221
[5] Lin, W.-A., Liao, H., Peng, C., Sun, X., Zhang, J., Luo, J., Chellappa, R., and Zhou, S. K., "DuDoNet: dual domain network for CT metal artifact reduction," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019).
[6] Lyu, Y., Lin, W.-A., Liao, H., Lu, J., and Zhou, S. K., "Encoding metal mask projection for metal artifact reduction in computed tomography," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Part II, 147–157 (2020).
[7] Yu, L., Zhang, Z., Li, X., and Xing, L., "Deep sinogram completion with image prior for metal artifact reduction in CT images," IEEE Transactions on Medical Imaging, 40(1), 228–238 (2020).
[8] Wang, H., Li, Y., Zhang, H., Chen, J., Ma, K., Meng, D., and Zheng, Y., "InDuDoNet: an interpretable dual domain network for CT metal artifact reduction," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (2021).
[9] Wang, H., Xie, Q., Zeng, D., Ma, J., Meng, D., and Zheng, Y., "OSCNet: orientation-shared convolutional network for CT metal artifact learning," IEEE Transactions on Medical Imaging (2023).
[10] Wang, J., Zhao, Y., Noble, J. H., and Dawant, B. M., "Conditional generative adversarial networks for metal artifact reduction in CT images of the ear," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Part I, 3–11 (2018).
[11] Wang, H., Li, Y., He, N., Ma, K., Meng, D., and Zheng, Y., "DICDNet: deep interpretable convolutional dictionary network for metal artifact reduction in CT images," IEEE Transactions on Medical Imaging, 41(4), 869–880 (2021). https://doi.org/10.1109/TMI.2021.3127074
[12] Esser, P., Rombach, R., and Ommer, B., "Taming transformers for high-resolution image synthesis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12873–12883 (2021).
[13] Zhou, S., Chan, K., Li, C., and Loy, C. C., "Towards robust blind face restoration with codebook lookup transformer," Advances in Neural Information Processing Systems, 35, 30599–30611 (2022).
[14] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I., "Attention is all you need," Advances in Neural Information Processing Systems, 30 (2017).
[15] Wang, X., Yu, K., Dong, C., and Loy, C. C., "Recovering realistic texture in image super-resolution by deep spatial feature transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 606–615 (2018).
[16] Kalender, W. A., Hebel, R., and Ebersberger, J., "Reduction of CT artifacts caused by metallic implants," Radiology, 164(2), 576–577 (1987). https://doi.org/10.1148/radiology.164.2.3602406
[17] Yan, K., Wang, X., Lu, L., and Summers, R. M., "DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning," Journal of Medical Imaging, 5(3), 036501 (2018). https://doi.org/10.1117/1.JMI.5.3.036501
[18] Kingma, D. P. and Ba, J., "Adam: a method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).
[19] Liu, P., Han, H., Du, Y., Zhu, H., Li, Y., Gu, F., Xiao, H., Li, J., Zhao, C., Xiao, L., et al., "Deep learning to segment pelvic bones: large-scale CT datasets and baseline models," International Journal of Computer Assisted Radiology and Surgery, 16, 749–756 (2021). https://doi.org/10.1007/s11548-021-02363-8