Veronica Ravano1,2,3, Adham Elwakil1,2,3, Thomas Yu1,2,3, Tom Hilbert1,2,3, Bénédicte Maréchal1,2,3, Jonas Richiardi2, Jean-Philippe Thiran3, Charbel Mourad2, Paul Margain4, Julien Favre4, Tobias Kober1,2,3, Patrick Omoumi2, and Stefan Sommer1,5
1Advanced Clinical Imaging Technology, Siemens Healthineers International AG, Lausanne, Geneva and Zurich, Switzerland, 2Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland, 3LTS5, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 4Swiss Biomotion Lab, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland, 5Swiss Centre for Musculoskeletal Imaging (SCMI), Balgrist Campus, Zurich, Switzerland
Synopsis
Keywords: Analysis/Processing, MSK, synthetic CT
Motivation: Synthetic CT (sCT) based on MRI could improve the characterization of bone pathology by estimating bone mineral density and providing a high level of structural detail. However, evaluating sCT performance is challenging in both respects.
Goal(s): To propose an evaluation framework for sCT that quantifies accuracy in terms of both image intensity and the depiction of structural details.
Approach: We propose SINGHA, a new frequency-based metric that captures differences in sharpness between images.
Results: SINGHA was complementary to standard metrics and captured differences in high-frequency content, thereby contributing to a more comprehensive evaluation of sCT images.
Impact: Using the newly introduced Spectrally-INformed Grading of High-frequency Attributes (SINGHA) in conjunction with standard intensity-based metrics enables simultaneous evaluation of synthetic CT accuracy in terms of both bone mineral density and sharpness.
Introduction
The generation of synthetic CT (sCT) contrast from MR images has been proposed as a radiation-free, single-examination solution for bone imaging. The clinical relevance of sCT lies in its potential to improve the characterization of bone pathology by estimating bone mineral density (e.g., for attenuation maps in radiotherapy planning¹,²) and by depicting structural details such as the trabecular structure. A variety of generative deep learning models have been proposed for this purpose³, but an exhaustive quantitative comparison of their performance is challenging: standard evaluation metrics typically reflect performance in terms of intensity (i.e., bone mineral density) but fail to capture the level of structural detail.
In this work, we propose an evaluation framework for sCT, including a novel frequency-based measure that compares the image sharpness of sCT to the ground truth, and we use it to evaluate the performance of a 2.5D and a 3D U-Net model.
Methods
Study population, imaging protocols and pre-processing
A cohort of 418 patients from the Lausanne Knee Study (47.2±16.2 years old, 231 females) underwent a 3T knee MRI examination that included a T1-weighted 3D gradient-echo sequence (0.5 mm isotropic, TR = 700 ms, TE = 11 ms, MAGNETOM Prismafit, Siemens Healthineers AG, Erlangen, Germany) and a CT scan (0.3 mm isotropic, Revolution, GE Healthcare, USA).
Voxel-to-voxel models require an accurate correspondence between the MRI and CT image pairs of each subject. Therefore, a sequence of rigid, affine and non-linear transformations was applied in elastix⁴ to register the T1-weighted images to the CT volumes (down-sampled to 0.5 mm resolution). Registration quality was then visually assessed by two independent readers, and only image pairs with the highest score were retained.
Synthetic CT generation
Three consecutive MRI slices in each of the three orthogonal planes were used as input to train a 2.5D U-Net, while 3D patches were fed into the 3D U-Net. Both models were trained to minimize a voxel-wise L1 loss, using 90% of the dataset for training and 10% for testing (relevant training parameters in Table 1).
At inference, sCT images were generated using a sliding window with Gaussian averaging over the predicted patches. For the 2.5D model, the final sCT was computed as the voxel-wise average of the volumes predicted in all three orientations.
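A minimal NumPy sketch of how the 2.5D inputs could be assembled: for one orientation, each slice is stacked with its two neighbours as a three-channel input for a 2D network, and the same is repeated for the other two axes. The function name `slice_stacks` and the border handling by edge replication are assumptions, not details reported in the abstract.

```python
import numpy as np

def slice_stacks(volume, axis, context=1):
    """Yield (index, stack) pairs for a 2.5D network (hypothetical helper).

    `stack` holds 2 * context + 1 consecutive slices along `axis`, arranged
    channels-first, with the target slice in the middle channel. Slices beyond
    the volume border are replaced by the edge slice (an assumption)."""
    # Bring the slicing axis to the front so that vol[i] is slice i
    vol = np.moveaxis(volume, axis, 0)
    padded = np.pad(vol, ((context, context), (0, 0), (0, 0)), mode="edge")
    for i in range(vol.shape[0]):
        # padded[i + context] == vol[i], so this covers slices i-context .. i+context
        yield i, padded[i:i + 2 * context + 1]

# Usage: one set of three-channel inputs per orthogonal orientation
mri = np.zeros((64, 64, 64))
inputs = {axis: [s for _, s in slice_stacks(mri, axis)] for axis in (0, 1, 2)}
```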
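The inference step can be sketched as follows, assuming NumPy arrays: overlapping patch predictions are accumulated with a Gaussian weight that peaks at the patch centre and then divided by the summed weights, and the 2.5D orientations are averaged voxel-wise. The Gaussian width (`sigma_scale`), the stride, the `predict` callable and all helper names are hypothetical choices, not parameters given in the abstract.

```python
import itertools
import numpy as np

def gaussian_weight(patch_shape, sigma_scale=0.125):
    """Separable Gaussian weight map peaking at the patch centre.
    sigma = sigma_scale * patch size along each axis (assumed value)."""
    w = np.ones(patch_shape, dtype=float)
    for axis, size in enumerate(patch_shape):
        x = np.arange(size) - (size - 1) / 2.0
        g = np.exp(-0.5 * (x / (sigma_scale * size)) ** 2)
        shape = [1] * len(patch_shape)
        shape[axis] = size
        w = w * g.reshape(shape)
    return w

def window_starts(size, patch, stride):
    """Start indices so that windows of length `patch` cover [0, size)."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] + patch < size:  # add a final window flush with the border
        starts.append(size - patch)
    return starts

def sliding_window_predict(volume, predict, patch_shape, stride):
    """Blend overlapping patch predictions with Gaussian weights.
    Assumes every volume dimension is >= the patch size; `predict` is a
    hypothetical callable returning an array shaped like its input patch."""
    weight = gaussian_weight(patch_shape)
    out = np.zeros(volume.shape, dtype=float)
    norm = np.zeros(volume.shape, dtype=float)
    ranges = [window_starts(s, p, st)
              for s, p, st in zip(volume.shape, patch_shape, stride)]
    for start in itertools.product(*ranges):
        sl = tuple(slice(s, s + p) for s, p in zip(start, patch_shape))
        out[sl] += weight * predict(volume[sl])
        norm[sl] += weight
    return out / norm  # weighted average of all patch predictions per voxel

def average_orientations(predictions):
    """Voxel-wise mean of the sCT volumes predicted in each orientation (2.5D)."""
    return np.mean(np.stack(predictions), axis=0)
```

Weighting by a Gaussian down-weights patch borders, where a network has the least spatial context, which is the usual motivation for this kind of blending.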
Evaluation
We propose a new metric, Spectrally-INformed Grading of High-frequency Attributes (SINGHA), based on the high-frequency content of an image $$$X$$$. To this end, the magnitude of the FFT of image $$$X$$$ ($$$FT_X$$$) was averaged across all frequencies $$$k$$$ at a given Euclidean distance $$$d$$$ from the k-space centre:
$$
SP_X(d) = \frac{1}{n}\sum_{k:\,\|k\| = d} \left|FT_X(k)\right|,
$$
where $$$n$$$ represents the number of frequency elements at distance $$$d$$$. The SINGHA metric was then defined as:
$$
\mathrm{SINGHA}(CT, sCT) = \sum_{d} \left|SP_{CT}(d) - SP_{sCT}(d)\right| \cdot d,
$$
where weighting each term by $$$d$$$ amplifies the contribution of higher frequencies.
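A minimal NumPy sketch of $$$SP_X(d)$$$ and SINGHA as defined above, shown next to MAE, RMSE and PSNR for contrast. The rounding of $$$d$$$ to integer bins, the unnormalized FFT, the `data_range` input for PSNR and all helper names are assumptions; the abstract does not specify them. Because the $$$d = 0$$$ term carries zero weight, a global intensity offset leaves SINGHA unchanged while MAE and RMSE register it, which illustrates why the two kinds of metric are complementary.

```python
import numpy as np

def radial_spectrum(image):
    """SP_X(d): mean FFT magnitude over all frequencies at distance d from the
    k-space centre. Distances are rounded to integer bins (an assumption)."""
    mag = np.abs(np.fft.fftshift(np.fft.fftn(image)))
    # Signed frequency index of every element; fftshift puts zero frequency at n // 2
    axes = [np.arange(n) - n // 2 for n in image.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    dist = np.sqrt(sum(g.astype(float) ** 2 for g in grids))
    bins = np.rint(dist).astype(int).ravel()
    sums = np.bincount(bins, weights=mag.ravel())
    counts = np.bincount(bins)  # n: number of frequency elements in each bin
    sp = np.zeros_like(sums)
    nonempty = counts > 0
    sp[nonempty] = sums[nonempty] / counts[nonempty]
    return sp

def singha(ct, sct):
    """SINGHA(CT, sCT) = sum_d |SP_CT(d) - SP_sCT(d)| * d."""
    if np.shape(ct) != np.shape(sct):
        raise ValueError("CT and sCT must have the same shape")
    diff = np.abs(radial_spectrum(ct) - radial_spectrum(sct))
    d = np.arange(diff.size)
    return float(np.sum(diff * d))

def intensity_metrics(ct, sct, data_range):
    """MAE, RMSE and PSNR; data_range (intensity span for PSNR) is an assumed input."""
    diff = np.asarray(sct, dtype=float) - np.asarray(ct, dtype=float)
    mse = float(np.mean(diff ** 2))
    return {
        "MAE": float(np.mean(np.abs(diff))),
        "RMSE": float(np.sqrt(mse)),
        "PSNR": float(10 * np.log10(data_range ** 2 / mse)) if mse > 0 else float("inf"),
    }
```

Since `np.fft.fftn` accepts any number of dimensions, the same functions apply to 3D volumes. With an unnormalized FFT the absolute SINGHA value scales with image size, so values are only comparable between images of the same geometry. SSIM is omitted here; `skimage.metrics.structural_similarity` is one existing implementation.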
Additionally, the accuracy of sCT images was evaluated using standard intensity-based metrics (i.e., mean absolute error, MAE; root mean squared error, RMSE; structural similarity index measure, SSIM; and peak signal-to-noise ratio, PSNR).
Results
After visual inspection of registration quality, 101 paired datasets were selected for training and testing. Figure 1 shows the acquired T1-weighted MRI, the ground-truth CT and the two sCTs generated by the 2.5D and 3D U-Net models. Qualitatively, the 3D model produced blurrier images than the 2.5D model, which depicted the trabecular structure, the bone physis and focal regions of higher bone mineral density more clearly (see Figure 2).
Table 2 reports the numerical results for both models, and Figure 3 shows the distribution of each metric across models together with the results of the statistical comparison (Wilcoxon signed-rank test). For all standard intensity-based metrics, the 3D U-Net achieved significantly better performance. In contrast, the 2.5D model yielded significantly lower SINGHA values. Despite a relatively small effect size, this reflects a higher similarity between ground-truth and sCT images in terms of high-frequency content and image sharpness.
Discussion and Conclusion
While intensity-based evaluation metrics indicated better performance of the 3D model, the newly proposed SINGHA metric confirmed the qualitative impression that the 2.5D model depicts structural details better. SINGHA is thus complementary to intensity-based evaluation metrics, and our results suggest that frequency-based metrics such as SINGHA should be used in conjunction with standard metrics to evaluate sCT accuracy, both in the depiction of bone structure and in the quantitative estimation of bone mineral density.
Acknowledgements
Lausanne Knee Study: ethics approval was obtained and subjects gave their consent. This work was performed with the support of the Swiss National Science Foundation, Switzerland (Sinergia grant CRSII5_177155).
References
1. Hsu SH, Han Z, Leeman JE, Hu YH, Mak RH, Sudhyadhom A. Synthetic CT generation for MRI-guided adaptive radiotherapy in prostate cancer. Front Oncol. 2022;12. doi:10.3389/fonc.2022.969463
2. Palmér E, Karlsson A, Nordström F, et al. Synthetic computed tomography data allows for accurate absorbed dose calculations in a magnetic resonance imaging only workflow for head and neck radiotherapy. Phys Imaging Radiat Oncol. 2021;17:36-42. doi:10.1016/j.phro.2020.12.007
3. Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning based synthetic-CT generation in radiotherapy and PET: A review. Med Phys. 2021;48(11):6537-6566. doi:10.1002/mp.15150
4. Klein S, Staring M, Murphy K, Viergever MA, Pluim JPW. elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging. 2010;29(1):196-205. doi:10.1109/TMI.2009.2035616