2238

Evaluation of generative models for synthetic CT images using SINGHA, a new spectrally informed metric

Veronica Ravano^1,2,3, Adham Elwakil^1,2,3, Thomas Yu^1,2,3, Tom Hilbert^1,2,3, Bénédicte Maréchal^1,2,3, Jonas Richiardi², Jean-Philippe Thiran³, Charbel Mourad², Paul Margain⁴, Julien Favre⁴, Tobias Kober^1,2,3, Patrick Omoumi², and Stefan Sommer^1,5
¹Advanced Clinical Imaging Technology, Siemens Healthineers International AG, Lausanne, Geneva and Zurich, Switzerland, ²Department of Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland, ³LTS5, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, ⁴Swiss Biomotion Lab, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland, ⁵Swiss Centre for Musculoskeletal Imaging (SCMI), Balgrist Campus, Zurich, Switzerland

Synopsis

Keywords: Analysis/Processing, MSK, synthetic CT

Motivation: Synthetic CT (sCT) based on MRI could improve the characterization of bone pathology by estimating bone mineral density and providing a high level of structural details. However, evaluating the performance of sCT is challenging in both respects.

Goal(s): To propose an evaluation framework for sCT that quantifies accuracy both in terms of image intensity and depiction of structural details.

Approach: We propose the new frequency-based metric SINGHA that captures the sharpness difference between images.

Results: SINGHA was complementary to standard metrics and captured differences in high frequency content, thereby contributing to a more comprehensive evaluation of sCT images.

Impact: Using the newly introduced Spectrally-INformed Grading of High-frequency Attributes (SINGHA) in conjunction with standard intensity-based metrics enables the simultaneous evaluation of synthetic CT accuracy in terms of bone mineral density and sharpness.

Introduction

The generation of synthetic CT (sCT) contrast from MR images has been proposed as a radiation-free, single-examination solution for imaging of bones. The clinical relevance of sCT lies in its potential to improve characterization of bone pathology by estimating bone mineral density (e.g., for attenuation maps in radiotherapy planning^1,2), and depicting structural details such as the trabecular structure. To this end, a variety of generative deep learning models have been proposed³, but an exhaustive quantitative comparison of their performance is challenging. In fact, standard evaluation metrics typically reflect the performance in terms of intensity (i.e., bone mineral density), but fail to capture the level of structural details.

In this work, we propose an evaluation framework for sCT, including a novel frequency-based measure that compares image sharpness of sCT to the ground truth, and evaluate the performance of a 2.5D and a 3D U-Net model.

Methods

Study population, imaging protocols and pre-processing
A cohort of 418 patients from the Lausanne Knee Study (47.2±16.2 years old, 231 females) received an MRI knee examination at 3T that included a T1-weighted 3D gradient-echo sequence (0.5 mm isotropic, TR=700ms, TE=11ms, MAGNETOM Prisma^fit, Siemens Healthineers AG, Erlangen, Germany), and a CT scan (0.3 mm isotropic, Revolution, GE Healthcare, USA).
The use of voxel-to-voxel-based models requires an accurate correspondence between MRI and CT image pairs for each subject. Therefore, a sequence of rigid, affine and non-linear transformations was applied in elastix⁴ to register the T1-weighted images to the CT volumes (down-sampled to 0.5 mm resolution). Registration quality was then visually assessed by two independent readers, and only image pairs with the highest score were retained.

Synthetic CT generation
Three consecutive MRI slices in all three orthogonal planes were used as input to train a 2.5D U-Net and 3D patches were fed into the 3D U-Net. Both models were trained to minimize a voxel-wise L1 loss, using 90% of the dataset for training, and 10% for testing (relevant training parameters in Table 1).
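The 2.5D input construction and the voxel-wise L1 loss can be illustrated with a minimal NumPy sketch. The function names `slice_triplets` and `l1_loss` are hypothetical, and the choice of the centre slice as prediction target is an assumption that is not stated in the text.

```python
import numpy as np

def slice_triplets(volume, axis):
    """Yield (centre index, 3-slice stack) for every slice that has two neighbours along `axis`."""
    # Move the slicing axis to the front so that triplets are contiguous stacks.
    moved = np.moveaxis(volume, axis, 0)
    for i in range(1, moved.shape[0] - 1):
        # Assumption: slices i-1, i, i+1 form the input and slice i is the target.
        yield i, moved[i - 1:i + 2]

def l1_loss(prediction, target):
    """Voxel-wise L1 loss, i.e. the mean absolute difference over all voxels."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(prediction - target)))
```

Calling `slice_triplets` once per orthogonal axis (0, 1 and 2) would produce the three sets of inputs used by a 2.5D model of this kind.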
At inference, sCT images were generated using a sliding window with Gaussian-weighted averaging over the predicted patches. For the 2.5D model, the resulting sCT was computed as the voxel-wise average of the slices predicted from all three orientations.
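Sliding-window inference with Gaussian-weighted averaging can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `predict_patch` stands in for the trained network, and the patch size, stride and `sigma_scale` are hypothetical values (the actual parameters are those of Table 1). The volume is assumed to be at least as large as the patch in every dimension.

```python
from itertools import product

import numpy as np

def gaussian_window(shape, sigma_scale=0.125):
    """Separable Gaussian weight map that peaks at the patch centre."""
    window = np.ones(shape)
    for axis, size in enumerate(shape):
        coords = np.arange(size) - (size - 1) / 2.0
        profile = np.exp(-0.5 * (coords / (sigma_scale * size)) ** 2)
        view = [1] * len(shape)
        view[axis] = size
        window = window * profile.reshape(view)
    return window

def sliding_window_predict(volume, predict_patch, patch=(32, 32, 32), stride=(16, 16, 16)):
    """Blend overlapping patch predictions with Gaussian weights."""
    out = np.zeros(volume.shape)
    weight = np.zeros(volume.shape)
    win = gaussian_window(patch)
    # Regular grid of patch corners, plus a final corner flush with the volume border.
    starts = [sorted(set(list(range(0, s - p, st)) + [s - p]))
              for s, p, st in zip(volume.shape, patch, stride)]
    for corner in product(*starts):
        sl = tuple(slice(c, c + p) for c, p in zip(corner, patch))
        out[sl] += win * predict_patch(volume[sl])
        weight[sl] += win
    # Normalise by the accumulated weights to obtain a weighted average.
    return out / weight

def average_orientations(sct_a, sct_b, sct_c):
    """Voxel-wise average of three sCT volumes predicted along the three orientations."""
    return (np.asarray(sct_a, dtype=float) + sct_b + sct_c) / 3.0
```

Weighting patch centres more strongly than patch borders reduces seams where neighbouring patches overlap, since predictions near patch borders typically have less spatial context.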

Evaluation
We propose a new Spectrally-INformed Grading of High-frequency Attributes (SINGHA) metric, based on the high frequency content of an image $$$X$$$. To this end, the magnitude of the FFT of image $$$X$$$ ($$$FT_X$$$) was averaged across all frequencies $$$k$$$ at a given Euclidean distance $$$d$$$ from the k-space centre:
$$
SP_X(d) = \frac{1}{n} \sum_{k:\, \|k\| = d} \left| FT_X[k] \right|,
$$
where $$$n$$$ represents the number of frequency elements at distance $$$d$$$. The SINGHA metric was then defined as:
$$
\mathrm{SINGHA}(CT, sCT) = \sum_{d} \left| SP_{CT}(d) - SP_{sCT}(d) \right| \cdot d,
$$
resulting in an amplification of higher frequency contribution.
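The two definitions above can be sketched in NumPy as follows. Several details are assumptions that the text does not specify: distances are measured in k-space index units and rounded to the nearest integer to form the bins $$$d$$$, the FFT is left unnormalised, and both images are assumed to have the same shape. The function names are hypothetical.

```python
import numpy as np

def radial_spectrum(image):
    """Mean FFT magnitude SP_X(d) for each integer distance d from the k-space centre."""
    # Shift the zero-frequency component to the array centre.
    magnitude = np.abs(np.fft.fftshift(np.fft.fftn(image)))
    # Index coordinates relative to the k-space centre (index s // 2 after fftshift).
    axes = [np.arange(s) - s // 2 for s in image.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    distance = np.sqrt(sum(g.astype(float) ** 2 for g in grids))
    # Assumption: Euclidean distances are rounded to the nearest integer bin.
    bins = np.rint(distance).astype(int).ravel()
    sums = np.bincount(bins, weights=magnitude.ravel())
    counts = np.bincount(bins)  # n, the number of frequency elements per bin
    return sums / np.maximum(counts, 1)

def singha(ct, sct):
    """Distance-weighted sum of absolute differences between the radial spectra."""
    sp_ct = radial_spectrum(np.asarray(ct, dtype=float))
    sp_sct = radial_spectrum(np.asarray(sct, dtype=float))
    d = np.arange(sp_ct.size)
    return float(np.sum(np.abs(sp_ct - sp_sct) * d))
```

With this sketch, identical images give a value of zero, and a smoothed copy of an image gives a positive value because its high-frequency magnitudes are attenuated.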

Additionally, the accuracy of sCT images was evaluated using standard intensity-based metrics (i.e., mean absolute error, MAE; root mean squared error, RMSE; structural similarity measure, SSIM; and peak signal-to-noise ratio, PSNR).
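For reference, MAE, RMSE and PSNR can be computed as in the minimal sketch below. The `data_range` used for PSNR is an assumption (here the intensity range of the ground-truth CT when none is given), since the text does not specify it. SSIM is omitted because it requires a windowed computation that is usually taken from a dedicated library, for example `structural_similarity` in scikit-image.

```python
import numpy as np

def intensity_metrics(ct, sct, data_range=None):
    """Return MAE, RMSE and PSNR between a ground-truth CT and a synthetic CT."""
    ct = np.asarray(ct, dtype=float)
    sct = np.asarray(sct, dtype=float)
    diff = sct - ct
    mae = float(np.mean(np.abs(diff)))
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    if data_range is None:
        # Assumption: use the intensity range of the ground-truth image.
        data_range = float(ct.max() - ct.min())
    # PSNR is infinite when both images are identical.
    psnr = float("inf") if rmse == 0 else float(20 * np.log10(data_range / rmse))
    return {"MAE": mae, "RMSE": rmse, "PSNR": psnr}
```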

Results

After visual inspection of the registration quality, 101 paired datasets were selected for training/testing. Figure 1 shows the acquired T1-weighted MRI, the ground truth CT and the two sCT generated by the 2.5D and the 3D U-Net models. Qualitatively, the 3D model resulted in blurrier images compared to the 2.5D model, which depicted the trabecular structure, the bone physis, and focal regions of higher bone mineral density more clearly (see Figure 2).

Table 2 reports the numerical results for both models in terms of accuracy measures, and Figure 3 shows their distribution across models with the results of statistical comparison using the Wilcoxon signed-rank test. When considering the standard intensity-based metrics, the 3D U-Net was always associated with significantly higher performance. On the other hand, the 2.5D model resulted in significantly lower SINGHA values. Despite a relatively low effect size, this reflected a higher similarity between ground truth and sCT images in terms of high frequency content and image sharpness.

Discussion and Conclusion

While intensity-based evaluation metrics highlighted a better performance of the 3D model, the newly proposed SINGHA metric confirmed the qualitative perception of an improved depiction of structural details generated by the 2.5D model. SINGHA is thus complementary to intensity-based evaluation metrics. Therefore, our results suggest that frequency-based metrics such as SINGHA should be considered in conjunction with standard metrics for the evaluation of sCT accuracy, both in the depiction of bone structure and in the quantitative estimation of bone mineral density.

Acknowledgements

Lausanne Knee Study: Ethics approved and subjects gave their consent. This work was performed with the support of the Swiss National Science Foundation, Switzerland (Sinergia grant CRSII5_177155).

References

1. Hsu SH, Han Z, Leeman JE, Hu YH, Mak RH, Sudhyadhom A. Synthetic CT generation for MRI-guided adaptive radiotherapy in prostate cancer. Front Oncol. 2022;12. doi:10.3389/fonc.2022.969463

2. Palmér E, Karlsson A, Nordström F, et al. Synthetic computed tomography data allows for accurate absorbed dose calculations in a magnetic resonance imaging only workflow for head and neck radiotherapy. Phys Imaging Radiat Oncol. 2021;17:36-42. doi:10.1016/j.phro.2020.12.007

3. Spadea MF, Maspero M, Zaffino P, Seco J. Deep learning based synthetic-CT generation in radiotherapy and PET: A review. Med Phys. 2021;48(11):6537-6566. doi:10.1002/mp.15150

4. Klein S, Staring M, Murphy K, Viergever MA, Pluim JPW. elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging. 2010;29(1):196-205. doi:10.1109/TMI.2009.2035616

Figures

Table 1. Network training parameters for the 2.5D and 3D U-Net models.

Figure 1. Three orthogonal planes of the generated synthetic CT (sCT) using 2.5D and 3D U-Net models for an exemplary subject. The synthetic models nicely represent the general anatomy of the bone structures; however, details in the trabecular bones are less prominent, especially in the 3D U-Net.

Figure 2. Results of synthetic CT (sCT) generation using 2.5D and 3D U-Net models for one example subject. The synthetic models accurately depict a localized area of higher bone mineral density.

Figure 3. Comparison of accuracy metrics between sCT images obtained from the 2.5D and the 3D U-Net model. Abbreviations: MAE: mean absolute error, RMSE: root mean squared error, PSNR: peak signal-to-noise ratio, SSIM: structural similarity measure, SINGHA: Spectrally-INformed Grading of High-frequency Attributes, ↑: higher values result in higher accuracy, ↓: lower values result in higher accuracy.

Table 2. Accuracy of sCT obtained using a 2.5D and a 3D U-Net, reported as mean±standard deviation across testing subjects. Abbreviations: MAE: mean absolute error, RMSE: root mean squared error, PSNR: peak signal-to-noise ratio, SSIM: structural similarity measure, SINGHA: Spectrally-INformed Grading of High-frequency Attributes, ↑: higher values result in higher accuracy, ↓: lower values result in higher accuracy.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/2238