1513

Synthetic T2-weighted fat sat delivers valuable information on spine pathologies: multicenter validation of a Generative Adversarial Network

Sarah Schlaeger¹, Katharina Drummer¹, Malek El Husseini¹, Florian Kofler^1,2,3, Nico Sollmann^1,4,5, Severin Schramm¹, Claus Zimmer¹, Dimitrios C. Karampinos⁶, Benedikt Wiestler¹, and Jan S. Kirschke¹
¹Department of Diagnostic and Interventional Neuroradiology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany, ²Department of Informatics, Technical University of Munich, Munich, Germany, ³TranslaTUM - Central Institute for Translational Cancer Research, Technical University of Munich, Munich, Germany, ⁴TUM-NeuroImaging Center, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany, ⁵Department of Diagnostic and Interventional Radiology, University Hospital Ulm, Ulm, Germany, ⁶Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany

Synopsis

Generative Adversarial Networks (GANs) can synthesize missing Magnetic Resonance (MR) contrasts from existing MR data. In spine imaging, sagittal T2-w fat sat (fs) sequences are an important additional MR contrast next to conventional T1-w and T2-w sequences. In this study, the diagnostic performance of a GAN-based, synthetic T2-w fs is evaluated in a multicenter dataset. By comparing the synthetic T2-w fs with its true counterpart regarding the ability to detect spinal pathologies not seen on T1-w and non-fs T2-w, diagnostic agreement, and image and fs quality, our work shows that a synthetic T2-w fs delivers valuable information on spine pathologies.

Purpose

Magnetic resonance imaging (MRI) plays an outstanding role in the evaluation of spine pathologies [1,2]. Next to routinely acquired sagittal T1-w and T2-w sequences, T2-w sequences combined with fat suppression/separation techniques have become a major part of spine MRI examinations [2-4]. However, acquisition of an additional T2-w fat sat (fs) sequence lengthens the scan protocol and can be prone to artefacts. Recently, Generative Adversarial Networks (GANs), which are based on a deep learning architecture, have emerged as a promising approach for generating missing MR contrasts [5-7]. GANs can be used to shorten scan time and augment existing data [8-10]. To foster their clinical implementation, GAN-generated data have to provide relevant information and pass validation by radiologists, and the framework has to prove its generalizability. Therefore, this work aims to investigate the diagnostic performance of a sagittal, GAN-synthesized T2-w fs of the spine compared to a true T2-w fs with regard to (1) the ability to detect pathologies not seen on T1-w and non-fs T2-w, (2) diagnostic agreement, and (3) image and fs quality using a multicenter testing dataset.

Methods

Synthesis of sagittal T2-w fs images - training and testing data:
174 patients with sagittal T1-w TSE, T2-w TSE and fs T2-w TSE images of the spine were retrospectively identified. A GAN based on the pix2pix architecture by Isola et al. [11] was trained to generate T2-w fs images from T1-w and T2-w images of 73 patients from two in-house 3 T scanners (Ingenia/Achieva dstream, Philips Healthcare, Best, The Netherlands). Subsequently, the GAN framework was used to create synthetic T2-w fs images from 101 previously unseen patients (Figure 1). The corresponding scans originated from 38 scanners from three vendors (Philips Healthcare, Best, The Netherlands; Siemens Healthineers, Erlangen, Germany; GE Healthcare, Chicago, Illinois, USA). 41 datasets were acquired at 1.5 T and 60 datasets at 3 T, with a large range of different sequence parameters.
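
For orientation, the objective of the original pix2pix formulation by Isola et al. [11] is sketched below. This abstract does not state which loss terms, weights or modifications were used in the present study, so the assignment of symbols is an assumption for illustration only: x denotes the input (here assumed to be the T1-w and non-fs T2-w images), y the true T2-w fs image, z the noise source, G the generator, D the discriminator, and λ the weight of the L1 term.

```latex
\begin{align}
\mathcal{L}_{\mathrm{cGAN}}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{\mathrm{cGAN}}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{align}
```

The adversarial term pushes the synthetic image towards a realistic appearance, while the L1 term keeps it close to the paired true image.
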
Evaluation of GAN performance:
In the testing data, the diagnostic performance of synthetic T2-w fs images was assessed by two expert readers (five and two years of experience). Six pathologies were graded in five predefined consecutive vertebral segments: bone marrow abnormalities, spondylodiscitis expansion, Modic changes, vertebral fractures, spinal cord lesions, and paravertebral tissue abnormalities. Pathologies were first assessed on T1-w and T2-w images only; then a T2-w fs, randomly either synthetic or true, was added in a blinded fashion. The approach was repeated for the remaining synthetic or true T2-w fs. After that, image and fs quality of the synthetic and true T2-w fs were graded. Subsequently, a ground truth (GT) grading was defined in a consensus reading of both readers incorporating additional scans, imaging modalities and clinical information. The following were evaluated: (1) the additional diagnostic information of the synthetic T2-w fs, (2) the agreement of the synthetic protocol (T1-w, T2-w and synthetic T2-w fs) and the original protocol (T1-w, T2-w and true T2-w fs) (Cohen’s κ [12]), and (3) image and fs quality. Additionally, a visual Turing test presenting 25 true and 25 synthetic T2-w fs images in randomized order to eleven neuroradiologists was performed using a website-based GUI (Figure 2) [13].
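
As an illustration of the agreement measure named above, the following minimal sketch computes an unweighted Cohen's κ for two lists of categorical grades. It is not the analysis code of this study: the function name, the binary example grades and the choice of the unweighted variant are assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of categorical grades."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases graded identically.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement derived from the marginal grade frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both lists contain one identical grade throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades (0 = absent, 1 = present) for ten vertebral segments.
synthetic_protocol = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
original_protocol = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(synthetic_protocol, original_protocol)
print(f"kappa = {kappa:.3f}")
```

The same function applies to intermethod agreement (one reader, two protocols) and to interrater agreement (two readers, one protocol); only the two input lists change.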

Results

Additional diagnostic value of T2-w fs:
The agreement of pathology grading based on the synthetic protocol or the original protocol versus GT revealed significantly higher κ coefficients than the agreement of pathology grading based on T1-w and non-fs T2-w images only versus GT (synthetic: p = 0.043; original: p = 0.046) (Table 1; Figure 1).
Diagnostic agreement:
For both readers, the intermethod agreement between synthetic and original protocol ranged from substantial to almost perfect for all evaluated pathologies, except for the grading of spinal cord lesions by reader 1, which showed moderate agreement. The agreement between synthetic and original protocol per reader was higher than the interrater agreement for assessment of the synthetic and original protocol, except for spinal cord lesions (Table 2; Figure 3).
Image and fs quality of synthetic versus true T2-w fs:
The image quality of synthetic T2-w fs images was graded higher than that of the true T2-w fs images (97.0 % of synthetic versus 87.6 % of true T2-w fs images graded at least acceptable; p = 0.0023). The grading of fs quality was not significantly different between synthetic and true T2-w fs images (p > 0.05). In the Turing test, the comparison of real condition versus expert grading showed no significant difference between true and synthetic T2-w fs images (p > 0.05).
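
The abstract does not state which statistical test was applied to the Turing test data. As one possible illustration, the sketch below checks whether a single reader's number of correct real/synthetic classifications differs from chance, using an exact two-sided binomial test. The function name, the chance level of 0.5 and the example count of 27 correct answers are assumptions for the example, not results of this study.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    under the null hypothesis than the observed outcome.
    """
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # A small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(trials + 1)
                if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical reader: 50 images shown (25 true, 25 synthetic), 27 classified correctly.
correct, shown = 27, 50
p_value = binomial_two_sided_p(correct, shown)
print(f"accuracy = {correct / shown:.2f}, p = {p_value:.3f}")
```

This treats each classification as an independent trial for one reader; pooling answers across the eleven readers would violate that assumption and would call for a different model.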

Discussion & Conclusion

In our study, the synthetic T2-w fs provided diagnostic information superior to assessment based on T1-w and non-fs T2-w images only. The synthetic T2-w fs showed an excellent intermethod agreement and an overall better image quality compared to the true T2-w fs. We could thereby demonstrate the generalizability of our approach by training the network with images from two scanners only and validating it on unseen images from 38 scanners with a wide range of acquisition parameters. As a limitation, the significantly higher graded image quality of synthetic compared to true T2-w fs images might lead to a learning bias of the readers. Therefore, the visual Turing test was performed, showing that synthetic and true T2-w fs could not reliably be distinguished. In conclusion, our synthetic T2-w fs shows potential for spine MRI examinations.

Acknowledgements

The present work was supported through an SFB grant, a DFG grant and a faculty internal grant.

References

[1] Winegar BA, et al. Pol J Radiol. 85:e550-e574, 2020

[2] ACR–ASNR–SCBT-MR–SSR Practice Parameter for the Performance of Magnetic Resonance Imaging (MRI) of the Adult Spine.

[3] Sollmann N, et al. European journal of radiology. 131:109204, 2020

[4] Mascalchi M, et al. Magn Reson Imaging. 11(1):17-25, 1993

[5] Nie D, et al. IEEE Trans Biomed Eng. 65(12):2720-2730, 2018

[6] Lv J, et al. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 379(2200):20200203, 2021

[7] Lee D, et al. Nature Machine Intelligence. 2(1):34-42, 2020

[8] Finck, et al. Investigative radiology. 55(5):318-323, 2020

[9] Kazuhiro K, et al. Tomography. 4(4):159-163, 2018

[10] Conte GM, et al. Radiology. 299(2):313-323, 2021

[11] Isola P, et al. 2017 IEEE Conference on Computer Vision and Pattern Recognition. p. 5967-5976, 2017

[12] Jakobsson U. Scand J Caring Sci. 19(4):427-431, 2005

[13] Kofler F, et al. arXiv preprint. arXiv:2103.06205, 2021

Figures

Figure 1: The GAN generates synthetic T2-w fs images from T1-w and non-fs T2-w images. Inflammatory Modic changes are assessed more accurately on synthetic and true T2-w fs images than on T1-w and non-fs T2-w images only.

Figure 2: Representative screenshots of the website-based GUI for the Turing test showing (a) a true T2-w fs image and (b) a synthetic T2-w fs image. 25 true and 25 synthetic T2-w fs images were shown consecutively to eleven neuroradiologists, who were asked to classify the image as real or synthetic without learning about the real condition.

Figure 3: Representative synthetic and true T2-w fs images for (a) bone marrow abnormalities, (b) juxtadiscal Modic changes and (c) paravertebral tissue abnormalities.

Table 1: Intermethod agreement (Cohen’s Kappa coefficient; confidence interval (CI) of 95 %) between grading based on T1-w/non-fs T2-w images only, on the synthetic protocol (T1-w, T2-w, and synthetic T2-w fs), and on the original protocol (T1-w, T2-w, and true T2-w fs) versus ground truth (GT) grading, respectively.

Table 2: Intermethod agreement (Cohen’s Kappa coefficient; confidence interval (CI) of 95 %) between synthetic protocol (T1-w, T2-w, and synthetic T2-w fs) and original protocol (T1-w, T2-w, and true T2-w fs) for reader 1 and 2; interrater agreement (Cohen’s Kappa coefficient; CI) for synthetic protocol and original protocol.

Proc. Intl. Soc. Mag. Reson. Med. 30 (2022)

DOI: https://doi.org/10.58530/2022/1513