1883

Initial clinical evaluation of deep-learning-based image synthesis and superresolution using a clinical dataset of patients with brain lesions

Ai Nakao¹, Daiki Tamada², Tomohiro Takamura², Shintaro Ichikawa², Utaroh Motosugi², and Yasuhiko Terada¹
¹Institute of Applied Physics, University of Tsukuba, Tsukuba, Japan, ²Department of Radiology, University of Yamanashi, Chuo, Japan

Synopsis

Shortening scan time has been a long-standing goal in MRI, and image synthesis and superresolution using deep learning (DL) are promising tools for achieving it. However, most studies train networks on datasets of healthy volunteers, and clinical evaluation has not yet been fully performed. Here we trained networks on a large clinical dataset of patients with brain lesions and evaluated the generated images in terms of diagnostic image quality and diagnostic performance. Our results showed that FLAIR superresolution outperformed FLAIR image synthesis. They could also provide useful guidelines for evaluating the diagnostic performance of DL-based networks.

INTRODUCTION

Image synthesis¹⁻³ and superresolution⁴ using deep learning (DL) have the potential to shorten the long scan times of MRI and have attracted great attention. Most of these studies evaluate and compare different networks using image quality indices borrowed from computer vision, whereas only a few have addressed clinical validation and translation. This is mainly because most studies use publicly available datasets of healthy volunteers, which are not suitable for testing images with a diverse range of disease appearances. Here we obtained a large dataset of patients with brain tumors in clinical routine and performed a clinical validation of DL-based image synthesis and superresolution. We chose FLAIR as the main target because it is essential for interpreting neurological diseases and has a long scan time. FLAIR/T1WI/T2WI were used as network inputs because they are widely used in clinical routine.

METHOD

Networks
For image synthesis, synthesized FLAIR images (FLAIR-SYN) were generated from T1WI/T2WI (Fig. 1(a)). For superresolution, superresolved FLAIR images (FLAIR-SR) were generated from multiple input images: T2WI and low-resolution FLAIR and T1WI (FLAIR-LR and T1W-LR) (Fig. 1(b)). T1W-LR was used so that the total scan time of the input images matched between image synthesis and superresolution (Fig. 1(c)). In this case, superresolved T1WI (T1W-SR) were also generated, because high-resolution T1WI are also required clinically. Pix2pix⁵ was used for image synthesis, and pix2pix and SRGAN⁶ were used for superresolution. In addition, a dual-domain generative adversarial model⁷ was incorporated into pix2pix to enforce k-space consistency. For training, the batch size was 12 and the number of epochs was 800.
Image acquisition
A GE 3 T scanner (SIGNA Premier) was used. Low-resolution images were acquired with the minimum number of phase encodings that users could set (Fig. 1(c)). In total, 3083 images of 122 patients were acquired. The 2466 and 617 images of 99 patients were used for training and validation, respectively, and the 604 images of 23 patients were used for testing.
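The hold-out of test patients described above can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the function `split_by_patient`, the patient identifiers, and the toy data with five images per patient are all hypothetical, and the abstract does not state how patients were assigned to each subset. The sketch only shows a split made on patients rather than on individual images, so that no patient contributes images to both subsets.

```python
import random


def split_by_patient(image_patient_ids, test_fraction, seed=0):
    """Split image indices so that no patient appears in both subsets.

    image_patient_ids: one patient identifier per image (hypothetical layout).
    Returns (train_indices, test_indices, test_patients).
    """
    patients = sorted(set(image_patient_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patients)

    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train_idx = [i for i, p in enumerate(image_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(image_patient_ids) if p in test_patients]
    return train_idx, test_idx, test_patients


# Toy data (assumption): 122 patients with 5 images each.
ids = [f"patient{p:03d}" for p in range(122) for _ in range(5)]
train_idx, test_idx, test_patients = split_by_patient(ids, test_fraction=23 / 122)

# Patients behind the training images, to confirm the two groups are disjoint.
train_patients = {ids[i] for i in train_idx}
```

With these toy inputs, 23 patients are held out for testing and the remaining 99 supply the training and validation images, mirroring the patient counts reported above.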
Evaluation test
The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index were calculated to evaluate the image quality. In addition, a radiologist evaluated the generated and original images in terms of diagnostic image quality and diagnostic performance: (1) Diagnostic image quality (blind test): Scheffé's paired-comparison tests were performed among FLAIR-SYN, FLAIR-SR, and the original FLAIR, and between T1W-SR and the original T1WI, with respect to (i) visualization of four anatomic structures and (ii) appearance of image artifacts. (2) Diagnostic performance (non-blind test): FLAIR-SYN and FLAIR-SR were evaluated against the original FLAIR in terms of (i) visualization of white matter lesions, (ii) appearance of pseudo-lesions generated by DL, (iii) contrast of the cortico-medullary junction, and (iv) overall diagnostic quality. Likewise, T1W-SR was evaluated against the original T1WI.
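For readers unfamiliar with the two image quality indices, the following Python sketch shows how they can be computed. It is an illustration under stated assumptions, not the authors' implementation: images are represented as flat lists of intensities scaled to [0, 1], the function names `psnr` and `global_ssim` are hypothetical, and the SSIM shown here is a simplified single-window version over the whole image, whereas SSIM is usually averaged over local sliding windows. The constants 0.01 and 0.03 are the stabilizing factors commonly used in the SSIM definition.

```python
import math


def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat pixel lists."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(reference, generated, data_range=1.0):
    """Simplified SSIM using one window covering the whole image (assumption)."""
    n = len(reference)
    mu_r = sum(reference) / n
    mu_g = sum(generated) / n
    var_r = sum((r - mu_r) ** 2 for r in reference) / n
    var_g = sum((g - mu_g) ** 2 for g in generated) / n
    cov = sum((r - mu_r) * (g - mu_g) for r, g in zip(reference, generated)) / n

    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_r * mu_g + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_g ** 2 + c1) * (var_r + var_g + c2)
    return numerator / denominator


# Toy four-pixel "images" (assumption), intensities in [0, 1].
reference = [0.0, 0.5, 1.0, 0.5]
generated = [0.1, 0.5, 0.9, 0.5]

psnr_db = psnr(reference, generated)
ssim_value = global_ssim(reference, generated)
```

In this toy case the mean squared error is 0.005, so the PSNR is 10·log10(1/0.005), about 23.0 dB. Both indices reach their maximum (infinite PSNR, SSIM of 1) when the generated image equals the reference.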

RESULTS

Image quality test
FLAIR-SR had a higher mean PSNR/SSIM (35.91 dB/0.9715) than FLAIR-SYN (32.23 dB/0.9241). As shown in Fig. 2(a), white lesions partly disappeared in FLAIR-SYN, whereas they were reproduced in FLAIR-SR, though with slightly reduced contrast. The structure of the choroid plexus was also reproduced in FLAIR-SR, while it appeared abnormal in FLAIR-SYN. Figure 3 shows the results of the diagnostic image quality test for FLAIR. Overall, the average preference scores for the visualization of anatomic structures were higher for FLAIR-SR than for FLAIR-SYN, indicating higher image quality. The preference difference between FLAIR-SR and the original FLAIR was small, except for the basal ganglia. Even for the basal ganglia, FLAIR-SR exhibited much higher image quality than FLAIR-SYN.
For T1W-SR, the mean PSNR/SSIM were 37.21 dB/0.944. T1W-SR had almost the same diagnostic image quality as the original T1WI (Fig. 4).
Image artifacts were largely suppressed compared with the original images for all of FLAIR-SYN, FLAIR-SR, and T1W-SR.
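As a rough aid to reading the PSNR values above: because PSNR is 10·log10(MAX²/MSE), a gap of Δ dB between two images with the same intensity range corresponds to a mean-squared-error ratio of 10^(Δ/10). The Python sketch below applies this relation to the reported mean values. The function name is hypothetical, and the result is only indicative, since the mean of per-image PSNR values is not the same as the PSNR of the mean error.

```python
def mse_ratio_from_psnr_gap(psnr_high_db, psnr_low_db):
    """MSE ratio (low-PSNR image over high-PSNR image) implied by a PSNR gap.

    Assumes both PSNR values were computed with the same peak intensity.
    """
    return 10 ** ((psnr_high_db - psnr_low_db) / 10)


# Reported mean PSNR: FLAIR-SR 35.91 dB, FLAIR-SYN 32.23 dB (gap of 3.68 dB).
ratio = mse_ratio_from_psnr_gap(35.91, 32.23)
```

Under these assumptions, the 3.68 dB gap corresponds to a mean squared error roughly 2.3 times larger for FLAIR-SYN than for FLAIR-SR.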
Diagnostic performance test
The contrast of the cortico-medullary junction was almost the same for FLAIR-SYN and FLAIR-SR (Table 1). Pseudo-lesions rarely appeared in any of the cases. The visualization of white matter lesions and the overall quality were low for FLAIR-SYN but improved for FLAIR-SR.

DISCUSSION

Compared with FLAIR-SYN, FLAIR-SR performed much better in terms of both diagnostic image quality and diagnostic performance. FLAIR image synthesis has been widely investigated, but our results indicate that superresolution using multiple input images outperforms image synthesis. This is mainly because lesions were not clearly depicted in T1WI and T2WI, so FLAIR-LR is necessary for lesion visualization.
Although the diagnostic image quality of FLAIR-SR and T1W-SR was almost comparable to that of the original images, the contrast of small lesions was reduced, and the overall diagnostic performance was still not high enough for them to replace the original images. This could be overcome by gathering more data covering different lesion appearances, improving the networks, and using more sophisticated data acquisition such as sparse sampling. Our results reveal the importance of gathering a large dataset that includes a variety of lesion appearances for network development and clinical translation.

CONCLUSION

We obtained a large clinical dataset including lesions, developed state-of-the-art DL-based image synthesis and superresolution networks, and evaluated their diagnostic image quality and diagnostic performance. Our results could provide useful guidelines for evaluating the clinical diagnostic performance of images synthesized using DL-based networks.

Acknowledgements

The authors thank Dr. Yoshie Omiya and Dr. Noriaki Nakata from the University of Yamanashi for their valuable comments.

References

1. Gong E et al. Improved Synthetic MRI from Multi-echo MRI Using Deep Learning. 2019, Montreal, Canada: ISMRM; 2795.

2. Abe T and Salamon N. A Deep Learning Approach to Synthesize FLAIR Image from T1WI and T2WI. 2019, Montreal, Canada: ISMRM; 3130.

3. Liu F and McMillan A. MR Image Synthesis Using A Deep Learning Based Data-Driven Approach. 2019, Montreal, Canada: ISMRM; 3490.

4. Zeng K et al. Simultaneous single- and multi-contrast super-resolution for brain MRI images based on a convolutional neural network. Computers in Biology and Medicine 2018;99:133-141.

5. Isola P et al. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004v3. 2018.

6. Ledig C et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv:1609.04802v5. 2016.

7. Wang G et al. Dual-domain Generative Adversarial Model for Accelerated MRI Reconstruction. 2019, Montreal, Canada: ISMRM; 4655.

Figures

Fig. 1 (a, b) Deep neural networks, and input and output images for (a) image synthesis and (b) superresolution. (c) Acquisition parameters and measurement times.

Fig. 2 Typical images of DL-based image synthesis and superresolution for (a) FLAIR and (b) T1WI.

Fig. 3 Analysis of variance (ANOVA) of the diagnostic image quality test for FLAIR.

Fig. 4 Analysis of variance (ANOVA) of the diagnostic image quality test for T1WI.

Table 1 Diagnostic performance for FLAIR-SYN, FLAIR-SR, and T1W-SR.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
