Ai Nakao1, Daiki Tamada2, Tomohiro Takamura2, Shintaro Ichikawa2, Utaroh Motosugi2, and Yasuhiko Terada1
1Institute of Applied Physics, University of Tsukuba, Tsukuba, Japan, 2Department of Radiology, University of Yamanashi, Chuo, Japan
Synopsis
Shortening scan time has been a long-standing goal in MRI, and image
synthesis and superresolution using deep learning (DL) are promising tools for
achieving this goal. However, most studies use datasets of healthy
volunteers for network training, and clinical evaluation has not yet been
fully performed. Here we trained networks using a large clinical dataset of
patients with brain lesions, and evaluated the generated images in terms of the
diagnostic image quality and performance. Our results showed that FLAIR superresolution
outperformed FLAIR image synthesis. Our results could also provide useful
guidelines for evaluating diagnostic performance of DL-based networks.
INTRODUCTION
Image synthesis [1-3] and superresolution [4] using deep learning (DL) have the potential
to shorten the long scan times of MRI and have attracted great attention. Most
of these studies focus on the evaluation and comparison of different networks
based on image quality indices used in computer vision. However, only a few
studies have addressed clinical validation and translation. This is mainly
because most studies use publicly available datasets of healthy volunteers, which
are not suitable for testing images with a diverse range of disease appearances.
Here we obtained a large dataset of patients with brain tumors in clinical routine
and performed a clinical validation of DL-based image synthesis and superresolution.
We chose FLAIR as the main target because FLAIR is essential for interpreting
neurological diseases and has a long scan time. FLAIR, T1WI, and T2WI were used as network
inputs because they are widely used in clinical routine.
METHOD
Networks
For image synthesis, synthesized FLAIR images (FLAIR-SYN) were generated from
T1WI/T2WI (Fig. 1(a)). For superresolution, superresolved FLAIR images (FLAIR-SR)
were generated from multiple input images: T2WI, low-resolution FLAIR (FLAIR-LR), and
low-resolution T1WI (T1W-LR) (Fig. 1(b)). T1W-LR was included to match the total scan
time of the input images between image synthesis and superresolution (Fig. 1(c)). In
this case, superresolved T1WI (T1W-SR) were also generated because
high-resolution T1WI are also required clinically.
Pix2pix [5] was used for image synthesis, and pix2pix and SRGAN [6] were
used for superresolution. In addition, a dual-domain generative adversarial model [7] was incorporated into pix2pix to enforce k-space consistency. For
training, the batch size was 12 and the number of epochs was 800.
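As a rough illustration of what a k-space consistency term can look like, the sketch below adds a k-space L1 penalty to an image-domain L1 penalty. It is not the loss used in this study or in the dual-domain model cited above: the function names, the L1 form, and the weight `kspace_weight` are assumptions made for illustration, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a small image (list of rows)."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    phase = -2j * cmath.pi * (u * r / rows + v * c / cols)
                    total += image[r][c] * cmath.exp(phase)
            out[u][v] = total
    return out


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D arrays."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def dual_domain_l1(generated, target, kspace_weight=0.1):
    """Image-domain L1 plus a weighted k-space L1 term.

    kspace_weight is a hypothetical value, not one reported in the abstract.
    """
    image_term = mean_abs_diff(generated, target)
    kspace_term = mean_abs_diff(dft2(generated), dft2(target))
    return image_term + kspace_weight * kspace_term
```

In a real training loop the same idea would be written with a differentiable FFT from the deep learning framework in use, so that the k-space term contributes gradients to the generator.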
Image acquisition
A GE 3 T scanner (SIGNA Premier) was used. Low-resolution images were
acquired with the minimum number of phase encodings that the user could set (Fig.
1(c)). In total, 3083 images of 122 patients were acquired. A total of 2466 and 617
images from 99 patients were used for training and validation, respectively, and 604
images from 23 patients were used for testing.
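The split above separates test patients from training patients, so that no patient contributes images to both sets. A minimal sketch of such a patient-level split follows; the record layout `(patient_id, image_id)`, the function name, and the random seed are assumptions for illustration and do not describe how the authors assigned patients.

```python
import random


def split_by_patient(image_records, n_test_patients, seed=0):
    """Split (patient_id, image_id) records so each patient lands in one set only."""
    patient_ids = sorted({pid for pid, _ in image_records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patient_ids)
    test_patients = set(patient_ids[:n_test_patients])
    train = [rec for rec in image_records if rec[0] not in test_patients]
    test = [rec for rec in image_records if rec[0] in test_patients]
    return train, test


# Toy records: 122 hypothetical patients with 5 images each (not the real counts).
records = [(f"P{p:03d}", i) for p in range(122) for i in range(5)]
train_records, test_records = split_by_patient(records, n_test_patients=23)
```

Splitting by patient rather than by image avoids leaking near-identical neighboring slices of one patient into the test set.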
Evaluation test
The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) were
calculated to evaluate the image quality. In addition, a radiologist evaluated
the generated and original images in terms of diagnostic image quality and
diagnostic performance:
(1) Diagnostic image quality (blind test): Scheffé's paired comparisons were
performed among FLAIR-SYN, FLAIR-SR, and the original FLAIR, and
between T1W-SR and the original T1WI, with respect to (i) visualization of four anatomic
structures and (ii) appearance of image artifacts.
(2) Diagnostic performance (non-blind test): The FLAIR-SYN and FLAIR-SR
were evaluated against the original FLAIR in terms of (i) visualization of
white matter lesions, (ii) appearance of pseudo-lesions generated by DL, (iii) contrast
of cortico-medullary junction, and (iv) overall diagnostic quality. Likewise,
T1W-SR were also evaluated against the original T1WI.
RESULTS
Image quality test
FLAIR-SR had a higher mean PSNR/SSIM (35.91 dB/0.9715) than FLAIR-SYN (32.23 dB/0.9241). As
shown in Fig. 2(a), white lesions partly disappeared in FLAIR-SYN, whereas
they were reproduced in FLAIR-SR, though with slightly reduced contrast. The
structure of the choroid plexus was also reproduced in FLAIR-SR, while it appeared
abnormal in FLAIR-SYN. Figure 3 shows the results of the diagnostic image quality
test for FLAIR. Overall, the average preferences for the visualization of
anatomic structures were higher for FLAIR-SR than for FLAIR-SYN, indicating
higher image quality. The preference difference between FLAIR-SR and the original
FLAIR was small, except for the basal ganglia. Even for the basal ganglia,
FLAIR-SR exhibited much higher image quality than FLAIR-SYN.
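The average preferences discussed above come from paired-comparison scores. The sketch below shows one way such averages can be estimated, assuming the variant of Scheffé's method in which each rater scores every pair once and the reverse comparison is taken as the negative score (often attributed to Nakaya). The variant, the rating scale, and the example scores are assumptions for illustration, not details or data from this study.

```python
def average_preferences(stimuli, scores):
    """Estimate average preferences from paired-comparison scores.

    scores maps (a, b) to a list of ratings, positive when a is preferred to b.
    Each unordered pair appears once with the same number of ratings.
    """
    t = len(stimuli)
    n_ratings = len(next(iter(scores.values())))
    totals = {s: 0.0 for s in stimuli}
    for (a, b), ratings in scores.items():
        for x in ratings:
            totals[a] += x  # score in favor of a
            totals[b] -= x  # counted as the opposite score for b
    return {s: totals[s] / (t * n_ratings) for s in stimuli}


# Hypothetical single-rater scores on a -2..+2 scale (illustrative only).
example_scores = {
    ("original", "SR"): [1],
    ("original", "SYN"): [2],
    ("SR", "SYN"): [1],
}
preferences = average_preferences(["original", "SR", "SYN"], example_scores)
```

By construction the estimated preferences sum to zero, so only differences between images are meaningful.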
For T1W-SR, the mean PSNR/SSIM was 37.21 dB/0.944. T1W-SR had almost the same diagnostic
image quality as the original T1WI (Fig. 4).
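For reference, the two image quality indices reported above can be sketched as follows. PSNR follows its standard definition. The SSIM shown here is a simplified single-window (global) form with the commonly used constants 0.01 and 0.03, whereas the standard index averages the same expression over local windows. The function names and the `data_range` default are assumptions, and the abstract does not state which implementation was used.

```python
import math


def _flatten(image):
    return [v for row in image for v in row]


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref, tst = _flatten(reference), _flatten(test)
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(reference, test, data_range=1.0):
    """Single-window SSIM computed over the whole image (a simplification)."""
    x, y = _flatten(reference), _flatten(test)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator
```

Both functions assume that the reference and test images share the same intensity scaling, with `data_range` set to the maximum possible intensity.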
Image artifacts were largely suppressed compared with the original images for
FLAIR-SYN, FLAIR-SR, and T1W-SR alike.
Diagnostic performance test
The contrast of the cortico-medullary junction was almost the same for
FLAIR-SYN and FLAIR-SR (Table 1). Pseudo-lesions rarely appeared in any of the
cases. The visualization of white matter lesions and the overall quality were
low for FLAIR-SYN but improved for FLAIR-SR.
DISCUSSION
Compared with FLAIR-SYN, FLAIR-SR had much improved
performance in terms of diagnostic image quality and diagnostic performance. FLAIR
image synthesis has been widely investigated, but our results indicate that superresolution
using multiple input images outperforms image synthesis. This is mainly because
lesions are not clearly depicted in T1WI and T2WI, so FLAIR-LR is necessary for
lesion visualization.
Although the diagnostic image quality of
FLAIR-SR and T1W-SR was almost comparable with that of the original images, the
contrast of small lesions was reduced and the overall diagnostic performance
was still not high enough for them to replace the original images. This could
be overcome by gathering more data covering different lesion appearances,
improving the networks, and using more sophisticated data acquisition
such as sparse sampling. Our results reveal the importance of gathering a
large dataset including a variety of lesion appearances for network development
and clinical translation.
CONCLUSION
We obtained a large clinical dataset
including lesions, developed state-of-the-art DL-based image synthesis and
superresolution networks, and evaluated their diagnostic image quality and diagnostic
performance. Our results could provide useful guidelines for evaluating
the clinical diagnostic performance of images synthesized by DL-based networks.
Acknowledgements
The authors thank Dr. Yoshie Omiya and Dr. Noriaki Nakata from the University of Yamanashi for their valuable comments.
References
1. Gong E et al. Improved Synthetic MRI from Multi-echo MRI Using Deep Learning. 2019, Montreal, Canada: ISMRM; 2795.
2. Abe T and Salamon N. A Deep Learning Approach to Synthesize FLAIR Image from T1WI and T2WI. 2019, Montreal, Canada: ISMRM; 3130.
3. Liu F and McMillan A. MR Image Synthesis Using A Deep Learning Based Data-Driven Approach. 2019, Montreal, Canada: ISMRM; 3490.
4. Zeng K et al. Simultaneous single- and multi-contrast super-resolution for brain MRI images based on a convolutional neural network. Computers in Biology and Medicine 2018;99:133-141.
5. Isola P et al. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004v3. 2018.
6. Ledig C et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv:1609.04802v5. 2016.
7. Wang G et al. Dual-domain Generative Adversarial Model for Accelerated MRI Reconstruction. 2019, Montreal, Canada: ISMRM; 4655.