Qinyang Shou1, Chenyang Zhao1, Xingfeng Shao1, and Danny JJ Wang1
1Laboratory of Functional MRI Technology (LOFT), Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, United States
Synopsis
Keywords: Machine Learning/Artificial Intelligence
We developed a Transformer-based deep learning denoising model to improve the SNR of both single- and multi-delay perfusion images acquired with 3D pseudo-continuous arterial spin labeling (pCASL). The method significantly improves the SNR (~2-fold) of the perfusion images without introducing bias in CBF and ATT quantification for either single-delay or multi-delay 3D pCASL. Further training and testing of this model on clinical datasets acquired on different vendor platforms is warranted.
Introduction
Pseudo-continuous arterial spin labeling (pCASL) is a promising imaging technique for quantification of cerebral blood flow (CBF) but suffers from low signal-to-noise ratio (SNR). Single-delay pCASL has been widely applied with adequate repetitions to achieve sufficient SNR. Multi-delay ASL was proposed to achieve more accurate quantification of both CBF and arterial transit time (ATT) [1]. However, acquiring images at multiple delays reduces the number of averages per delay, resulting in insufficient SNR. Previous studies have shown the feasibility of using deep learning (DL) to improve the SNR of ASL images [2]. In this study, we developed a generalized DL denoising scheme for both single- and multi-delay 3D pCASL, using a Transformer-based neural network, to improve the image quality of CBF and ATT maps without compromising quantification accuracy.
Methods
Data and preprocessing
For training data, single-delay pCASL images were acquired on a 3T Siemens Prisma scanner with the following parameters: 3D GRASE readout, spatial resolution 2.5×2.5×2.5 mm3, matrix size 96×96×48, labeling duration 1500 ms, post-labeling delay (PLD) 2000 ms, and 7 control-label pairs [3]. Perfusion images were calculated by subtraction of control and label images after quality control (images with large motion artifacts or intensity variation across time points were excluded). Each individual perfusion image was used as training input, and the average of the 7 perfusion images served as the reference. The top and bottom 10% of image slices were excluded from training due to insufficient SNR. A total of 45 subjects with repeated MRI scans were included, yielding 14,469 slices for training and 672 slices for validation.
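As a concrete illustration of this preprocessing, the sketch below performs the pairwise control-label subtraction, averages all pairs into the reference image, and removes the top and bottom 10% of slices. It is a minimal sketch, not the authors' pipeline: the array layout and variable names are assumptions, and the quality-control exclusion of motion-corrupted pairs is not shown.

```python
# Minimal preprocessing sketch (assumed array layout: n_pairs x nx x ny x nz).
import numpy as np

def build_training_pairs(control, label, slice_trim=0.1):
    """Return per-pair perfusion images (inputs) and their average (reference)."""
    perf = control - label                     # individual perfusion-weighted images
    reference = perf.mean(axis=0)              # average of all pairs serves as the target
    nz = perf.shape[-1]
    trim = int(nz * slice_trim)
    keep = slice(trim, nz - trim)              # drop top/bottom 10% of slices (low SNR)
    return perf[..., keep], reference[..., keep]
```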
Network and training
The training framework is shown in Figure 1. We utilized a Transformer-based network, SwinIR [4], which uses shifted windows to improve on the regular vision Transformer and has been shown to outperform CNN-based networks [5]. Random Gaussian white noise was added in k-space to simulate different SNR conditions. A paired 48×48 patch was generated from each input image at every iteration. The loss function combined L1 and SSIM terms to preserve both overall image similarity and sharpness.
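The following is a minimal PyTorch-style sketch of the two training ingredients described above, the k-space noise augmentation and the combined L1 + SSIM loss. It is not the authors' code: the noise level sigma, the loss weight alpha, and the use of the third-party pytorch_msssim package for the SSIM term are assumptions for illustration.

```python
# Sketch of k-space noise augmentation and L1 + SSIM loss (illustrative only).
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumed dependency)

def add_kspace_noise(img, sigma):
    """Add complex Gaussian white noise in k-space to a real image batch (B, 1, H, W)."""
    k = torch.fft.fft2(img)                    # real input is promoted to complex
    noise = sigma * torch.complex(torch.randn_like(img), torch.randn_like(img))
    return torch.fft.ifft2(k + noise).real     # back to image space, keep the real part

def denoising_loss(pred, target, alpha=0.5, data_range=1.0):
    """L1 preserves overall intensity; (1 - SSIM) preserves local structure and sharpness."""
    l1_term = F.l1_loss(pred, target)
    ssim_term = 1.0 - ssim(pred, target, data_range=data_range)
    return alpha * l1_term + (1.0 - alpha) * ssim_term
```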
Evaluation
The model performance was evaluated on three datasets. The first was a separate test dataset acquired with the identical imaging protocol as the training data (2 subjects, 672 slices). The second was acquired with a different protocol (labeling duration 1.8 s, PLD 2 s, resolution 2.5×2.5×3 mm3; 20 subjects with test-retest scans, 1610 slices in total). The third was a multi-delay dataset with a labeling duration of 1.8 s and 5 PLDs (0.5, 1, 1.5, 2, 2.5 s) from the same subjects as the second dataset. The latter two datasets were also acquired on the 3T Siemens Prisma scanner. For the first two datasets, reference images were calculated by averaging all acquired time points, and similarity metrics (peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM)) were computed between the prediction and the reference. In addition, spatial SNR was calculated within gray matter according to the definition in [6], and temporal SNR was calculated as the mean perfusion signal divided by the standard deviation across time points. For the multi-delay dataset, since no reference image was available, we compared the visual quality of the perfusion images at each PLD and compared the CBF and ATT maps fitted from the input and from the DL-predicted 5-PLD images to assess whether the model prediction introduces any systematic bias.
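As a rough illustration, the snippet below computes PSNR and the two SNR measures described above for a perfusion series within a gray-matter mask. This is a sketch under stated assumptions: the exact spatial-SNR formula of reference [6] is not reproduced, and a mean-over-standard-deviation form within gray matter is used as a stand-in; SSIM would typically be computed with an off-the-shelf implementation such as skimage.metrics.structural_similarity.

```python
# Hedged sketch of the evaluation metrics (not the authors' exact implementation).
import numpy as np

def psnr(pred, ref):
    """Peak signal-to-noise ratio in dB, using the reference maximum as the peak value."""
    mse = np.mean((pred - ref) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

def spatial_snr(perf_mean, gm_mask):
    """Assumed form: mean perfusion signal in gray matter divided by its spatial std."""
    vals = perf_mean[gm_mask]
    return vals.mean() / vals.std()

def temporal_snr(perf_series, gm_mask):
    """Mean across time points divided by std across time points, averaged over gray matter."""
    mean_t = perf_series.mean(axis=0)          # perf_series: (n_timepoints, nx, ny, nz)
    std_t = perf_series.std(axis=0)
    return (mean_t[gm_mask] / std_t[gm_mask]).mean()
```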
Results
Figure 2 shows the results for the first test dataset: the DL prediction shows improved SNR compared to the input. The similarity metrics are consistent with the visual quality; after DL denoising, SSIM and PSNR improved by 2% and 7.6%, while spatial and temporal SNR improved by 181% and 134%, respectively. Figure 3 shows the results for the second dataset: SSIM and PSNR improved by 1.4% and 3.9%, while spatial and temporal SNR improved by 71% and 42%, respectively. Figure 4 shows the results for the multi-delay dataset: the DL prediction improved the SNR of the perfusion images at each PLD as well as of the fitted CBF and ATT maps. Figure 5a shows that the global CBF values calculated from the DL prediction are highly consistent with those from the single-delay input images. Figures 5b and 5c show that the CBF and ATT values fitted from the DL-predicted multi-delay perfusion images are also consistent with those fitted from the input images.
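The fitted CBF and ATT maps compared above are obtained by fitting a kinetic model to the 5-PLD perfusion signals. The abstract does not specify the fitting procedure; the sketch below shows one common approach, a voxel-wise nonlinear least-squares fit of a single-compartment (Buxton-type) pCASL model, with an assumed blood T1 of 1.65 s and the protocol's 1.8 s labeling duration. The scaling parameter m lumps M0, labeling efficiency, and the blood-brain partition coefficient, so converting it to absolute CBF would require the usual calibration; the example signal values are synthetic.

```python
# Illustrative voxel-wise CBF/ATT fit for multi-delay pCASL (assumed model, not
# necessarily the one used by the authors).
import numpy as np
from scipy.optimize import curve_fit

def pcasl_model(pld, m, att, tau=1.8, t1b=1.65):
    """Single-compartment pCASL kinetic model; m is a lumped perfusion scaling factor."""
    t = tau + pld                               # time since the start of labeling (s)
    s = np.zeros_like(t, dtype=float)
    arriving = (t >= att) & (t < att + tau)     # labeled bolus still arriving
    arrived = t >= att + tau                    # full bolus delivered
    s[arriving] = m * np.exp(-att / t1b) * (1 - np.exp(-(t[arriving] - att) / t1b))
    s[arrived] = (m * np.exp(-att / t1b) * np.exp(-(t[arrived] - att - tau) / t1b)
                  * (1 - np.exp(-tau / t1b)))
    return s

plds = np.array([0.5, 1.0, 1.5, 2.0, 2.5])      # the 5 PLDs of the multi-delay protocol (s)
signal = np.array([0.4, 0.9, 1.1, 1.0, 0.8])    # synthetic perfusion values for one voxel
popt, _ = curve_fit(pcasl_model, plds, signal,
                    p0=[signal.max(), 1.0],     # initial guesses for m and ATT
                    bounds=([0.0, 0.2], [np.inf, 3.0]))
m_fit, att_fit = popt                           # compare fits from input vs. DL-denoised data
```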
Discussion and conclusion
The prediction results show that the trained deep learning network effectively denoises ASL images and generalizes well to different imaging protocols across a range of PLDs. The quantitative results also show that the DL network improves the quality of both the perfusion images and the quantitative maps without introducing systematic bias, which may enable wider use of multi-delay protocols instead of single-delay acquisitions to obtain more accurate CBF and ATT measurements. Further training and testing of this model on clinical datasets acquired on different vendor platforms is warranted.
Acknowledgements
No acknowledgement found.
References
[1] Wang, Danny JJ, et al. "Multi-delay multi-parametric arterial spin-labeled perfusion MRI in acute ischemic stroke—comparison with dynamic susceptibility contrast enhanced perfusion imaging." NeuroImage: Clinical 3 (2013): 1-7.
[2] Xie, Danfeng, et al. "Denoising arterial spin labeling perfusion MRI with deep machine learning." Magnetic resonance imaging 68 (2020): 95-105.
[3] Jann, Kay, et al. "Evaluation of cerebral blood flow measured by 3D PCASL as biomarker of vascular cognitive impairment and dementia (VCID) in a cohort of elderly latinx subjects at risk of small vessel disease." Frontiers in Neuroscience 15 (2021): 627627.
[4] Liang, Jingyun, et al. "Swinir: Image restoration using swin transformer." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[5] Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[6] Glover, Gary H., and Song Lai. "Self‐navigated spiral fMRI: interleaved versus single‐shot." Magnetic resonance in medicine 39.3 (1998): 361-368.