2445

Automatic Detection of Nyquist Ghosts in Whole-Body Diffusion Weighted MRI Using Deep Learning

Alistair Lamb¹, Anna Barnes², Stuart A Taylor², and Hui Zhang³
¹Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom, ²Centre for Medical Imaging, University College London, London, United Kingdom, ³Centre for Medical Image Computing, University College London, London, United Kingdom

Synopsis

Despite its potential as an imaging biomarker in assessing tumor response to therapy, use of apparent diffusion coefficient (ADC) as a quantitative endpoint is not routine in clinical practice. One factor that limits the usefulness of ADC is the presence of artifacts in the constituent diffusion-weighted imaging (DWI) data. In this study, we propose a supervised deep-learning approach to detect the presence of Nyquist ghosts in axial DWI slices of the abdomen, achieving a test accuracy of 81.5%. The detection and removal of these artifacts could help improve the reproducibility of quantitative ADC measurements.

Introduction

Accurate, non-invasive tumor response biomarkers are needed for optimal patient outcomes in oncology, particularly in the identification of metastatic disease, as this typically dictates therapeutic strategy. Despite the early promise of the apparent diffusion coefficient (ADC) as a quantitative endpoint biomarker in assessing tumor response to therapy, its use is not routine in clinical practice. Although, in theory, ADCs should be consistent across scans at different visits or from different subjects, this is not the case in practice. Donati et al. reported significant intervendor differences in estimated ADC in the upper abdominal organs, with coefficients of variation ranging from 7.0% to 27.1%¹. One factor that can affect ADC estimates is the presence of artifacts, typical of EPI readouts, in the constituent diffusion-weighted imaging (DWI) data. One of the most prevalent artifacts in whole-body DWI (WB-DWI) is Nyquist ghosting (affecting >40% of slices in our dataset). These artifacts can lead to unwanted changes in intensity in tissues of interest, which affect the estimated ADC and the clinical decisions made with regard to the patient’s treatment. Currently, there is no method for automated detection of Nyquist ghosts in WB-DWI, meaning they have to be identified manually, which is a time-consuming process. The aim of this work is to develop a supervised deep-learning approach that enables the automatic detection of WB-DWI data corrupted by Nyquist ghosts.

Method

Transfer learning-based CNN

A standard transfer-learning approach² was adopted, as shown in Figure 1, using VGG-16³ pretrained on ImageNet⁴ and available from TensorFlow⁵. The pre-existing classifier was replaced with a new classifier, consisting of a 128-unit dense layer followed by a single node with a sigmoid output. During training, the weights of the convolutional base were frozen so that only the classifier was trained, allowing the transfer of features learnt on ImageNet.
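As a rough illustration of the architecture described above, the following sketch builds such a network with the Keras API bundled with TensorFlow. It is not the authors' code. The function name `build_ghost_classifier`, the 256x256x3 input shape (single-channel DWI slices replicated to three channels), the flattening step and the ReLU activation are assumptions; the dropout rate, optimizer settings and loss are those reported in the Training and testing section.

```python
import tensorflow as tf


def build_ghost_classifier(weights="imagenet", input_shape=(256, 256, 3)):
    """Frozen VGG-16 convolutional base plus a small binary classifier (sketch)."""
    # include_top=False drops the original ImageNet classifier.
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze the pretrained feature extractor

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)                          # assumption: flatten, not pooling
    x = tf.keras.layers.Dense(128, activation="relu")(x)      # 128-unit dense layer (ReLU assumed)
    x = tf.keras.layers.Dropout(0.5)(x)                       # dropout rate reported in the abstract
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # single sigmoid node

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=1e-4, beta_1=0.9, beta_2=0.999
        ),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Under these assumptions, training could call `model.fit` with `batch_size=8` and a `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)` callback; passing `weights=None` builds the same architecture without downloading the ImageNet weights.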

Dataset

DWI volumes corresponding to the abdominal region from 22 subjects were taken from the Streamline dataset⁶,⁷. Each volume consisted of thirty 256x256 pixel slices of similar resolution. The first and last slices of each volume were not used, as they had been masked by the scanner. This left a total of 616 DWI slices which, following careful visual inspection, were each assigned a binary label indicating whether they contained Nyquist ghosting. The percentage of slices containing Nyquist ghosting is shown for each subject in Figure 2. All data were collected using a GE Discovery MR750w 3.0T scanner.
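The slice count follows from discarding two of the thirty slices in each of the 22 volumes: 22 x 28 = 616. A minimal NumPy sketch of that step is given below; the array layout (subjects, slices, rows, columns) and the function name `drop_masked_slices` are assumptions made for illustration.

```python
import numpy as np


def drop_masked_slices(volumes):
    """Remove the first and last slice of every volume and stack the rest.

    volumes: array of shape (n_subjects, n_slices, rows, cols) (assumed layout).
    Returns an array of shape (n_subjects * (n_slices - 2), rows, cols).
    """
    kept = volumes[:, 1:-1]  # discard the scanner-masked end slices
    return kept.reshape(-1, *kept.shape[2:])
```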

Data pre-processing

Initially, no pre-processing was performed on the images prior to training. A second pipeline was then constructed in which the images were windowed between intensities of 0 and 25, as shown in Figure 3. This was done for two reasons: (i) the maximum intensity of each slice varied considerably (75-6430), whereas the intensity of the Nyquist ghosts was typically <100; (ii) we hypothesized that windowing may help to prevent the network from overfitting to irrelevant features, because the center of each slice, which typically has intensity values >25, ends up with a homogeneous intensity, effectively removing many of the features there that are irrelevant for detecting Nyquist ghosting.
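The windowing step amounts to clipping each slice to the range 0 to 25, so that all brighter tissue saturates at a single value. The sketch below shows one way this could be written with NumPy; the function names are hypothetical, and the rescaling to the range 0 to 1 is an assumed extra step that the abstract does not describe.

```python
import numpy as np


def window_intensities(slice_2d, low=0.0, high=25.0):
    """Clip a DWI slice to [low, high]; brighter tissue saturates at `high`."""
    return np.clip(np.asarray(slice_2d, dtype=np.float32), low, high)


def to_unit_range(windowed, low=0.0, high=25.0):
    """Assumed extra step: rescale a windowed slice to [0, 1] for the network."""
    return (windowed - low) / (high - low)
```

With this windowing, a pixel of intensity 6430 and a pixel of intensity 100 both map to 25, while low-intensity ghost signal below 25 is left unchanged.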

Training and testing

It is established that class imbalance can have a significant detrimental effect on training classifiers⁸. As there is a large variance in the amount of Nyquist ghosting per subject in our dataset, we paired subjects with the highest percentage of ghosting with subjects with the lowest percentage to try to limit the class imbalance in the training set during cross-validation (see Figure 4). An 11-fold cross-validation approach was therefore used to train and test the network. In each iteration, 18 subjects were used for training, two for validation (using early stopping to prevent overfitting to the training data⁹), and two to test the resulting model. Early stopping was set up so that training was stopped if the validation loss had not improved after five epochs. The accuracy was reported as the percentage of test data assigned the correct classification label by the final model of each iteration. To reduce overfitting and improve generalization, dropout regularization (rate = 0.5) was applied to the dense layer of the classifier during training¹⁰. The network was trained using the Adam optimizer with typical parameter values¹¹ (learning rate = 1x10^-4, beta1 = 0.9, beta2 = 0.999) and a batch size of 8. The loss function used was binary cross-entropy.
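The pairing and fold construction can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: subjects are ranked by ghosting percentage and the k-th highest is paired with the k-th lowest, and each fold tests on one pair and validates on the next pair in cyclic order. The abstract does not specify how the validation pair was chosen, and both function names are hypothetical.

```python
def pair_subjects(ghost_pct):
    """Pair the highest-ghosting subject with the lowest, the second with
    the second lowest, and so on.

    ghost_pct: dict mapping subject id -> percentage of slices with ghosting.
    Returns a list of (high, low) subject-id pairs.
    """
    ranked = sorted(ghost_pct, key=ghost_pct.get, reverse=True)
    half = len(ranked) // 2
    return [(ranked[i], ranked[-1 - i]) for i in range(half)]


def cross_validation_folds(pairs):
    """Yield (train, validation, test) subject lists, one fold per pair.

    Assumption: fold k tests on pair k and validates on pair k + 1 (cyclic).
    """
    n = len(pairs)
    for k in range(n):
        val_idx = (k + 1) % n
        test = list(pairs[k])
        val = list(pairs[val_idx])
        train = [
            subject
            for j, pair in enumerate(pairs)
            if j not in (k, val_idx)
            for subject in pair
        ]
        yield train, val, test
```

With 22 subjects this gives 11 pairs and therefore 11 folds, each with 18 training, two validation and two test subjects.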

Results and Discussion

A model with a mean test accuracy of 81.5% and a standard deviation (SD) of 7.5% was achieved by training on DWI slices windowed between intensities of 0 and 25. Without windowing, the network achieved a mean test accuracy of 73.1% (SD = 7.8%). A paired t-test found the difference in the mean test accuracies to be significant; t(11) = 2.764, p = 0.0184. The test accuracies across all 11 cross-validation iterations are shown in Figure 5.
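For reference, a paired t statistic of this kind is computed from the per-fold differences in accuracy between the two pipelines. The sketch below uses only the Python standard library; the function name is hypothetical, and the per-fold accuracies themselves are not listed in the abstract, so no data are reproduced here.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """Return the paired t statistic for two equal-length lists of per-fold accuracies."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least two folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))
```

If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic and also returns the corresponding p-value.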

Conclusion

Experiments on the Streamline dataset demonstrated the feasibility of a transfer-learning approach to automatically detect Nyquist ghosts in WB-DWIs. Detection of artifacts whilst the patient is still on the scanner could allow for adjustment of parameters in order to remove them, allowing for a more accurate measure of the ADC.

Acknowledgements

This work is made possible in part by the contributions from the Streamline investigators (https://www.thelancet.com/cms/10.1016/S2468-1253(19)30056-1/attachment/0f9810e4-38aa-4a64-a363-6dfb1178f996/mmc1.pdf).

References

1. Donati OF, Chong D, Nanz D, Boss A, Froehlich JM, Andres E, Seifert B, Thoeny HC. Diffusion-weighted MR imaging of upper abdominal organs: field strength and intervendor variability of apparent diffusion coefficients. Radiology. 2014 Feb;270(2):454-63. doi: 10.1148/radiol.13130819. Epub 2013 Nov 5. PMID: 24471390.
2. Géron A. Deep Computer Vision Using Convolutional Neural Networks. In: Roumeliotis R, Tache N (eds). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. Canada: O'Reilly Media, Inc; 2019. p. 481-483.
3. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations. 2015.
4. Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. 2009. p. 248-55.
5. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. 2016 Mar 14.
6. Taylor SA, Mallett S, Ball S, et al. Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed non-small-cell lung cancer: the prospective Streamline L trial. Lancet Respir Med. 2019 Jun;7(6):523-532. doi: 10.1016/S2213-2600(19)30090-6. Epub 2019 May 9. PMID: 31080129; PMCID: PMC6529610.
7. Taylor SA, Mallett S, Beare S, et al. Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed colorectal cancer: the prospective Streamline C trial. Lancet Gastroenterol Hepatol. 2019 Jul;4(7):529-537. doi: 10.1016/S2468-1253(19)30056-1. Epub 2019 May 9. PMID: 31080095; PMCID: PMC6547166.
8. Japkowicz N, Stephen S. The class imbalance problem: A systematic study. Intelligent data analysis. 2002 Jan 1;6(5):429-49.
9. Prechelt L. Early Stopping - But When? In: Montavon G, Orr GB, Müller KR (eds). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg; 2012. https://doi.org/10.1007/978-3-642-35289-8_5
10. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014 Jan;15(1):1929-1958.
11. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg; 2012. p. 437-478.

Figures

Figure 1: The convolutional base of VGG-16 is composed of a stack of convolutional and max-pooling layers for the purpose of feature extraction. The classifier is composed of fully connected layers. The classifier of the pretrained VGG-16 network was removed, leaving the feature extraction components of the network, to which we added a new classifier (a 128-unit dense layer followed by a single output node). The weights of the convolutional base were frozen so that we could reuse the features learnt from ImageNet.

Figure 2: The percentage of slices containing Nyquist ghosting is shown for each subject, as is the mean (44.2%) and standard deviation (25.0%) across all subjects.

Figure 3: An axial DWI slice with intensities windowed between 0-255 (left) and 0-25 (right). The Nyquist ghosting leads to signal from the kidneys appearing anterior to the subject’s abdomen. This is clearer in the right image. It can be seen that windowing the intensities between 0-25 removed many of the features of the internal structures.

Figure 4: The percentage of slices containing Nyquist ghosts is shown for each pair of subjects. For each subject pair, the percentages for the constituent subjects are also shown, labelled with the subject number. The mean (44.2%) across all subject pairs is given, along with the standard deviation (5.3%), which is much lower than that of the percentage of slices containing Nyquist ghosting in individual subjects, as shown in Figure 2.

Figure 5: The percentage accuracy of the classifier on test data for each of the 11 cross-validation iterations is shown, both for the network trained on DWI slices with intensity values windowed between 0-25 and for the network trained without windowing. The mean accuracy and corresponding standard deviation across all 11 iterations are also shown for both cases.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
