Alistair Lamb¹, Anna Barnes², Stuart A Taylor², and Hui Zhang³
¹Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom; ²Centre for Medical Imaging, University College London, London, United Kingdom; ³Centre for Medical Image Computing, University College London, London, United Kingdom
Synopsis
Despite its potential as an imaging biomarker for assessing tumor response to therapy, the apparent diffusion coefficient (ADC) is not used routinely as a quantitative endpoint in clinical practice. One factor that limits the usefulness of ADC is the presence of artifacts in the constituent diffusion-weighted imaging (DWI) data. In this study, we propose a supervised deep-learning approach to detect Nyquist ghosts in axial DWI slices of the abdomen, achieving a mean test accuracy of 81.5%. Detecting and removing these artifacts could help improve the reproducibility of quantitative ADC measurements.
Introduction
Accurate, non-invasive tumor response biomarkers are needed for optimal patient outcomes in oncology, particularly for identifying metastatic disease, as this typically dictates therapeutic strategy. Despite the early promise of the apparent diffusion coefficient (ADC) as a quantitative endpoint biomarker for assessing tumor response to therapy, it is not used routinely in clinical practice. Although, in theory, ADC estimates should be consistent across scans acquired at different visits or from different subjects, this is not the case in practice. Donati et al. reported significant intervendor differences in ADC estimates in the upper abdominal organs, with coefficients of variation ranging from 7.0% to 27.1% [1]. One factor that can affect ADC estimates is the presence of artifacts, typical of echo-planar imaging (EPI) readouts, in the constituent diffusion-weighted imaging (DWI) data. One of the most prevalent artifacts in whole-body DWI (WB-DWI) is Nyquist ghosting, which affected >40% of slices in our dataset. These artifacts can cause unwanted intensity changes in tissues of interest, affecting the estimated ADC and, in turn, clinical decisions about the patient’s treatment. Currently, there is no method for automatically detecting Nyquist ghosts in WB-DWI, so they must be identified manually, which is time-consuming. The aim of this work is to develop a supervised deep-learning approach that automatically detects WB-DWI data corrupted by Nyquist ghosts.

Method
Transfer learning-based CNN
A standard transfer-learning approach [2] was adopted (Figure 1), using the VGG-16 network [3] pretrained on ImageNet [4], as available in TensorFlow [5]. The original classifier was replaced with a new one consisting of a 128-unit dense layer followed by a single node with a sigmoid output. During training, the weights of the convolutional base were frozen so that only the new classifier was trained, allowing the features learnt on ImageNet to be transferred.
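To make concrete how much of the network is actually updated when the convolutional base is frozen, the sketch below counts the parameters of the VGG-16 convolutional base and of the new classifier head. It rests on two assumptions that the abstract does not state: that the 256x256 grayscale slices are replicated to three channels and fed at full size (so the base outputs an 8x8x512 feature map), and that this map is flattened before the 128-unit dense layer. A global-average-pooling head, for example, would give very different counts. The layer widths follow the VGG-16 configuration of Simonyan and Zisserman [3].

```python
# Hypothetical sketch: parameter counts for a frozen VGG-16 convolutional base
# and a new classifier head (Flatten -> Dense(128) -> Dense(1, sigmoid)).
# Assumptions, not stated in the abstract: 256x256x3 inputs (grayscale slices
# replicated to 3 channels) and a Flatten layer between base and head.

# Output channels of each 3x3 conv layer in the five VGG-16 blocks; every
# block ends with a parameter-free 2x2 max-pool that halves height and width.
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256], [512, 512, 512], [512, 512, 512]]


def conv_base_params(in_channels: int = 3) -> int:
    """Count weights + biases of all 3x3 conv layers (frozen during training)."""
    total, c_in = 0, in_channels
    for block in VGG16_BLOCKS:
        for c_out in block:
            total += 3 * 3 * c_in * c_out + c_out  # kernel weights + biases
            c_in = c_out
    return total


def feature_map_shape(height: int, width: int) -> tuple:
    """Shape of the base's output: five max-pools divide each side by 32."""
    factor = 2 ** len(VGG16_BLOCKS)
    return height // factor, width // factor, VGG16_BLOCKS[-1][-1]


def head_params(height: int = 256, width: int = 256, hidden: int = 128) -> int:
    """Count trainable parameters of the new head (dropout adds none)."""
    h, w, c = feature_map_shape(height, width)
    dense = h * w * c * hidden + hidden  # 128-unit dense layer
    output = hidden + 1                  # single sigmoid output node
    return dense + output


if __name__ == "__main__":
    print(f"frozen conv base: {conv_base_params():,}")  # 14,714,688
    print(f"trainable head:   {head_params():,}")       # 4,194,561
```

Under these assumptions, only about 4.2 million head parameters are updated, while the roughly 14.7 million convolutional weights learnt on ImageNet stay fixed.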
Dataset
DWI volumes covering the abdominal region were taken from 22 subjects in the Streamline dataset [6,7]. Each volume consisted of thirty 256x256-pixel slices of similar resolution. The first and last slices of each volume were excluded because they had been masked by the scanner. This left a total of 616 DWI slices (22 subjects × 28 slices), each of which was assigned, following careful visual inspection, a binary label indicating whether it contained Nyquist ghosting. The percentage of slices containing Nyquist ghosting is shown for each subject in Figure 2. All data were collected on a GE Discovery MR750w 3.0 T scanner.
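The slice bookkeeping described above (22 volumes of 30 slices, minus the masked first and last slice of each) and the per-subject ghosting percentage plotted in Figure 2 can be sketched as follows. Volumes are represented here as plain Python lists and the placeholder data are hypothetical; this is not the Streamline data format.

```python
# Sketch of the slice bookkeeping described above. Volumes are represented as
# plain lists of slices; the placeholder data below is hypothetical.

def usable_slices(volume: list) -> list:
    """Drop the first and last slice, which were masked by the scanner."""
    return volume[1:-1]


def total_usable(volumes: list) -> int:
    """Number of labelled slices across all volumes."""
    return sum(len(usable_slices(v)) for v in volumes)


def ghost_percentage(labels: list) -> float:
    """Percentage of a subject's slices labelled 1 (Nyquist ghost present)."""
    if not labels:
        raise ValueError("no labels for this subject")
    return 100.0 * sum(labels) / len(labels)


if __name__ == "__main__":
    # 22 placeholder volumes of 30 slices each (slice indices stand in for images).
    volumes = [list(range(30)) for _ in range(22)]
    print(total_usable(volumes))             # 616
    print(ghost_percentage([1, 0, 0, 1]))    # 50.0
```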
Data pre-processing
Initially, no pre-processing was applied to the images before training. A second pipeline was then constructed in which the images were windowed between intensities of 0 and 25 (Figure 3). This was done for two reasons: (i) the maximum intensity of each slice varied considerably (75 to 6430), whereas the intensity of the Nyquist ghosts was typically <100; (ii) we hypothesized that windowing to intensities below 25 may help prevent the network from overfitting to irrelevant features, because the centers of the slices, which typically have intensities >25, become homogeneous after windowing, effectively removing many features that are irrelevant to detecting Nyquist ghosting.
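A minimal sketch of the 0 to 25 windowing step, written in plain Python on a 2-D list of intensities. With NumPy the same clip-and-rescale would be `np.clip(img, 0, 25) / 25`. Mapping the window linearly onto [0, 1] is an assumption made here; the abstract specifies only the window limits, not how the windowed values were scaled before being fed to VGG-16.

```python
def window(image: list, low: float = 0.0, high: float = 25.0, rescale: bool = True) -> list:
    """Clip a 2-D image (list of pixel rows) to [low, high].

    If `rescale` is True, the window is also mapped linearly to [0, 1]; this
    rescaling is an assumption, as the abstract only specifies the [0, 25] window.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    windowed = []
    for row in image:
        clipped = [min(max(px, low), high) for px in row]
        if rescale:
            clipped = [(px - low) / (high - low) for px in clipped]
        windowed.append(clipped)
    return windowed


if __name__ == "__main__":
    # Toy row: background, faint signal, window edge, and bright tissue.
    print(window([[0, 5, 10, 25, 6430]]))  # [[0.0, 0.2, 0.4, 1.0, 1.0]]
```

Everything brighter than 25, which the text notes includes the center of most slices, saturates at the same value, while the low-intensity range keeps its full contrast.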
Training and testing
It is well established that class imbalance can have a significant detrimental effect on training classifiers [8]. Because the amount of Nyquist ghosting varied widely between subjects in our dataset, we paired the subjects with the highest percentages of ghosted slices with those with the lowest percentages to limit class imbalance in the training set during cross-validation (Figure 4). Pairing the 22 subjects gave 11 pairs, so an 11-fold cross-validation scheme was used to train and test the network.
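The pairing strategy and fold construction can be sketched as below. Subjects are sorted by their percentage of ghosted slices and the i-th lowest is paired with the i-th highest. Each fold holds out one pair for testing and one pair for validation and trains on the remaining 18 subjects. The abstract does not say which pair served as the validation pair in each fold; taking the next pair cyclically is an assumption made purely for illustration, and the subject IDs and percentages are hypothetical.

```python
# Sketch of the cross-validation design: pair extreme subjects, then build one
# fold per pair. The validation-pair choice and the data are hypothetical.

def pair_extremes(ghost_pct: dict) -> list:
    """Pair lowest with highest ghosting percentage, second lowest with second
    highest, and so on. `ghost_pct` maps subject ID -> % of ghosted slices."""
    order = sorted(ghost_pct, key=ghost_pct.get)
    if len(order) % 2:
        raise ValueError("an even number of subjects is required")
    return [(order[i], order[-1 - i]) for i in range(len(order) // 2)]


def make_folds(pairs: list) -> list:
    """One fold per pair: that pair is the test set, the next pair (cyclically;
    an assumption) is the validation set, and all other subjects train."""
    n = len(pairs)
    folds = []
    for k in range(n):
        held_out = {k, (k + 1) % n}
        folds.append({
            "test": list(pairs[k]),
            "val": list(pairs[(k + 1) % n]),
            "train": [s for j, pair in enumerate(pairs) if j not in held_out for s in pair],
        })
    return folds


if __name__ == "__main__":
    # 22 hypothetical subjects with made-up ghosting percentages.
    pct = {f"S{i:02d}": (i * 37) % 100 for i in range(22)}
    folds = make_folds(pair_extremes(pct))
    print(len(folds), [len(f["train"]) for f in folds][:3])  # 11 [18, 18, 18]
```

The pairing keeps one high-ghosting and one low-ghosting subject together in every held-out set, which is the mechanism the text relies on to limit class imbalance in each training set.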
In each iteration, 18 subjects were used for training, two for validation (with early stopping to prevent overfitting to the training data [9]), and two for testing the resulting model. Training was stopped if the validation loss did not improve for five consecutive epochs. Accuracy was reported as the percentage of test slices assigned the correct label by the final model of each iteration. To reduce overfitting and improve generalization, dropout regularization (rate = 0.5) was applied to the dense layer of the classifier during training [10]. The network was trained using the Adam optimizer with typical parameter values [11] (learning rate = 1×10⁻⁴, β₁ = 0.9, β₂ = 0.999) and a batch size of 8, with binary cross-entropy as the loss function.

Results and Discussion
Training on DWI slices windowed between intensities of 0 and 25 yielded a model with a mean test accuracy of 81.5% and a standard deviation (SD) of 7.5%. Without windowing, the network achieved a mean test accuracy of 73.1% (SD = 7.8%). A paired t-test found the difference in mean test accuracy to be significant (t(11) = 2.764, p = 0.0184). The test accuracies for all 11 cross-validation iterations are shown in Figure 5.

Conclusion
Experiments on the Streamline dataset demonstrated the feasibility of a transfer-learning approach for automatically detecting Nyquist ghosts in WB-DWI. Detecting artifacts whilst the patient is still on the scanner could allow acquisition parameters to be adjusted to remove them, enabling more accurate ADC measurements.

Acknowledgements
This work was made possible in part by contributions from the Streamline investigators (https://www.thelancet.com/cms/10.1016/S2468-1253(19)30056-1/attachment/0f9810e4-38aa-4a64-a363-6dfb1178f996/mmc1.pdf).

References
1. Donati OF, Chong D, Nanz D, Boss A, Froehlich JM, Andres E, Seifert B, Thoeny HC. Diffusion-weighted MR imaging of upper abdominal organs: field strength and intervendor variability of apparent diffusion coefficients. Radiology. 2014 Feb;270(2):454-63. doi: 10.1148/radiol.13130819. Epub 2013 Nov 5. PMID: 24471390.
2. Géron A. Deep Computer Vision Using Convolutional Neural Networks. In: Roumeliotis R, Tache N (eds). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. Canada: O'Reilly Media, Inc; 2019. p. 481-483.
3. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations. 2015.
4. Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009. p. 248-55.
5. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. 2016 Mar 14.
6. Taylor SA, Mallett S, Ball S, et al. Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed non-small-cell lung cancer: the prospective Streamline L trial. Lancet Respir Med. 2019 Jun;7(6):523-532. doi: 10.1016/S2213-2600(19)30090-6. Epub 2019 May 9. PMID: 31080129; PMCID: PMC6529610.
7. Taylor SA, Mallett S, Beare S, et al. Diagnostic accuracy of whole-body MRI versus standard imaging pathways for metastatic disease in newly diagnosed colorectal cancer: the prospective Streamline C trial. Lancet Gastroenterol Hepatol. 2019 Jul;4(7):529-537. doi: 10.1016/S2468-1253(19)30056-1. Epub 2019 May 9. PMID: 31080095; PMCID: PMC6547166.
8. Japkowicz N, Stephen S. The class imbalance problem: A systematic study. Intelligent Data Analysis. 2002 Jan 1;6(5):429-49.
9. Prechelt L. Early Stopping - But When? In: Montavon G, Orr GB, Müller KR (eds). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Berlin, Heidelberg: Springer; 2012. https://doi.org/10.1007/978-3-642-35289-8_5
10. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014 Jan;15(1):1929-58.
11. Bengio Y. Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade. Berlin, Heidelberg: Springer; 2012. p. 437-78.