Meiyappan Solaiyappan1, Santosh Kumar Bharti1, Paul T Winnard Jr1, Mohamad Dbouk2, Michael G Goggins2,3,4, and Zaver M Bhujwalla1,3,5
1Department of Radiology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 2Department of Pathology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 3Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 4Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 5Department of Radiation Oncology and Molecular Radiation Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
Synopsis
The insidious growth of pancreatic cancer is a major
factor contributing to its lethality.
Only ~20% of pancreatic cancers are resectable by the time they are
detected. Early detection of pancreatic
cancer through routine screening is clearly an unmet clinical need. Here
we have applied neural-network analysis to 1H magnetic resonance
spectra of human plasma samples to differentiate between healthy subjects
(control), subjects with benign lesions, and subjects with pancreatic ductal
adenocarcinoma (PDAC). Our data support
developing a neural-network approach to identify PDAC from 1H MRS of
plasma samples.
Introduction
Pancreatic
ductal adenocarcinoma (PDAC) is the most frequent form of pancreatic cancer and
its low survival rate of less than 9% at five years makes it the fourth leading
cause of cancer-related deaths. The poor prognosis of PDAC
is mainly due to late-stage diagnosis [1]. Although inroads are being made in developing
molecular imaging probes, these have
not been clinically translated. There is an
urgent need for noninvasive, clinically translatable biomarkers of PDAC. Plasma-based tests provide a desirable option
for routine screening. Here we have
evaluated the application of neural-network analysis to 1H MR
spectra of human plasma samples to identify PDAC that could potentially be used
for screening, initially, in high-risk patients.
Methods
Plasma samples from healthy
subjects (control, n=56), from subjects with benign pancreatic lesions (benign,
n=49), and from subjects with PDAC (PDAC/malignant, n=53) were analyzed with 1H
MRS. 1H MR spectra were
acquired on a Bruker Avance III 750 MHz (17.6 T) MR spectrometer equipped with
a 5 mm probe. Plasma (250 µL) was diluted
with D2O buffer (350 µL). Water-suppressed spectra were acquired using a
single-pulse sequence with pre-saturation and the following experimental
parameters: spectral width of 15495.86 Hz, 64 K data points, 90° flip angle,
relaxation delay of 10 s, acquisition time of 2.11 s, 64 scans with 8 dummy
scans, and receiver gain of 64. All spectral acquisition, processing, and
quantification were performed using TOPSPIN 3.5 software. Peak areas were
integrated and normalized with respect to the reference signal.
Representative spectra from the
three groups are presented in Figure 1.
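The acquisition parameters listed above can be cross-checked against one another. The short Python sketch below assumes the usual Bruker convention that the acquisition time equals the number of data points divided by twice the spectral width, takes "64 K" to mean 65,536 points, and uses the nominal 750 MHz proton frequency; none of these assumptions is stated in the text. The per-sample scan time is only a lower-bound estimate, because pulse durations and instrument overhead are ignored.

```python
# Consistency check of the quoted acquisition parameters (illustrative sketch).

SPECTRAL_WIDTH_HZ = 15495.86   # from the text
N_POINTS = 64 * 1024           # "64 K" data points, assumed to mean 65,536
RELAXATION_DELAY_S = 10.0      # from the text
N_SCANS = 64                   # from the text
N_DUMMY_SCANS = 8              # from the text
LARMOR_MHZ = 750.0             # nominal 1H frequency (assumption)
PLASMA_UL = 250.0              # from the text
BUFFER_UL = 350.0              # from the text


def acquisition_time_s(n_points: int, sw_hz: float) -> float:
    """Assumed Bruker convention: AQ = TD / (2 * SW)."""
    return n_points / (2.0 * sw_hz)


aq_s = acquisition_time_s(N_POINTS, SPECTRAL_WIDTH_HZ)

# Spectral width expressed in ppm at the nominal proton frequency.
sw_ppm = SPECTRAL_WIDTH_HZ / LARMOR_MHZ

# Lower-bound scan time per sample: every scan (dummy scans included)
# costs one relaxation delay plus one acquisition period.
total_s = (N_SCANS + N_DUMMY_SCANS) * (RELAXATION_DELAY_S + aq_s)

# Fraction of the NMR sample volume that is plasma after dilution.
plasma_fraction = PLASMA_UL / (PLASMA_UL + BUFFER_UL)

print(f"acquisition time: {aq_s:.2f} s")
print(f"spectral width:   {sw_ppm:.2f} ppm")
print(f"scan time:        {total_s / 60:.1f} min per sample (lower bound)")
print(f"plasma fraction:  {plasma_fraction:.3f}")
```

Under these assumptions the computed acquisition time agrees with the 2.11 s quoted in the text.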
After initial processing to calibrate the spectral data against the reference
peak signal and the plasma volume, we calculated the mean spectrum of each
classification group and the differences of the benign and malignant mean
spectra from the control mean spectrum. Segments of the spectra that showed
significant differences provided a set
of key target locations in the spectra. The spectral differences with respect
to the mean of the control spectra at each of these target locations were
computed to construct a feature vector. This feature vector was used as the
input variable for the neural network analysis to discriminate the three
classes. With the original sample of 158 cases, distributed nearly equally
across the three classes, we achieved a discrimination accuracy in the range of
83% to 86%. Although this accuracy is already in the desired range, we applied
a specialized data augmentation technique to increase the data size, with the
aim of further improving accuracy and validating the robustness of the
approach. We developed and applied a Variational Auto-Encoder (VAE) neural
network approach that uses a Gaussian distribution model to fit the
distribution of feature vectors for each class and to generate new artificial
feature-vector samples that closely resemble the original samples of that
class. Using this technique, we expanded the original data set of 158
feature-vector samples to 314, making it better suited to neural-network
training, which is otherwise prone to overfitting on small data sets. To
further reduce overfitting, we used a minimal neural network with one hidden
layer. Training was optimized using a data-subdivision approach that randomly
divides the data set into subsets of 70%, 15%, and 15% for training,
validation, and testing, respectively. This gives a more balanced fine-tuning
of the network parameters and eliminates any bias in the samples toward one
class that might otherwise skew the accuracy. All the artificial neural-network
functions were developed in MATLAB 2020b (MathWorks, Inc.).
Results
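As a reference point for the results that follow, the feature construction, augmentation, and data subdivision described in Methods can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' MATLAB implementation: the spectra are synthetic, ranking target locations by the largest absolute mean difference is an assumed criterion (the text says only that the differences were significant), the augmentation samples from an independent per-feature Gaussian fitted to each class as a simplified stand-in for the VAE, and 52 synthetic vectors per class is an assumed count chosen only because it brings the total to the 314 quoted. All function names are hypothetical, and the one-hidden-layer classifier itself is omitted.

```python
import random
from statistics import mean, stdev


def class_means(spectra_by_class):
    """Point-wise mean spectrum for each class."""
    return {c: [mean(col) for col in zip(*rows)]
            for c, rows in spectra_by_class.items()}


def select_targets(means, reference="control", k=3):
    """Indices where a non-reference class mean deviates most from the
    reference mean (assumed criterion, see lead-in)."""
    ref = means[reference]
    score = [max(abs(m[i] - ref[i]) for c, m in means.items() if c != reference)
             for i in range(len(ref))]
    top = sorted(range(len(ref)), key=lambda i: score[i], reverse=True)[:k]
    return sorted(top)


def feature_vector(spectrum, ref_mean, targets):
    """Difference from the control mean at each target location."""
    return [spectrum[i] - ref_mean[i] for i in targets]


def augment_gaussian(features, n_new, rng):
    """Sample synthetic vectors from a per-feature Gaussian fitted to one
    class. Simplified stand-in for the VAE, not a VAE."""
    cols = list(zip(*features))
    mu = [mean(c) for c in cols]
    sd = [stdev(c) for c in cols]
    return [[rng.gauss(m, s) for m, s in zip(mu, sd)] for _ in range(n_new)]


def split_70_15_15(items, rng):
    """Random 70/15/15 split into training, validation, and test subsets."""
    items = items[:]
    rng.shuffle(items)
    n_train = round(0.70 * len(items))
    n_val = round(0.15 * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# --- Synthetic demonstration data (made up, not the study data) ---
rng = random.Random(0)
N_POINTS = 40
CLASS_SIZES = {"control": 56, "benign": 49, "malignant": 53}  # from the text
SHIFTS = {"control": {}, "benign": {7: 1.0, 21: -0.8},
          "malignant": {7: 2.0, 30: 1.5}}                     # invented offsets


def fake_spectrum(label):
    s = [rng.gauss(0.0, 0.1) for _ in range(N_POINTS)]
    for idx, delta in SHIFTS[label].items():
        s[idx] += delta
    return s


spectra = {c: [fake_spectrum(c) for _ in range(n)] for c, n in CLASS_SIZES.items()}
means = class_means(spectra)
targets = select_targets(means, k=3)

dataset = []
for label, rows in spectra.items():
    feats = [feature_vector(r, means["control"], targets) for r in rows]
    synthetic = augment_gaussian(feats, 52, rng)  # 52 per class: assumption
    dataset += [(f, label) for f in feats + synthetic]

train, val, test = split_70_15_15(dataset, rng)
print(targets, len(dataset), len(train), len(val), len(test))
```

With these assumptions the data set grows from 158 to 314 feature vectors, and the random subdivision yields 220 training, 47 validation, and 47 test samples.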
The performance of the neural network in discriminating the three
classes is illustrated in Figure 2(a). The output of the hidden layer of the
neural network was mapped onto a two-dimensional embedding and visualized as
a scatter plot, showing that the network encodes the feature vectors into
well-separated class clusters with minimal overlap. This separation provided
the basis for the high sensitivity, specificity, precision, and accuracy
presented in the receiver operating characteristic (ROC) curves in
Figure 2(b) and the confusion matrix plot in Figure 3.
Discussion
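The sensitivity, specificity, and precision referred to for the three-class problem can be made concrete with a one-vs-rest reading of a confusion matrix. The Python sketch below assumes rows are true classes and columns are predicted classes; the counts are invented for illustration and are not the values in Figure 3, and the function names are hypothetical.

```python
def one_vs_rest_metrics(cm, labels):
    """Per-class sensitivity, specificity, and precision.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    out = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp   # predicted k, truly another class
        tn = total - tp - fn - fp
        out[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        }
    return out


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Invented counts for illustration only; NOT the study's confusion matrix.
LABELS = ["control", "benign", "malignant"]
EXAMPLE_CM = [
    [8, 1, 1],   # true control
    [1, 7, 2],   # true benign
    [0, 1, 9],   # true malignant
]

metrics = one_vs_rest_metrics(EXAMPLE_CM, LABELS)
for label in LABELS:
    m = metrics[label]
    print(f"{label:10s} sens={m['sensitivity']:.2f} "
          f"spec={m['specificity']:.2f} prec={m['precision']:.2f}")
print(f"overall accuracy = {overall_accuracy(EXAMPLE_CM):.2f}")
```

Treating each class in turn as the positive class is one common way to report these quantities for more than two classes; the source does not state which convention was used.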
We have demonstrated that a
combination of spectral feature extraction and neural-network processing of
1H MRS data from plasma samples can be used to successfully discriminate
between control, benign, and malignant (PDAC) cases. Further, we have shown
that high precision and accuracy can be achieved with the limited amount of
hard-to-obtain data available by using a specialized data augmentation
technique to expand the data size.
Acknowledgements
This work was supported by NIH R35CA209960, R01CA193365, and U01CA210170.
We thank Dr. Karen Horton for her support.
References
1. Blackford AL, Canto MI, Klein AP, Hruban RH, Goggins M. Recent trends in the incidence and survival of Stage 1A Pancreatic Cancer: A Surveillance, Epidemiology, and End Results analysis. J Natl Cancer Inst. 2020;112:1162-9.