0374

High-precision neural-network discrimination of human plasma samples to detect pancreatic cancer using specialized data-augmentation method
Meiyappan Solaiyappan1, Santosh Kumar Bharti1, Paul T Winnard Jr1, Mohamad Dbouk2, Michael G Goggins2,3,4, and Zaver M Bhujwalla1,3,5
1Department of Radiology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 2Department of Pathology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 3Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 4Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, United States, 5Department of Radiation Oncology and Molecular Radiation Sciences, The Johns Hopkins University School of Medicine, Baltimore, MD, United States

Synopsis

The insidious growth of pancreatic cancer is a major factor contributing to its lethality. Only ~20% of pancreatic cancers are resectable by the time they are detected. Early detection of pancreatic cancer through routine screening is clearly an unmet clinical need. Here we have applied neural-network analysis to 1H magnetic resonance spectra of human plasma samples to differentiate between healthy subjects (control), subjects with benign lesions, and subjects with pancreatic ductal adenocarcinoma (PDAC). Our data support developing a neural-network approach to identify PDAC from 1H MRS of plasma samples.

Introduction

Pancreatic ductal adenocarcinoma (PDAC) is the most frequent form of pancreatic cancer. With a five-year survival rate below 9%, it is the fourth leading cause of cancer-related deaths, and its poor prognosis is mainly due to late-stage diagnosis [1]. Although inroads are being made in developing molecular imaging probes, these have not been clinically translated, and there is an urgent need for noninvasive, clinically translatable biomarkers of PDAC. Plasma-based tests are a desirable option for routine screening. Here we evaluated neural-network analysis of 1H MR spectra of human plasma samples to identify PDAC, as an approach that could potentially be used for screening, initially in high-risk patients.

Methods

Plasma samples from healthy subjects (control, n = 56), subjects with benign pancreatic lesions (benign, n = 49), and subjects with PDAC (PDAC/malignant, n = 53) were analyzed with 1H MRS. Spectra were acquired on a Bruker Avance III 750 MHz (17.6 T) spectrometer equipped with a 5 mm probe. Plasma (250 µL) was diluted with D2O buffer (350 µL). Spectra were acquired with a single-pulse sequence, with water suppression by presaturation, using the following parameters: spectral width 15495.86 Hz, 64 K data points, 90° flip angle, relaxation delay 10 s, acquisition time 2.11 s, 64 scans with 8 dummy scans, and receiver gain 64. All spectral acquisition, processing, and quantification were performed with TOPSPIN 3.5 software. Peak areas were integrated and normalized to the reference signal. Representative spectra from the three groups are shown in Figure 1.
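The reported acquisition parameters are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes that "64 K data points" is Bruker's TD value, which counts real and imaginary points together, so the acquisition time is AQ = TD / (2 × SW). The function names are hypothetical, and the total-time estimate ignores pulse lengths and other small overheads.

```python
# Minimal consistency check of the reported 1H acquisition parameters.
# Assumption: "64 K data points" is Bruker's TD, which counts real and
# imaginary points together, so the FID contains TD/2 complex points.

GAMMA_1H_MHZ_PER_T = 42.577  # 1H gyromagnetic ratio divided by 2*pi (MHz/T)


def acquisition_time_s(td_points: int, spectral_width_hz: float) -> float:
    """AQ = (number of complex points) x (dwell time 1/SW)."""
    complex_points = td_points // 2
    return complex_points / spectral_width_hz


def field_strength_t(larmor_mhz: float) -> float:
    """B0 implied by a 1H Larmor frequency."""
    return larmor_mhz / GAMMA_1H_MHZ_PER_T


def experiment_time_s(n_scans: int, n_dummy: int, d1_s: float, aq_s: float) -> float:
    """Rough duration: every scan, dummy or not, repeats D1 + AQ (pulse length ignored)."""
    return (n_scans + n_dummy) * (d1_s + aq_s)


aq = acquisition_time_s(64 * 1024, 15495.86)
print(f"AQ = {aq:.2f} s")                                   # 2.11 s, as reported
print(f"B0 = {field_strength_t(750.0):.1f} T")              # 17.6 T, as reported
print(f"time = {experiment_time_s(64, 8, 10.0, aq) / 60:.1f} min")  # about 14.5 min
```

The long 10 s relaxation delay dominates the estimated time per sample; the acquisition itself accounts for only about 2 s of each 12 s scan.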

After initial processing to calibrate the spectral data against the reference peak signal and the plasma volume, we calculated the mean spectrum of each group and the differences of the benign and malignant mean spectra from the control mean spectrum. Spectral segments that exhibited significant differences provided a set of key target locations. At each target location, the difference between a sample's spectrum and the mean control spectrum was computed, and these differences formed the sample's feature vector, which served as the input to the neural network that discriminates the three classes. With the original 158 cases, distributed in nearly equal proportions across the three classes, we achieved a discrimination accuracy of 83% to 86%. Although this accuracy is already within the desired range, we applied a specialized data-augmentation technique to increase the dataset size, both to further improve accuracy and to validate the robustness of the approach. We developed a variational autoencoder (VAE) neural network that uses a Gaussian distribution model to fit the distribution of the feature vectors in each class and generates new artificial feature vectors for each class that closely resemble the original samples of that class. With this data augmentation, we expanded the original 158 feature-vector samples to 314, making the dataset better suited to a neural-network technique that would otherwise be prone to overfitting.
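The feature construction and per-class generative augmentation described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name its significance criterion, so the hypothetical `select_target_points` simply keeps the points with the largest absolute mean difference from control, and `n_targets` is an arbitrary choice. The augmentation step is a deliberately simpler stand-in for the VAE: it fits one multivariate Gaussian directly to each class's feature vectors and samples from it, with no encoder or decoder network. All function names are hypothetical.

```python
import numpy as np


def select_target_points(spectra, labels, control_label=0, n_targets=20):
    """Rank spectral points by how far the benign/malignant means move from the control mean.

    Hypothetical criterion: the largest absolute mean difference over the
    non-control classes; the abstract does not name its significance test.
    """
    control_mean = spectra[labels == control_label].mean(axis=0)
    diffs = np.vstack([spectra[labels == c].mean(axis=0) - control_mean
                       for c in np.unique(labels) if c != control_label])
    score = np.abs(diffs).max(axis=0)
    targets = np.sort(np.argsort(score)[::-1][:n_targets])
    return targets, control_mean


def feature_vectors(spectra, targets, control_mean):
    """Feature vector = each spectrum's deviation from the control mean at the target points."""
    return spectra[:, targets] - control_mean[targets]


def gaussian_augment(features, labels, n_new_per_class, rng):
    """Stand-in for the VAE: fit one multivariate Gaussian per class and sample from it."""
    new_x, new_y = [], []
    for c in np.unique(labels):
        xc = features[labels == c]
        mean = xc.mean(axis=0)
        # A small ridge keeps the covariance positive definite for sampling.
        cov = np.cov(xc, rowvar=False) + 1e-6 * np.eye(xc.shape[1])
        new_x.append(rng.multivariate_normal(mean, cov, size=n_new_per_class))
        new_y.append(np.full(n_new_per_class, c))
    return np.vstack(new_x), np.concatenate(new_y)


# Usage with synthetic spectra (random numbers, not study data); group sizes as in Methods.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], [56, 49, 53])  # 0 = control, 1 = benign, 2 = PDAC
spectra = rng.normal(size=(158, 200))
spectra[labels == 1, 10] += 5.0  # fake benign-specific change
spectra[labels == 2, 50] += 5.0  # fake PDAC-specific change
targets, control_mean = select_target_points(spectra, labels, n_targets=5)
X = feature_vectors(spectra, targets, control_mean)
X_new, y_new = gaussian_augment(X, labels, n_new_per_class=52, rng=rng)
X_all = np.vstack([X, X_new])
y_all = np.concatenate([labels, y_new])
print(X_all.shape)  # (314, 5)
```

Generating 52 samples per class is only one way to go from 158 to 314 samples; the abstract does not report the per-class split. In this sketch target selection and the Gaussian fits see every spectrum passed in; passing only the training split would keep test samples from shaping the features.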
To further reduce overfitting, we used a minimal neural network with a single hidden layer and optimized its training with a data-subdivision approach that randomly divides the dataset into training, validation, and testing subsets in 70%/15%/15% proportions. This subdivision provides a more balanced fine-tuning of the network parameters and reduces the risk that a bias in the samples toward one class skews the accuracy. All artificial neural-network functions were developed in MATLAB R2020b (MathWorks, Inc.).
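A minimal NumPy sketch of the training scheme described above follows: a random 70%/15%/15% split and a classifier with one tanh hidden layer and a softmax output, similar to the defaults of MATLAB's `patternnet`. The authors used MATLAB, so this is only an illustration. The hidden-layer size, learning rate, number of epochs, full-batch gradient descent, and the rule of keeping the weights with the lowest validation loss (one common form of early stopping) are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def split_70_15_15(n, rng):
    """Random 70%/15%/15% split of sample indices into training, validation, and test sets."""
    idx = rng.permutation(n)
    n_train = int(round(0.70 * n))
    n_val = int(round(0.15 * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, X):
    """One tanh hidden layer followed by a softmax output layer."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return H, softmax(H @ W2 + b2)


def train(X, y, tr, va, n_hidden=10, n_classes=3, lr=0.1, epochs=1000, seed=0):
    """Full-batch gradient descent on cross-entropy; returns the weights with the lowest validation loss."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    params = [rng.normal(0, 1 / np.sqrt(d), (d, n_hidden)), np.zeros(n_hidden),
              rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_classes)), np.zeros(n_classes)]
    Y = np.eye(n_classes)[y]  # one-hot targets
    best_loss, best_params = np.inf, None
    for _ in range(epochs):
        W2 = params[2]
        H, P = forward(params, X[tr])
        G = (P - Y[tr]) / len(tr)        # gradient of mean cross-entropy w.r.t. logits
        dH = (G @ W2.T) * (1 - H ** 2)   # back-propagate through tanh
        grads = [X[tr].T @ dH, dH.sum(axis=0), H.T @ G, G.sum(axis=0)]
        params = [p - lr * g for p, g in zip(params, grads)]
        _, Pv = forward(params, X[va])
        val_loss = -np.mean(np.log(Pv[np.arange(len(va)), y[va]] + 1e-12))
        if val_loss < best_loss:
            best_loss, best_params = val_loss, params
    return best_params


def predict(params, X):
    return forward(params, X)[1].argmax(axis=1)


# Usage on three synthetic, well-separated clusters (not study data).
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
y = rng.integers(0, 3, size=314)
X = centers[y] + rng.normal(0, 0.5, size=(314, 2))
tr, va, te = split_70_15_15(len(y), rng)
params = train(X, y, tr, va)
print(len(tr), len(va), len(te))  # 220 47 47
print("test accuracy:", np.mean(predict(params, X[te]) == y[te]))
```

Keeping the hidden layer small limits the number of free parameters relative to the few hundred available samples, and the held-out validation subset decides when further training stops helping.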

Results

Figure 2(a) illustrates the performance of the neural network in discriminating the three classes. The output of the network's hidden layer was mapped onto a two-dimensional embedding and visualized as a scatter plot, showing that the network encodes the feature vectors into well-separated class clusters with minimal overlap. This separation underlies the high sensitivity, specificity, and precision shown by the receiver operating characteristic (ROC) curves in Figure 2(b) and the confusion matrix in Figure 3.
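Figure 2(b) reports one AUC per class. The abstract does not say how the three-class ROC curves were formed; a common choice, assumed here, is one-vs-rest, where each class in turn is treated as positive and the other two as negative. The hypothetical helper below computes that AUC from the network's class scores with the Mann-Whitney formulation, which gives the same value as integrating the empirical ROC curve.

```python
import numpy as np


def auc_one_vs_rest(scores, y_true, positive_class):
    """One-vs-rest ROC AUC via the Mann-Whitney statistic.

    AUC equals the probability that a randomly chosen sample of the positive
    class scores higher than a randomly chosen sample of the other classes,
    with ties counted as one half.
    """
    s = scores[:, positive_class]
    pos = s[y_true == positive_class]
    neg = s[y_true != positive_class]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy network outputs (rows = samples, columns = control, benign, PDAC); not study data.
scores = np.array([[0.90, 0.05, 0.05],
                   [0.20, 0.70, 0.10],
                   [0.10, 0.40, 0.50],
                   [0.60, 0.30, 0.10]])
y_true = np.array([0, 1, 2, 1])
for k, name in enumerate(["control", "benign", "PDAC"]):
    print(name, auc_one_vs_rest(scores, y_true, k))  # AUCs: 1.0, 0.75, 1.0
```

The pairwise comparison uses memory proportional to the product of the positive and negative counts, which is negligible at a few hundred samples.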

Discussion

We have demonstrated that combining spectral feature extraction with neural-network processing of MRS data from plasma samples can successfully discriminate among healthy controls, subjects with benign pancreatic lesions, and subjects with malignant PDAC. Further, we have shown that high classification accuracy can be achieved with the limited number of hard-to-obtain samples by using a specialized data-augmentation technique to expand the dataset.

Acknowledgements

This work was supported by NIH R35CA209960, R01CA193365, and U01CA210170. We thank Dr. Karen Horton for her support.

References

  1. Blackford AL, Canto MI, Klein AP, Hruban RH, Goggins M. Recent trends in the incidence and survival of Stage 1A pancreatic cancer: a Surveillance, Epidemiology, and End Results analysis. J Natl Cancer Inst. 2020;112:1162-9.

Figures

Figure 1: Representative 1H MR spectra of plasma from healthy subjects (control/normal), patients with benign disease, and PDAC patients. Expansions of the 2.2 to 2.5 ppm region are magnified 4× vertically to highlight changes in metabolic patterns. (BHB, β-hydroxybutyrate; BCA, branched-chain amino acids; PUFA, polyunsaturated fatty acids.)

Figure 2: (a) Scatter plot of the 2D embedding of the neural network's hidden-layer output, showing well-separated clusters of control (green), benign (blue), and malignant (red) samples with very little overlap. The clustering gives a visual understanding of the high classification accuracy obtained in the final output. (b) Receiver operating characteristic (ROC) curves showing the sensitivity and specificity of the neural network; the area under the curve (AUC) is above 0.95 for all three classes.

Figure 3: Confusion matrix for the classification of plasma samples. Green diagonal cells show correct predictions for each class, and red cells show misclassifications. Each cell gives the number of samples and their percentage of the total data. The right-most column shows the precision for each predicted class (in green). The bottom row shows the classification accuracy for each true class (in green), and the bottom-right cell shows the overall accuracy (in green) and error rate (in red). Overall, 95.2% of samples were classified correctly.
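The summary cells in Figure 3 follow directly from the matrix counts. The sketch below assumes the layout the caption implies (rows are predicted classes, columns are true classes) and uses hypothetical function names and toy labels, not study data.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """C[p, t] = number of samples of true class t predicted as class p (rows = predicted)."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (np.asarray(y_pred), np.asarray(y_true)), 1)
    return C


def summarize(C):
    """Per-class precision and recall plus overall accuracy and error rate."""
    correct = np.diag(C)
    precision = correct / C.sum(axis=1)  # right-most column: per predicted class
    recall = correct / C.sum(axis=0)     # bottom row: per true class
    accuracy = correct.sum() / C.sum()   # bottom-right cell
    return precision, recall, accuracy, 1.0 - accuracy


y_true = [0, 0, 1, 1, 2, 2]  # 0 = control, 1 = benign, 2 = PDAC
y_pred = [0, 1, 1, 1, 2, 0]
C = confusion_matrix(y_true, y_pred, 3)
precision, recall, accuracy, error = summarize(C)
print(C)
print(precision.round(3), recall, round(accuracy, 3))
```

If a class is never predicted, its precision is 0/0, which NumPy returns as `nan` with a runtime warning.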

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)