Jong Bum Son1, Ken-Pin Hwang1, David E Rauch1, Ju Hee Ahn2, Jiyoung Lee3, Zijian Zhou1, Bikash Panthi4, Beatriz Adrada4, Rosalind P Candelaria4, Jason B White5, Mary Guirguis4, Rania M Mohamed6, Elizabeth E Ravenberg7, Clinton Yam7, Debasish Tripathy7, and Jingfei Ma1
1Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 2Radiology Department, GangNam Radiology Clinic, Busan, Korea, Republic of, 3College of Medicine, Chung-Ang University, Seongnam-si, Korea, Republic of, 4Department of Breast Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 5Department of Moon Shots Operations, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 6Department of Breast Imaging Research, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 7Department of Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
Synopsis
We developed a hybrid deep learning network combining convolutional neural network (CNN) and long short-term memory (LSTM) networks to predict slice-to-slice consistent responses to neoadjuvant systemic therapy (NAST) in triple negative breast cancer (TNBC) patients using multislice quantitative SyntheticMR images. We demonstrated that neural networks originally developed for video feature classification can be adapted to predict the treatment response of cancer patients using MR images. Our hybrid network overcame the slice-to-slice inconsistency that would have resulted had a 2D network been applied directly, thereby providing higher prediction accuracy.
Introduction
Convolutional neural networks (CNN) with densely connected multilayer perceptrons can learn unique features of an image and classify the objects in it (e.g., vehicles, keyboards, mice, pencils, and animals).1-3 In medical imaging, CNN also holds promise for extracting image features to predict the response to treatment in cancer patients.4-6 However, applying a 2D CNN1-3 to individual slices ignores the intrinsic slice-to-slice correlation of a tumor and may lead to inconsistent predictions. A 3D CNN,7 in contrast, cannot accommodate a varying number of slices, which is unavoidable given the varying size and location of tumors. In this work, we propose a hybrid deep learning network to address this limitation and demonstrate its application for predicting the response to neoadjuvant systemic therapy (NAST) in triple negative breast cancer (TNBC) patients using multislice quantitative SyntheticMR images.

Methods
Multislice T1, T2, and proton-density (PD) maps were generated from SyntheticMR8-9 images (Fig. 1a). For each patient dataset, a cuboid encompassing the tumor identified by a radiologist was extracted (Fig. 1a-b) and used as the network input (Fig. 1c). Our hybrid network performs the network training and testing in 3 sequential steps: (1) network training with a binary ResNet-5010 (Fig. 2), (2) network training with a binary long short-term memory (LSTM) network11 (Fig. 2), and (3) testing with the combined binary ResNet-50 and LSTM networks (Fig. 3).
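The construction of the network input can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code; the per-slice min-max normalization is an assumption, since the abstract does not specify how the quantitative maps were scaled before being fed to the pretrained CNN.

```python
import numpy as np

def cuboid_to_slices(t1, t2, pd):
    """Stack quantitative maps into per-slice 3-channel network inputs.

    t1, t2, pd: arrays of shape (n_slices, H, W), cropped to the
    tumor-containing cuboid. Returns (n_slices, H, W, 3) with T1 in the
    red, T2 in the green, and PD in the blue channel, each slice
    min-max normalized (an assumption) to a consistent [0, 1] range.
    """
    rgb = np.stack([t1, t2, pd], axis=-1).astype(np.float32)
    lo = rgb.min(axis=(1, 2), keepdims=True)   # per-slice, per-channel minimum
    hi = rgb.max(axis=(1, 2), keepdims=True)   # per-slice, per-channel maximum
    return (rgb - lo) / np.maximum(hi - lo, 1e-8)
```

Because the cuboid is cropped around the tumor, `n_slices` varies from patient to patient, which is exactly the property the hybrid network is designed to handle.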
(1) Network training with a binary ResNet-50
We modified the last three layers (the fully connected, softmax, and classification layers) of the pretrained ResNet-5010 to make a binary decision of whether the patient has a pathological complete response (pCR) or non-pCR (Fig. 2). The reference standard for pCR or non-pCR was established by histopathology at surgery. In this step, 2D SyntheticMR images with 3 image channels (T1 in the red, T2 in the green, and PD in the blue channel) were used.
(2) Network training with a binary LSTM network
In this step, activation maps were extracted from the global average pooling layer of the binary ResNet-50 (Fig. 2) and then used as the training inputs to the binary LSTM11 network to predict whether the patient has pCR or non-pCR (Fig. 3).
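A minimal sketch of such a binary LSTM classifier over the per-slice activation maps (the global average pooling output of ResNet-50 is 2048-dimensional; the hidden size of 128 is an assumption, as the abstract does not report it):

```python
import torch
import torch.nn as nn

class SliceSequenceLSTM(nn.Module):
    """Binary LSTM classifier over a sequence of per-slice feature vectors."""

    def __init__(self, feat_dim=2048, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits for pCR vs. non-pCR

    def forward(self, feats):
        # feats: (batch, n_slices, feat_dim); n_slices may vary per patient
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # classify from the final hidden state
```

Because the LSTM consumes the slices as a sequence, the same network accepts cuboids with any number of slices while still emitting a single patient-level decision.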
(3) Network testing with a hybrid network of binary ResNet-50 and LSTM networks
After Steps 1-2, the last three layers (fully connected, softmax, and classification output) were removed from the binary ResNet-50, which was then cloned into multiple self-replicas and combined with a series of recurrently connected binary LSTM networks to compose a single hybrid network. The number of self-replicas is selected to match the number of slices in the input cuboid. The sequence of activation maps from all the slices was used to produce a single output of pCR or non-pCR (Fig. 4).

Experiments
We used images from a total of 125 patients for this study. All SyntheticMR images were acquired on a GE 3.0-T MR750w whole-body scanner (GE Healthcare, Waukesha, WI, USA) with an 8-channel phased-array breast coil. The scan parameters for SyntheticMR were FOV = 34 cm × 34 cm, matrix = 320 × 256, slice thickness/gap = 4/1 mm, Nslice = 30, TR = 4500 ms, TE1/TE2 = 18/93 ms, RBW = ±31.25 kHz, ETL = 14, ASSET acceleration factor = 2, and scan time = 6 min 20 s. The patient images were randomly split into three groups: 104 for training (83%), 4 for validation (3%), and 17 for testing (14%). We implemented the hybrid network on a DGX1 system with a Tesla V100 32GB GPU (NVIDIA, Santa Clara, CA, USA). We trained both the binary ResNet-50 and LSTM networks for 100 epochs with an adaptive moment estimation (Adam) optimizer (mini-batch size = 100, β1 = 0.9, and β2 = 0.999).

Results
The number of slices in the tumor-containing cuboids ranged from 2 to 11 for the training datasets and from 2 to 8 for the testing datasets. After the training in Steps 1-2, the performance of our hybrid network was evaluated on the testing set (Fig. 5). The 2D ResNet-50 yielded inconsistent slice-to-slice predictions in 6 patients, with an overall accuracy of 59% (Fig. 5a). In comparison, our hybrid network made correct predictions in 16 of the 17 patients, with an overall accuracy of 94% (Fig. 5b). For the only failure case, the error was traced to the binary ResNet-50 in Step 1.

Conclusion
We demonstrated that neural networks originally developed for video feature classification12 can be adapted to predict the treatment response of cancer patients using MR images. Our hybrid network overcame the slice-to-slice inconsistency that would have resulted had a 2D network been applied directly, thereby providing much higher prediction accuracy. In addition, our hybrid network structure may incorporate other images (e.g., DWI and DCE images) to make a consistent prediction of the clinical outcome from a 3D or even 4D image dataset.

Acknowledgements
No acknowledgement found.

References
1. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012:1097-1105.
2. Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015:1-9.
3. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2014:818-833.
4. Jin C, Yu H, Ke J, et al. Predicting treatment response from longitudinal images using multi-task deep learning. Nat Commun. 2021;12:1851.
5. Xu Y, Hosny A, Zeleznik R, et al. Deep learning predicts lung cancer treatment response from serial medical imaging. Clin Cancer Res. 2019;25(11):3266-3275.
6. Ha R, Chin C, Karcich J, et al. Prior to Initiation of Chemotherapy, can we predict breast tumor response? Deep learning convolutional neural networks approach using a breast MRI tumor dataset. J Digit Imaging. 2019;32(5):693-701.
7. Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell. 2013;35(1):221-231.
8. Warntjes J, Dahlqvist O, Lundberg P. Novel method for rapid, simultaneous T1, T2*, and proton density quantification. Magn Reson Med. 2007;57:528-537.
9. Krauss W, Gunnarsson M, Andersson T, et al. Accuracy and reproducibility of a quantitative magnetic resonance imaging method for concurrent measurements of tissue relaxation times and proton density. Magn Reson Imaging. 2015;33:584-591.
10. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2016:770-778.
11. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735-1780.
12. Ng JY, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: deep networks for video classification. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015:4694-4702.