2316

Artificial Intelligence Prediction of Breast Cancer Pathologic Complete Response from Axillary Lymph Node MRIs

Janice Yang¹,², Thomas Ren¹, Hongyi Duanmu³, Pauline Huang¹, Renee Cattell¹, Haifang Li¹, Fusheng Wang³, and Tim Q Duong¹
¹Radiology, Stony Brook University, Stony Brook, NY, United States, ²Dougherty Valley High School, San Ramon, CA, United States, ³Computer Science, Stony Brook University, Stony Brook, NY, United States

Synopsis

Response to neoadjuvant chemotherapy in breast cancer patients cannot currently be predicted or monitored accurately through imaging, leading to unnecessary treatment and sentinel lymph node biopsies. We developed convolutional neural networks to predict pathologic complete response using a combination of axillary lymph node MRIs acquired before and during treatment. Three-fold cross-validation showed that the model trained on scans acquired before and after the first cycle of neoadjuvant chemotherapy performed best, with an accuracy of 81.17%. These results point to improved predictive performance of early imaging markers in axillary lymph nodes and encourage their use to aid treatment planning and improve prognosis.

Introduction

Assessment of response to neoadjuvant chemotherapy (NAC) is essential for treatment planning and prognosis of breast cancer. Pathologic complete response (pCR), the absence of residual invasive disease in the breast or axillary lymph nodes (aLNs) after NAC, is determined through biopsy for nodal assessment and through surgical pathology, but these procedures are invasive and have significant side effects¹.

In contrast to aLN biopsy, MRI has the potential to visualize all aLNs non-invasively, in three dimensions, and in situ. Disease-induced changes in the aLNs are usually subtle given the small size of the nodes, and are challenging to detect within the limits of MRI resolution and contrast. Current non-invasive radiological staging based on MRI does not have the accuracy needed for clinical application. A reliable method has the potential to obviate the need for surgical biopsies, predict response to chemotherapy prior to surgical intervention, and guide treatment.

Artificial intelligence is gaining popularity for analyzing diagnostic images². A common machine learning algorithm is a convolutional neural network (CNN), which takes input images, learns important features such as size or intensity, and saves these parameters as weights and biases to differentiate between classes³. To our knowledge, this is the first use of CNNs to compare the predictions of pCR from aLNs throughout NAC.

We tested the hypothesis that CNNs using early changes in aLN characteristics based on MRIs can accurately predict pCR to NAC.

Methods

The MRI data for this study were obtained from a subset of the American College of Radiology Imaging Network 6657 study with I-SPY 1 TRIAL⁴. The aLNs were segmented in 3D on the first post-contrast MRI (1.5T) and cropped to a size of 64x64x20 for computational efficiency. MRIs were acquired pre-NAC (timepoint 1, or TP1), after 1 cycle of NAC (TP2), and prior to surgery (TP4).
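As an illustration of the cropping step, the following minimal sketch computes the index range of a fixed 64x64x20 window around a node. The helper name `crop_bounds`, the (row, column, slice) axis order, the choice to centre the window on the node, and the example volume shape of 256x256x60 are all assumptions made for illustration; the abstract does not state how the crop was positioned.

```python
def crop_bounds(center, volume_shape, crop_size=(64, 64, 20)):
    """Return (start, stop) index pairs for a fixed-size crop around `center`.

    The window is shifted back inside the volume when the node lies near an edge,
    so the crop always has exactly `crop_size` voxels along each axis.
    """
    bounds = []
    for c, dim, size in zip(center, volume_shape, crop_size):
        if size > dim:
            raise ValueError("crop size exceeds volume size along an axis")
        start = c - size // 2                    # centre the window on the node
        start = max(0, min(start, dim - size))   # clamp to the volume extent
        bounds.append((start, start + size))
    return bounds


# Hypothetical node centres in a hypothetical 256x256x60 volume
print(crop_bounds((100, 120, 30), (256, 256, 60)))
print(crop_bounds((10, 250, 55), (256, 256, 60)))
```

The returned pairs can be used directly as slice limits on an image array, for example `volume[r0:r1, c0:c1, s0:s1]`.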

Because nodes with small volume at TP1 held little information and likely had no disease, we compared volume thresholds (see Table 1), using as reference the mean and standard deviation of normal node volume in patients at Stony Brook University Hospital. Sample sizes are summarized in Table 2 and examples of aLNs are shown in Figure 1.
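The volume threshold can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the `tp1_volume_cc` field, the example voxel spacing, and the decision to keep nodes exactly at the threshold are assumptions. The default of 0.92 cc is the threshold reported in the Results.

```python
def node_volume_cc(voxel_count, spacing_mm):
    """Volume in cubic centimetres from a segmented voxel count and voxel spacing in mm."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz / 1000.0  # 1 cc = 1000 mm^3


def keep_large_nodes(nodes, threshold_cc=0.92):
    """Keep nodes whose TP1 volume is at least `threshold_cc` (inclusive by assumption)."""
    return [n for n in nodes if n["tp1_volume_cc"] >= threshold_cc]


# Hypothetical nodes: voxel counts from a segmentation with 1 x 1 x 2 mm voxels
nodes = [
    {"id": "n1", "tp1_volume_cc": node_volume_cc(500, (1.0, 1.0, 2.0))},
    {"id": "n2", "tp1_volume_cc": node_volume_cc(250, (1.0, 1.0, 2.0))},
]
print([n["id"] for n in keep_large_nodes(nodes)])
```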

The predictive performance was compared between three CNNs: one based on TP1 data alone, one on a combination of TP2 and TP1 data, and one on a combination of TP4 and TP1 data. In the models where two timepoints were used, the CNN learned to associate data from both timepoints with pCR status.

Dual-input, 11-layer CNNs were developed using the Keras framework with a TensorFlow backend. After 4 strided convolutional layers, feature vectors were concatenated and fed through 3 fully connected layers. Between layers, batch normalization and dropout were included to prevent overfitting⁸,⁹. Three-fold cross-validation was used to account for outlying accuracies¹⁰. All aLNs from a particular patient were placed into the same fold so that features from one patient were not shared between training and test sets. Hyperparameters such as batch size, learning rate, and optimizer were tuned.
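The patient-wise fold assignment might look like the sketch below. It is an illustration under stated assumptions: the function name and the greedy size-balancing strategy are hypothetical, and no attempt is made to balance pCR labels across folds, which the abstract does not describe either way.

```python
from collections import defaultdict


def patient_grouped_folds(node_patient_ids, n_folds=3):
    """Assign node indices to folds so that no patient's nodes span two folds."""
    nodes_by_patient = defaultdict(list)
    for idx, pid in enumerate(node_patient_ids):
        nodes_by_patient[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place patients with the most nodes first,
    # each into whichever fold currently holds the fewest nodes.
    ordered = sorted(nodes_by_patient.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _pid, idxs in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(idxs)
    return folds


# Hypothetical patient label for each of seven nodes
patient_ids = ["A", "A", "B", "C", "C", "C", "D"]
folds = patient_grouped_folds(patient_ids)
print(folds)
```

Each fold then serves once as the test set while the other two are used for training.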

Accuracy and area under the curve (AUC) were used to evaluate model performance. During training, each node was treated as independent. At testing, a patient-level decision rule was applied, similar to the rule applied in pathology: a patient was assigned a pCR of 0 if at least one node was predicted as 0, and a pCR of 1 only if all of the patient's nodes were predicted as 1.
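The patient-level rule can be expressed compactly. The sketch below assumes node predictions have already been binarized to 0 or 1; the function names and example data are hypothetical.

```python
from collections import defaultdict


def patient_level_predictions(node_patient_ids, node_predictions):
    """Aggregate node predictions: a patient is 1 (pCR) only if every node is 1."""
    by_patient = defaultdict(list)
    for pid, pred in zip(node_patient_ids, node_predictions):
        by_patient[pid].append(pred)
    return {pid: int(all(p == 1 for p in preds)) for pid, preds in by_patient.items()}


def accuracy(predicted, truth):
    """Fraction of patients whose aggregated prediction matches the true label."""
    correct = sum(predicted[pid] == truth[pid] for pid in truth)
    return correct / len(truth)


# Hypothetical node-level outputs for three patients
ids = ["A", "A", "B", "B", "C"]
preds = [1, 1, 1, 0, 0]
patient_preds = patient_level_predictions(ids, preds)
print(patient_preds)
print(accuracy(patient_preds, {"A": 1, "B": 1, "C": 0}))
```

In this example patient B is assigned 0 because one of its two nodes is predicted as 0, even though the other is predicted as 1.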

Results

Table 1 shows the accuracy and AUC of models trained using various aLN volume thresholds. Each model was trained until convergence. Because it yielded the highest metrics, a threshold of 0.92 cc was used to reduce noise in the data.

Table 3 shows that the model trained on the first and second timepoints achieved the highest accuracy (81.17%, AUC = 0.69), compared to the model using only the first timepoint (accuracy = 75.65%, AUC = 0.64) and the model using the first and last timepoints (accuracy = 80.77%, AUC = 0.70).

Discussion

Imaging of aLNs to assess response to NAC has not been well studied, but the favorable performance of CNNs using aLN MRIs further supports the importance of aLN analysis for breast cancer treatment. Other studies have shown the ability to use CNNs to predict pCR from tumor images⁵, but this is the first study to use aLN imaging alone.

The combination of TP1 and TP2 provides the most valuable information for the CNN prediction of pCR using aLNs. While previous studies have examined the predictive performance of TP1⁵ or TP4⁶ alone, our study shows improved results using this combination.

An effective method of accurately determining patient response before surgery can substantially improve NAC treatment planning, reduce over-treatment, save time and cost, decrease risk of progression and metastasis⁷, and avoid invasive lymph node biopsies, emphasizing the importance of furthering this research.

Future studies should use larger independent datasets and fully automated aLN segmentation, both of which are likely needed for routine clinical use.

Conclusion

CNN analysis of early changes in aLN characteristics from MRIs can accurately predict pCR to NAC. These results are encouraging and suggest CNNs have the potential to predict pCR in clinical settings. With further development, this approach has the potential to be a non-invasive alternative to aLN biopsy, guide treatment, and improve prognosis.

Acknowledgements

No acknowledgement found.

References

1. Ahmed M, Usiskin SI, Hall-Craggs MA, Douek M. Is imaging the future of axillary staging in breast cancer? Eur Radiol 2014;24:288–293.
2. Ehteshami Bejnordi B, Mullooly M, Pfeiffer RM, Fan S, Vacek PM, Weaver DL, Herschorn S, Brinton LA, van Ginneken B, Karssemeijer N, Beck AH, Gierach GL, van der Laak J, Sherman ME. Using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies. Mod Pathol 2018;31(10):1502-1512. PMID 29899550.
3. Arel I, Rose DC, Karnowski TP. Deep machine learning-a new frontier in artificial intelligence research. IEEE Computational Intelligence Magazine 2010;5(4):13-18.
4. Newitt D, Hylton N. Multi-center breast DCE-MRI data and segmentations from patients in the I-SPY 1/ACRIN 6657 trials. 2016. DOI: 10.7937/K9/TCIA.2016.HdHpgJLK.
5. Ravichandran K, et al. A deep learning classifier for prediction of pathological complete response to neoadjuvant chemotherapy from baseline breast DCE-MRI. SPIE Medical Imaging. Vol. 10575. 2018: SPIE.
6. Ha R, et al. Axillary Lymph Node Evaluation Utilizing Convolutional Neural Networks Using MRI Dataset. J Digit Imaging. 2018 Apr 25. doi: 10.1007/s10278-018-0086-7.
7. Karagiannis GS, Pastoriza JM, Wang Y, et al. Neoadjuvant chemotherapy induces breast cancer metastasis through a TMEM-mediated mechanism. Science Translational Medicine 9, eaan0026 (July 2017).
8. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning - Volume 37. 2015, JMLR.org: Lille, France. p. 448-456.
9. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 2014;15:1929–1958.
10. Refaeilzadeh P, Tang L, Liu H. Cross-Validation. In: Encyclopedia of Database Systems, L. Liu and M.T. Özsu, Editors. 2009, Springer US: Boston, MA. p. 532-538.

Figures

Table 1. Results from CNNs trained on thresholded aLNs from TP2 and TP1.

Table 2. The sample size of the data used for TP2 + TP1 before and after the threshold of 0.92 cc was applied.

Table 3. Results from CNNs trained using data from combinations of timepoints.

Figure 1. Examples of lymph nodes from a patient demonstrating complete response (row 1), incomplete response (row 2), and a patient with a small lymph node volume at TP1 and thresholded out (row 3) over various timepoints.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
