Longitudinal assessment of glioma burden is important for evaluating treatment response and tumor progression. Delineation of tumor regions is typically performed manually but is time-consuming and subject to inter-rater and intra-rater variability. There has therefore been interest in developing automated approaches to calculate 1) glioma volume and 2) the product of maximum diameters of contrast-enhancing tumor, the key measure used in the Response Assessment in Neuro-Oncology (RANO) criteria. We present a fully automated pipeline for brain extraction, tumor segmentation, and RANO measurement (AutoRANO). We demonstrate the utility of this pipeline on 713 MRI scans from 54 post-operative glioblastoma patients, showing its capacity for automated tumor burden measurement.
Introduction
Gliomas are infiltrative neoplasms of the central nervous system that affect patients of all ages, with variable growth rates and prognoses.1,2 Serial assessment of tumor burden has been shown to be important for the prediction of survival outcomes and evaluation of treatment effectiveness in gliomas.3,4 Current clinical guidelines (Response Assessment in Neuro-Oncology, RANO) are based on calculating the product of maximum diameters of contrast-enhancing tumor as the primary measure of treatment response.5 Manual delineation of tumor boundaries can be difficult if the tumor is diffuse or demonstrates poor or heterogeneous contrast enhancement. Furthermore, manual segmentation is labor-intensive and subject to inter-rater variability, resulting in low reproducibility.6,7 As such, there has been great interest in developing automated approaches for 1) calculation of tumor volume and 2) calculation of the product of maximum diameters (the primary metric used in the RANO criteria). With the advent of more powerful graphics processing units, deep learning has become the method of choice for automatic segmentation in medical images.8,9 In this study, we present a fully automated pipeline for brain extraction and tumor segmentation that can be used to reliably extract FLAIR tumor volumes, contrast-enhancing tumor volumes, and RANO measurements from post-operative glioblastoma patient data from two clinical trials.

Methods
Following IRB approval, imaging data were acquired from two clinical trials that enrolled patients with newly diagnosed GBM. Our final post-operative patient cohort consisted of 713 MRI scans from 54 patients. Ground-truth segmentations for whole brain regions, FLAIR hyperintensities, and T1 contrast enhancement were obtained from an expert neuro-radiologist or neuro-oncologist. Additionally, RANO measurements were acquired from two expert neuro-oncologists. We utilized the 3D U-Net architecture, a neural network designed for fast and precise volumetric segmentation, for both brain extraction and tumor segmentation (Fig. 1).10,11 The code for pre-processing and the U-Net architecture is publicly available: https://github.com/QTIM-Lab/DeepNeuro.12 We further developed an automated RANO (AutoRANO) algorithm to derive RANO measurements from the automatic contrast-enhancing tumor segmentations.13 Patients were randomly divided into training and testing sets in a 4:1 ratio. We compared the baseline visits and the last patient visits by subtracting the RANO measures (delta RANO).

Results
For the testing set, the mean Dice coefficient between our algorithm and expert manual FLAIR tumor segmentation was 0.701 (95% CI 0.67-0.731) (Fig. 2). The mean Dice coefficient between our algorithm and expert manual contrast-enhancing tumor segmentation was 0.696 (95% CI 0.66-0.728). When comparing the agreement of calculated FLAIR volumes between automatic and manual segmentation, the Spearman rank correlation coefficient was 0.948 for the testing set. For contrast-enhancing tumor volumes, the Spearman rank correlation coefficient was 0.933 for the testing set (Fig. 3).
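For reference, the Dice coefficient reported above quantifies voxel-wise overlap between two binary segmentations. A minimal NumPy sketch (illustrative only; not the evaluation code used in this study):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary masks: 2|A intersect B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy example: two overlapping 1D masks share 2 of 3 foreground voxels each
a = np.array([1, 1, 1, 0, 0])
b = np.array([0, 1, 1, 1, 0])
print(dice_coefficient(a, b))  # 2*2/(3+3) ~ 0.667
```

The same formula applies unchanged to 3D segmentation volumes, since the computation is purely element-wise.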
We assessed the reproducibility of manual and automatic measurements by comparing measurements from the two baseline visits (acquired prior to treatment initiation) for each patient. Comparing baseline visits 1 and 2 for RANO measurements, the intraclass correlation coefficient (ICC) was 0.962 for Rater 1, 0.992 for Rater 2, and 0.977 for AutoRANO (Figs. 4-5).
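The ICC used here measures absolute agreement between repeated measurements on the same subjects. As a hedged sketch, a two-way random-effects single-measurement ICC(2,1) can be computed from an ANOVA decomposition as below (the study does not specify which ICC variant was used, so this is one common choice, not necessarily the study's):

```python
import numpy as np

def icc2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `ratings` is an (n subjects x k raters/visits) matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)  # per-subject means
    col_means = ratings.mean(axis=0)  # per-rater/visit means
    # Mean squares from the two-way ANOVA decomposition
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    sse = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: two baseline visits with nearly identical measures
# per patient should yield an ICC close to 1.
visits = np.array([[400.0, 402.0], [910.0, 905.0], [150.0, 149.0], [620.0, 630.0]])
print(round(icc2_1(visits), 3))
```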
To evaluate the ability of RANO measures to capture changes in tumor burden during treatment, we compared delta RANO across raters. The ICC for delta RANO was 0.877 between Rater 1 and Rater 2, 0.850 between AutoRANO and Rater 1, and 0.878 between AutoRANO and Rater 2.
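The RANO measure compared above is the product of the longest in-plane diameter of the contrast-enhancing tumor and the longest roughly perpendicular diameter, and delta RANO is the final-visit measure minus the baseline measure. A brute-force geometric sketch on a 2D binary mask (illustrative only; the function names are hypothetical and this is not the AutoRANO implementation, which follows the modified criteria of reference 13):

```python
import numpy as np

def rano_product(mask: np.ndarray, spacing=(1.0, 1.0), tol_deg=5.0) -> float:
    """Bidirectional product: the longest diameter between foreground pixels,
    times the longest diameter within tol_deg of perpendicular to it.
    Brute force over all pixel pairs; fine for small masks, slow for large ones."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([ys * spacing[0], xs * spacing[1]])
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    d1 = dists[i, j]
    axis = diffs[i, j] / d1  # unit vector along the longest diameter
    # |cos(angle to axis)| <= sin(tol) keeps chords within tol_deg of perpendicular
    cosang = np.abs(diffs @ axis) / np.where(dists > 0, dists, np.inf)
    perp = dists[cosang <= np.sin(np.deg2rad(tol_deg))]
    d2 = perp.max() if perp.size else 0.0
    return d1 * d2

def delta_rano(baseline: float, final: float) -> float:
    """Change in the RANO product from baseline to the final visit."""
    return final - baseline

# Example: a disk of radius 10 pixels has both diameters ~ 20, product ~ 400
yy, xx = np.mgrid[0:21, 0:21]
disk = (yy - 10) ** 2 + (xx - 10) ** 2 <= 100
print(rano_product(disk))
```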
Discussion
In this study, we demonstrate the utility of a fully automated, deep learning-based pipeline for calculation of tumor volumes as part of a larger effort to apply deep learning techniques to the field of neuro-oncology. To our knowledge, this is the first application of deep learning to post-operative glioblastoma segmentation; previous studies have focused on pre-operative glioblastoma segmentation.14 In comparing manual and automatic segmentation methods, we observed high agreement between manual and automatic volumes. The AutoRANO algorithm also demonstrated high reproducibility. We further observed high agreement between AutoRANO and expert raters, as reflected by the high ICC for delta RANO. This demonstrates the utility of automated methods for assessing changes in tumor burden.

Conclusion
We present an open-source, fully automatic pipeline for brain extraction, tumor segmentation, and RANO measurement applied to a large, multi-institutional pre-operative glioma patient cohort and a post-operative glioblastoma patient cohort. This tool may be helpful in clinical trials as well as in clinical practice by expediting measurement of tumor burden for the evaluation of treatment response. Furthermore, it serves as an important proof of concept for automated tools in the clinic and may be applicable to other tumor pathologies.

References

1. De Robles, P. et al. The worldwide incidence and prevalence of primary brain tumors: A systematic review and meta-analysis. Neuro-Oncology 17, 776–783 (2015).
2. Thakkar, J. P. et al. Epidemiologic and molecular prognostic review of glioblastoma. Cancer Epidemiology Biomarkers and Prevention 23, 1985–1996 (2014).
3. Brasil Caseiras, G. et al. Low-grade gliomas: six-month tumor growth predicts patient outcome better than admission tumor volume, relative cerebral blood volume, and apparent diffusion coefficient. Radiology 253, 505–512 (2009).
4. Iliadis, G. et al. Volumetric and MGMT parameters in glioblastoma patients: Survival analysis. BMC Cancer 12, 3 (2012).
5. Wen, P. Y. et al. Updated response assessment criteria for high-grade gliomas: response assessment in neuro-oncology working group. J. Clin. Oncol. 28, 1963–72 (2010).
6. Deeley, M. A. et al. Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: a multi-expert study. Phys. Med. Biol. 56, 4557–4577 (2011).
7. Huang, R. Y. et al. The Impact of T2/FLAIR Evaluation per RANO Criteria on Response Assessment of Recurrent Glioblastoma Patients Treated with Bevacizumab. Clin. Cancer Res. 22, 575–581 (2016).
8. Kamnitsas, K. et al. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017).
9. Havaei, M. et al. Brain tumor segmentation with Deep Neural Networks. Med. Image Anal. 35, 18–31 (2017).
10. Beers, A. et al. Sequential neural networks for biologically-informed glioma segmentation. in Medical Imaging 2018: Image Processing (eds. Angelini, E. D. & Landman, B. A.) 10574, 108 (SPIE, 2018).
11. Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T. & Ronneberger, O. 3D U-net: Learning dense volumetric segmentation from sparse annotation. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9901 LNCS, 424–432 (2016).
12. Beers, A. et al. DeepNeuro: an open-source deep learning toolbox for neuroimaging. (2018).
13. Ellingson, B. M., Wen, P. Y. & Cloughesy, T. F. Modified Criteria for Radiographic Response Assessment in Glioblastoma Clinical Trials. Neurotherapeutics 14, 307–320 (2017).
14. Menze, B. H. et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2015).