0421

Computer-aided detection and segmentation of brain metastases in MRI for stereotactic radiosurgery via a deep learning ensemble
Zijian Zhou1, Jeremiah W. Sanders1, Jason M. Johnson2, Tina M. Briere3, Mark D. Pagel4, Jing Li5, and Jingfei Ma1
1Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 2Diagnostic Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 3Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 4Cancer Systems Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX, United States, 5Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

Synopsis

Manual delineation of brain metastases for stereotactic radiosurgery (SRS) is time consuming and labor intensive. We constructed a deep learning ensemble, comprising a single shot detector and a U-Net, to detect and subsequently segment brain metastases in MRI for SRS treatment planning. Postcontrast 3D T1-weighted gradient echo MR images from 266 patients were randomly split 212:54 into training-validation and testing groups. For the testing group, an overall detection sensitivity of 80.4% (189/235 metastases) with approximately 4 false positives per patient, and a median segmentation Dice of 77.9% (61.4% - 86.3%) for the detected metastases, were achieved.

Introduction

Manual identification and contouring of brain metastases in MRI, which are currently required for stereotactic radiosurgery (SRS) treatment planning, are time consuming and labor intensive. Several deep learning approaches based solely on semantic segmentation with fully convolutional networks have been proposed for automated detection and segmentation of brain metastases. However, these approaches can produce substantial false positives (up to 200 per patient)1 and relatively low segmentation performance (around 67% Dice).2 Herein, we constructed a two-stage deep learning ensemble for brain metastasis segmentation: (1) detection of the metastases using a single shot detector (SSD),3 and (2) segmentation of the detected metastases using a U-Net4 (Figure 1). With this approach, we hypothesized that false positives can be minimized while achieving high detection sensitivity and accurate segmentation of the detected metastases.

Methods

Postcontrast 3D T1-weighted gradient echo MR images from 266 patients who underwent SRS for brain metastases at our institution between January 2011 and August 2018 were retrospectively analyzed. Typical scan parameters were: TR/TE = 6.9/2.5 ms, NEX = 1.7, flip angle = 12°, matrix size = 256 × 256, FOV = 24 cm, voxel size = 0.94 × 0.94 × 1.00 mm. The patients were randomly split into 80% (212/266) training and 20% (54/266) testing groups. Within the training group, a sub-sample split (20% of the patient MR images) was used for model validation. Manual identification and contouring of the brain metastases by neuroradiologists and treating radiation oncologists, respectively, were considered the ground truth. The metastasis size, measured as the largest cross-sectional dimension when projected onto a 2D plane in the craniocaudal direction, ranged from 2 to 52 mm.
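As an illustrative sketch (the function and variable names are ours, not from the study), a patient-level split consistent with the 212:54 grouping above could be performed as follows:

```python
import math
import random

def split_patients(patient_ids, test_frac=0.2, val_frac=0.2, seed=0):
    """Randomly split patient IDs into training, validation, and testing
    groups. Splitting at the patient level keeps all images from one
    patient within a single group."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = math.ceil(len(ids) * test_frac)    # 54 of 266 patients
    test, rest = ids[:n_test], ids[n_test:]     # 212 patients remain
    n_val = math.ceil(len(rest) * val_frac)     # validation sub-split
    return rest[n_val:], rest[:n_val], test     # train, val, test

train, val, test = split_patients(range(266))
```

Randomizing at the patient level, rather than the slice level, avoids leakage of slices from the same patient across groups.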

The detection and segmentation networks were separately constructed and trained on the same training group. An SSD was constructed for the detection stage of the ensemble (Figure 2). Inputs to the SSD were the axial MRI slices, and the outputs were the prediction bounding boxes encompassing brain metastases and the associated detection confidences. The SSD consisted of 16 convolutional layers for feature extraction, among which six layers were used for metastasis detection. The six detection layers had feature maps with matrix sizes of 128 × 128, 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4. The SSD loss, which is a weighted sum of the classification and localization losses, was used for the bounding box regression.
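The SSD loss described above can be sketched in minimal numpy form. This is not the authors' implementation; the weighting factor `alpha` and the convention that class 0 denotes background are our assumptions, following the original SSD formulation:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1 (Huber) loss, the usual SSD localization loss."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def multibox_loss(cls_logits, cls_targets, loc_pred, loc_target, alpha=1.0):
    """Weighted sum of softmax cross-entropy over default boxes and
    smooth-L1 box regression over matched (non-background) boxes,
    normalized by the number of matched boxes."""
    z = cls_logits - cls_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    cls_loss = -log_probs[np.arange(len(cls_targets)), cls_targets].sum()
    matched = cls_targets > 0                    # class 0 = background
    loc_loss = smooth_l1(loc_pred[matched], loc_target[matched])
    return (cls_loss + alpha * loc_loss) / max(matched.sum(), 1)
```

With confident, correct class logits and perfect box regression, the loss approaches zero, as expected for a training objective.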

For segmentation, a U-Net was constructed using a VGG16 backbone as the convolutional encoder (Figure 3). Inputs to the U-Net were 64 × 64 slice crops covering entire cross-sections of brain metastases, with the centers of the crop and cross-section aligned, and the outputs were the segmentation masks. Matrix sizes of the feature maps were 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4. Cross-entropy loss was used for background and metastasis pixel-wise classification. For both SSD and U-Net, an Adam optimizer with an initial learning rate of 0.0002 was used to train the networks. Random affine transformation augmentations were applied during the training of both networks.
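Extracting a 64 × 64 crop whose center is aligned with a metastasis cross-section, as described above, can be sketched as follows (our own helper; zero padding at image borders is an assumption, as the abstract does not specify the boundary handling):

```python
import numpy as np

def centered_crop(image, bbox, size=64):
    """Extract a size x size crop whose center coincides with the center
    of the bounding box (x0, y0, x1, y1), zero-padding at image borders."""
    x0, y0, x1, y1 = bbox
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    cy, cx = cy + half, cx + half                 # shift for padding offset
    return padded[cy - half: cy + half, cx - half: cx + half]
```

Cropping around the box center, rather than resizing the box itself, keeps the pixel spacing of the U-Net input identical to that of the original image.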

Potential regions of interest detected by the SSD with confidence ≥ 50% were cropped based on the predicted bounding box coordinates, and were input to the U-Net for segmentation. The final performance of the deep learning ensemble was evaluated across the entire brain volume. The adjacent output segmentation masks were first stacked to form a segmentation volume, which was then compared with the ground truth metastasis volumes. The ground truth volumes were considered true positives (TPs) if they had at least one voxel segmented; otherwise they were false negatives (FNs). The segmentation volumes were considered false positives (FPs) if they had no voxel overlap with any of the ground truth volumes. Free-response receiver operating characteristics (FROC) and boxplot were used to demonstrate the detection and segmentation performance of the ensemble (Figure 4).
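The volume-wise evaluation above can be sketched as follows, assuming the ground truth and predicted segmentations have already been grouped into labeled connected components (a labeling step we presume but that the abstract does not detail); the choice of the best-overlapping predicted component for the Dice computation is likewise our assumption:

```python
import numpy as np

def evaluate_volumes(gt_labels, pred_labels):
    """Apply the any-voxel-overlap criterion: each ground-truth metastasis
    with at least one segmented voxel is a TP (Dice computed against the
    best-overlapping predicted component), otherwise an FN; predicted
    components overlapping no ground truth are FPs."""
    tps, fns, dices = 0, 0, []
    for g in np.unique(gt_labels[gt_labels > 0]):
        gt = gt_labels == g
        hit = pred_labels[gt]
        hit = hit[hit > 0]
        if hit.size:
            tps += 1
            pred = pred_labels == np.bincount(hit).argmax()
            dices.append(2.0 * (gt & pred).sum() / (gt.sum() + pred.sum()))
        else:
            fns += 1
    fps = sum(1 for p in np.unique(pred_labels[pred_labels > 0])
              if not (gt_labels[pred_labels == p] > 0).any())
    return tps, fns, fps, dices
```

Sweeping the detection confidence threshold and recording sensitivity against FPs per patient at each setting yields the FROC curve.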

Results

At the 50% confidence level, our method achieved an overall detection sensitivity of 80.4% (189/235) with approximately 4 FPs per patient, and a median segmentation Dice of 77.9% (61.4% - 86.3%) for the detected metastases. For metastases ≥ 6 mm, the sensitivity was 92.9% (131/141) and the median Dice was 81.9% (72.1% - 89.4%). The combined detection and segmentation for one patient took < 10 s on an NVIDIA DGX-1 workstation. Detection and segmentation examples from patients with metastases of varying number, size, type, and location are shown in Figure 5.

Discussion/conclusion

Our deep learning approach showed promise for assisting SRS treatment planning for brain metastases. Requiring only a single MRI acquisition improves its feasibility for clinical applications. By lowering the detection confidence threshold, more metastases could be detected at the cost of precision. For segmentation, a large variation in the Dice coefficient was observed, with several causes. For smaller metastases, poorly defined boundaries and lower contrast typically produced pixel-wise FPs, while for metastases with large necrotic regions, the low-signal areas typically led to pixel-wise FNs; both reduced the Dice coefficient. Future work will involve improving the ensemble performance through ablation studies, hyperparameter searches, and inclusion of additional curated patient data.

Acknowledgements

No acknowledgement found.

References

[1] Charron O, Lallement A, Jarnet D, et al. Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network. Comput Biol Med. 2018;95:43-54.

[2] Liu Y, Stojadinovic S, Hrycushko B, et al. A deep convolutional neural network-based automatic delineation strategy for multiple brain metastases stereotactic radiosurgery. PLoS One. 2017;12(10):e0185844.

[3] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M, eds. Computer Vision – ECCV 2016. Lecture Notes in Computer Science, vol 9905. Cham: Springer; 2016.

[4] Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham: Springer; 2015.

Figures

Illustration of the developed deep learning ensemble for brain metastasis detection and segmentation. Inputs were axial slices of the postcontrast T1-weighted MR images, and the intermediate outputs were the predicted bounding box coordinates and the associated classification confidences. Using the predicted coordinates, 64 × 64 regions were cropped around the centers of the bounding boxes and input to the U-Net for segmentation. The final output was the segmentation mask of the entire image, obtained by mapping the crop segmentation masks back to the global image volume.

Illustration of the constructed single shot detector (SSD). Predictions were made at six resolution scales. L2 normalization was applied to the first two detection layers to account for the different scales of the feature maps. The classification and bounding box position regression losses were combined to form the SSD loss. Numbers above each layer indicate the number of feature maps and the convolution kernel sizes; numbers under each layer indicate the feature map matrix sizes.

Illustration of the constructed U-Net. A VGG16 backbone (stacked convolutions) was used as the convolutional encoder. Addition of the corresponding encoder and decoder layers served as the skip connections. Softmax activation was applied at the last layer for cross-entropy loss computation. Numbers above and under each layer indicate the number of feature maps and the feature map matrix sizes, respectively. All convolution kernels had a size of 3 × 3.

(A) Brain metastases size distribution of the testing group. (B) Detection sensitivity of our deep learning method at a confidence level of 50%. Most of the metastases ≥ 6 mm, and over half of the metastases < 6 mm, were detected. (C) Boxplot of the segmentation Dice for the detected metastases; the mean values are shown as ×'s. With a median Dice of 77.9% (61.4% - 86.3%), most of the outliers with lower Dice coefficients were metastases with large necrotic regions. (D) Detection free-response receiver operating characteristic (FROC) curve. Lowering the confidence threshold detected more metastases at the cost of precision.

Brain metastases detection and segmentation examples for 3 patients. Left column: input images; center column: predicted metastasis locations; right column: intersections of the ground truth and predicted segmentation masks (shown in yellow). These figures show that metastases of different sizes, locations, and types can be detected in a single forward pass of the images through the ensemble of networks. The segmentation accuracy was generally high for hemorrhagic metastases (1c, 2c); in contrast, it can be poor for metastases with relatively large necrotic regions (3c).

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)