0894

Global Planar Convolutions for improved context aggregation in Brain Tumor Segmentation with MR images
Santi Puch1, Irina Sánchez1, Aura Hernández2, Gemma Piella3, Paulo Rodrigues1, and Vesna Prčkovska1

1QMENTA Inc., Barcelona, Spain, 2Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain, 3SIMBIOsys, Universitat Pompeu Fabra, Barcelona, Spain

Synopsis

Brain tumors pose a significant social and economic burden worldwide. A key step towards improving the quality of life and life expectancy of patients with brain tumors is automating the delineation of tumoral structures. In this work we propose the Global Planar Convolution module, a building block for convolutional neural networks that enhances the context perception capabilities of segmentation networks for brain tumor segmentation. We show that networks equipped with such modules achieve performance similar to that of equivalent networks with increased depth, and we provide an initial inspection of their behavior via interpretation of intermediate feature maps.

Purpose

Brain tumors pose a significant economic burden worldwide due to the costs of treatment and patient follow-up1,2. Delineation and identification of glioma structures in MRI can significantly improve treatment design and monitoring, and thus increase patients’ life expectancy and quality of life. Inter-rater variability of manual tumor segmentations is a major source of inaccuracy in radiation therapy, creating the need for reproducible tools that automate this process3. Consequently, a large variety of computational methods have been proposed to tackle this problem, with convolutional neural networks (CNNs) showing promising results4. In this work, we introduce the Global Planar Convolution module (GPCm), a novel building block that enhances context perception in segmentation networks, and we show that including the GPCm reduces the network’s complexity while achieving performance similar to that of equivalent architectures.

Methods

The data was sourced from the BraTS 2018 dataset, which consists of 285 pre-operative scans with T1, T1-Gd, T2 and T2-FLAIR modalities and manual segmentations of the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor5.

The proposed architecture, named ContextNet (fig. 1), is based on the popular U-Net architecture6 and includes the residual units introduced by He et al.7, which enable deeper networks and facilitate the propagation of low-level representations throughout the network. This architecture includes a novel building block, the GPCm, inspired by the work of Peng et al.8. The GPCm computes the sum of three 2D convolutions, one performed in each of the orthogonal planes of the input volume (fig. 2). By constraining each convolution to a single plane, we can increase the kernel size and thereby enlarge the effective receptive field of the network. This increased connectivity aids the identification aspect of segmentation (what), while the residual elements and skip-connections contribute to the localization aspect of segmentation (where).
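The receptive-field argument can be made concrete with a back-of-the-envelope calculation (an illustrative sketch, not taken from the paper): a single planar convolution with a 15-wide kernel covers a 15-voxel in-plane extent in one layer, whereas matching that extent with standard 3×3 convolutions requires a stack of several layers.

```python
def stacked_rf(n_layers, k=3):
    """In-plane receptive field of n stacked k x k convolutions (stride 1, no pooling)."""
    return n_layers * (k - 1) + 1

# Layers of 3x3 convolutions needed to match a single 15-wide planar kernel.
layers_needed = next(n for n in range(1, 50) if stacked_rf(n) >= 15)
print(layers_needed)  # 7
```

This is why a large planar kernel can aggregate context at a shallow depth that would otherwise require many stacked small convolutions or additional pooling stages.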

In our experiments, we used GPC modules with 15×15 kernels, chosen heuristically as the largest size that did not substantially increase computational and memory requirements.
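The operation described above can be sketched as follows. This is a minimal NumPy/SciPy illustration, not the authors' implementation: random weights stand in for the learned convolution filters, and the real module operates on multi-channel feature maps rather than a single-channel volume.

```python
import numpy as np
from scipy.ndimage import convolve

def global_planar_convolution(volume, k=15, seed=0):
    """Sum of three planar convolutions over the orthogonal planes of a 3D volume.

    Each kernel is 3-dimensional with one dimension set to 1, so it acts as a
    2D k x k kernel swept over one plane orientation (axial, coronal, sagittal).
    """
    rng = np.random.default_rng(seed)
    kernels = [
        rng.standard_normal((1, k, k)),  # 2D kernel applied in the axial planes
        rng.standard_normal((k, 1, k)),  # 2D kernel applied in the coronal planes
        rng.standard_normal((k, k, 1)),  # 2D kernel applied in the sagittal planes
    ]
    # Zero-padded ("same") convolutions keep the output at the input resolution.
    return sum(convolve(volume, w, mode="constant", cval=0.0) for w in kernels)

volume = np.zeros((24, 24, 24), dtype=np.float32)
out = global_planar_convolution(volume)
print(out.shape)  # (24, 24, 24)
```

In a deep-learning framework, each branch would typically be a 3D convolution with kernel shape (1, k, k), (k, 1, k) or (k, k, 1), so the three planar branches remain cheap relative to a full k×k×k kernel.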

Results and Discussion

Figure 3 compares a reference model without the GPCm, denoted ResUNet, with variations of ContextNet with different numbers of representation levels (i.e., a level being the set of operations performed at the same spatial resolution). Both ContextNet models with a reduced number of representation levels match or even surpass the performance of the ResUNet model when segmenting the whole tumor and the enhancing tumor. We hypothesize that the GPCm enables the aggregation of contextual information without the need to obtain a deep representation via several pooling operations, which addresses the identification aspect of the segmentation task and reduces the complexity of the network. However, this complexity reduction makes more complex structures, such as the necrotic and non-enhancing tissue, more challenging to segment due to the reduced model capacity.

Figure 4 depicts feature maps extracted from the residual layers and GPC modules of the ContextNet models with reduced representation levels. The features the network extracts become increasingly abstract as we move from the pre-GPC residual layer, through the GPC module, to the post-GPC residual layer.

Predictions obtained from ResUNet, ContextNet constrained to three representation levels, and the full ContextNet are shown in fig. 5.

Conclusion

In this work, we introduced the GPCm to enhance the context perception capabilities of CNNs for brain tumor segmentation. We investigated the behavior of the GPC modules by training networks with a limited number of representation levels and visualizing their intermediate representations. Finally, we showed that equivalent performance can be achieved with the GPCm even when the number of representation levels is considerably reduced, and that performance gains are obtained when the complexity of the network is maintained.

The integration of ContextNet and related brain tumor segmentation models into the clinical workflow would reduce the time needed for treatment planning and monitoring, freeing up radiologists from tedious and error-prone tasks and allowing for more time to be invested into interpretation and diagnosis.

Future work includes uncertainty estimation via Monte-Carlo Dropout or related techniques, in-depth investigation of intermediate representations and use of other deep learning interpretability methods to better understand the behavior of the proposed GPCm.

Acknowledgements

No acknowledgement found.

References

1. Louis, D. N., Perry, A., Reifenberger, G., Von Deimling, A., Figarella-Branger, D., Cavenee, W. K., ... & Ellison, D. W. (2016). The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta neuropathologica, 131(6), 803-820.

2. Ostrom, Q. T., Bauchet, L., Davis, F. G., Deltour, I., Fisher, J. L., Langer, C. E., ... & Wrensch, M. R. (2014). The epidemiology of glioma in adults: a “state of the science” review. Neuro-oncology, 16(7), 896-913.

3. Weiss, E., & Hess, C. F. (2003). The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy. Strahlentherapie und Onkologie, 179(1), 21-30.

4. Crimi, A., Bakas, S., Kuijf, H., Menze, B., & Reyes, M. (Eds.). (2018). Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Third International Workshop, BrainLes 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, 2017, Revised Selected Papers (Vol. 10670). Springer.

5. Menze, B. H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., ... & Lanczi, L. (2015). The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging, 34(10), 1993-2024.

6. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234-241). Springer, Cham.

7. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

8. Peng, C., Zhang, X., Yu, G., Luo, G., & Sun, J. (2017, July). Large kernel matters—improve semantic segmentation by global convolutional network. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on (pp. 1743-1751). IEEE.

Figures

Fig. 1: ContextNet architecture overview. Tensor dimensions are specified for a single example during the training phase.

Fig. 2: Global Planar Convolution module (GPCm). The kernels are 3-dimensional but one of the dimensions is set to 1, which effectively translates to 2D kernels being convolved in each of the 3 orthogonal planes of the input volume.

Fig. 3: Dice scores (avg ± std) of ResUNet (i.e., ContextNet without GPC modules) and variations of ContextNet with different numbers of representation levels (RL). Scores are computed on the test set.

Fig. 4: Feature map visualization for GPCm interpretability. The top figure shows the activations of ContextNet with 3 representation levels, while the bottom figure shows the activations of ContextNet with 2 representation levels.

Fig. 5: For 2 subjects from the test set, from top to bottom and left to right: FLAIR and T1-Gd MR modalities, ground-truth labels, and segmentations produced by ResUNet, ContextNet trained with 3 representation levels, and ContextNet with all (4) representation levels.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)