4879

Detection of White Matter Hyperintensities using Ensemble 3D Deep Learning Networks
Lavanya Umapathy1, Gloria Guzman2, Jose Rosado-Toro2, Gokhan Kuyumcu2, Maria Altbach2, Blair Winegar2, Craig Weinkauf3, and Ali Bilgin1,4

1Electrical and Computer Engineering, University of Arizona, Tucson, AZ, United States, 2Department of Medical Imaging, University of Arizona, Tucson, AZ, United States, 3Department of Surgery, University of Arizona, Tucson, AZ, United States, 4Biomedical Engineering, University of Arizona, Tucson, AZ, United States

Synopsis

White matter hyperintensities (WMH), which appear hyperintense on T2-weighted FLAIR images, are prominent features of demyelination and axonal degeneration in cerebral white matter. Because manual segmentation is time-consuming, faster and reliable automated segmentation algorithms are needed. In this work, we propose three deep learning architectures for WMH detection on 3D FLAIR images: a modified UNET3D, Res-UNET3D and their ensemble combination. Two UNET3D and two Res-UNET3D networks were trained with random initialization using 3D patches sampled from within the brain. The posterior probabilities for WMH from the individual networks were averaged to obtain a revised posterior probability for the ensemble. Performance of the individual networks as well as that of the ensemble was assessed using dice and precision scores.

It was observed that the ensemble of 3D networks yields improved dice and precision scores in comparison to an average of individual networks, thereby reducing the effect of choice of network or parameters. Furthermore, the average dice scores for the ensemble approached the inter-observer variability of human observers.

Introduction

White Matter Hyperintensities (WMH), lesions in brain white matter that appear hyperintense on T2-weighted Fluid Attenuated Inversion Recovery (FLAIR) images, are prominent features of demyelination and axonal degeneration observed within cerebral white matter or subcortical gray matter1. Clinically, the extent of WMH in the brain has been associated with cognitive impairment and increased risk of stroke or dementia2. Since manual segmentation of WMH is impractical, there is increasing interest in developing automated algorithms. Recently, 2D deep learning methods have been proposed for this task3,4 and detection challenges have been organized5.

UNET6 and Residual Net (Res-Net)7 Convolutional Neural Networks (CNNs) have recently been proposed for many medical image segmentation tasks. Earlier 2D CNNs for WMH detection use only in-plane spatial context for decision making. The availability of 3D isotropic FLAIR images makes it possible to explore 3D CNNs with improved spatial context. In this work, we first propose two new 3D UNET architectures for WMH detection: a modified 3D UNET (UNET3D, Figure 1) and a 3D UNET with residual blocks (Res-UNET3D, Figure 2A). Noting that different network architectures and training parameters yield different solutions, we also propose a 3D ensemble network architecture (Figure 2B) in which the posterior probabilities for WMH from the individual networks are averaged to obtain a revised posterior probability. We illustrate that the 3D ensemble CNN yields state-of-the-art detection performance, with high precision and average dice scores approaching the inter-observer variability of human observers.
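The ensemble step described above amounts to a voxel-wise mean of the per-network posterior maps. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names `ensemble_posterior` and `to_mask` are hypothetical, and the 0.5 decision threshold is an assumption, since the abstract does not state how the averaged posterior is converted to a binary mask.

```python
import numpy as np


def ensemble_posterior(posteriors):
    """Average per-voxel WMH posterior maps from several networks.

    `posteriors` is a sequence of arrays with identical shape, one per
    trained network, each holding probabilities in [0, 1].
    """
    stacked = np.stack(posteriors, axis=0)  # shape: (n_networks, *volume_shape)
    return stacked.mean(axis=0)


def to_mask(posterior, threshold=0.5):
    """Binarize a posterior map (the threshold value is an assumption)."""
    return posterior >= threshold


# Toy example: four hypothetical network outputs on a tiny 2x2x1 volume.
toy_outputs = [np.full((2, 2, 1), v) for v in (0.9, 0.7, 0.4, 0.2)]
toy_average = ensemble_posterior(toy_outputs)
toy_mask = to_mask(toy_average)
```

In this toy case two of the four networks fall below the assumed threshold individually, while the averaged posterior stays above it, which illustrates how averaging can smooth out disagreement between networks.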

Methods

3D T2-FLAIR images (1 mm isotropic resolution) were acquired at 3T (Siemens Skyra) on 20 subjects. 16 subjects were selected for training, 2 for validation, and 2 for testing. The WMH were annotated by an experienced neuroradiologist. Pre-processing steps included brain extraction8, N4 bias correction9, and normalizing each subject’s data to have zero mean and unit standard deviation.

To mitigate the problem of class imbalance, we used 3D patches (64x64x5) sampled from regions within the brain. Sampling was performed to ensure that an equal number of patches with and without lesions were used. Data augmentation was performed using a combination of horizontal/vertical flips and rotations, increasing the training size three-fold.

Experiments were carried out in Keras10 with a TensorFlow backend11. Two UNET3D and two Res-UNET3D networks were initialized with different weights. The UNET3Ds and Res-UNET3Ds were trained with weighted binary cross entropy and categorical cross entropy loss functions, respectively. Each network was trained for 30 epochs using a mini-batch size of 10 and a learning rate of 0.01 (UNET3D) or 0.0005 (Res-UNET3D). During testing, it was observed that the prediction accuracy varies with the location of the slice of interest within the 3D patch. Therefore, only the prediction for the center slice in a block was retained, and a sliding window approach was used to obtain the predictions for all slices.
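The balanced patch sampling and the center-slice sliding window described in this section can be sketched as follows. This is an illustrative NumPy sketch under several assumptions that the abstract does not specify: a patch is accepted when its center voxel lies inside the brain mask, edge slices are handled by replicating the outermost slices, and the network is applied to the full in-plane extent at test time. The names `sample_balanced_patches`, `predict_center_slices`, and `predict_block` are hypothetical.

```python
import numpy as np

PATCH = (64, 64, 5)  # patch size stated in the text


def sample_balanced_patches(image, labels, brain_mask, n_per_class, rng,
                            max_tries=10000):
    """Draw equal numbers of patches with and without lesion voxels."""
    px, py, pz = PATCH
    nx, ny, nz = image.shape
    lesion, background = [], []
    for _ in range(max_tries):
        if len(lesion) == n_per_class and len(background) == n_per_class:
            break
        x = rng.integers(0, nx - px + 1)
        y = rng.integers(0, ny - py + 1)
        z = rng.integers(0, nz - pz + 1)
        # Assumption: keep a patch only if its center voxel is inside the brain.
        if not brain_mask[x + px // 2, y + py // 2, z + pz // 2]:
            continue
        window = (slice(x, x + px), slice(y, y + py), slice(z, z + pz))
        bucket = lesion if labels[window].any() else background
        if len(bucket) < n_per_class:
            bucket.append((image[window], labels[window]))
    return lesion, background


def predict_center_slices(volume, predict_block):
    """Slide a 5-slice block through the volume, keeping the center slice.

    `predict_block` is a hypothetical callable that maps an (X, Y, 5) block
    to an (X, Y, 5) array of WMH probabilities.
    """
    depth = PATCH[2]
    half = depth // 2
    # Assumption: replicate edge slices so every slice can sit at the center.
    padded = np.pad(volume, ((0, 0), (0, 0), (half, half)), mode="edge")
    output = np.zeros(volume.shape, dtype=float)
    for z in range(volume.shape[2]):
        block = padded[:, :, z:z + depth]
        output[:, :, z] = predict_block(block)[:, :, half]
    return output
```

With a trained model, `predict_block` would wrap the model's prediction call; any callable with the same input and output shapes can stand in for it when checking the windowing logic.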

Results and Discussion

Figure 3 shows manual annotations, WMH predictions from individual 3D networks, and the predictions of the 3D ensemble network, overlaid on FLAIR images for a test subject with relatively large lesions. Performance of the proposed approaches was evaluated quantitatively using dice and precision scores. The table in Figure 3D provides these scores for the first test subject. It is interesting to note that the performance of the proposed networks on this subject approaches the inter-observer variability of human observers, which has been reported to be around a dice score of 0.8 in the literature12. Similarly, Figure 4 provides the same information for a second test subject with fewer and smaller WMH. It is clear that the 3D ensemble network improves the dice scores and precision, thereby reducing the effect of choice of network or parameters. Note that in this more challenging case, the dice scores are lower, but the precision of the 3D ensemble network remains high.
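For reference, the two metrics used above can be computed from binary masks as sketched here. The sketch assumes voxel-wise evaluation over the full 3D volume, which matches the figure captions but is not spelled out in detail in the abstract; the handling of empty masks is also an assumption, and the function names are hypothetical.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice = 2 * TP / (|pred| + |truth|) on binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Assumption: two empty masks count as perfect agreement.
    return 2.0 * true_pos / denom if denom else 1.0


def precision_score(pred, truth):
    """Precision = TP / (TP + FP) on binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = np.logical_and(pred, truth).sum()
    n_pred = pred.sum()
    # Assumption: precision is reported as 0 when nothing is predicted.
    return true_pos / n_pred if n_pred else 0.0
```

Dice penalizes both missed lesion voxels and false positives, whereas precision only penalizes false positives, which is consistent with precision remaining high in a case where the dice score drops.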

Conclusion

Three 3D deep learning networks, consisting of UNET3D, Res-UNET3D, and their ensemble combination, were proposed for WMH detection on 3D FLAIR images. It was observed that the ensemble of 3D networks yields improved dice and precision scores in comparison to an average of individual networks. It was also observed that the performance of the 3D networks was better at the center slice in comparison to the edge slices of each patch, which suggests that the 3D spatial information improves detection performance. As shown in earlier 2D WMH segmentation publications, using information from MPRAGE datasets in addition to the FLAIR images can improve detection performance. We will consider a 3D implementation of such an approach in the future.

Acknowledgements

This work was supported by the Arizona Health Sciences Center Translational Imaging Program Project Stimulus (TIPPS) Fund. The authors would also like to acknowledge support from the Technology and Research Initiative Fund (TRIF) Improving Health Initiative.

References

1. Maniega SM, Valdes Hernandez MC, et al. White matter hyperintensities and normal-appearing white matter integrity in the aging brain. Neurobiology of Aging. 2015; 36(2)

2. Brickman AM, et al. Testing the white matter retrogenesis hypothesis of cognitive aging. Neurobiology of Aging. 2012; 33(8)

3. Guerrero R, et al. White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks. NeuroImage: Clinical. 2018; 17

4. Li et al. Fully convolutional network ensembles for white matter hyperintensities segmentation in MR images. NeuroImage. 2018; 183

5. Grand Challenge at MICCAI 2017: WMH Segmentation Challenge. http://wmh.isi.uu.nl/

6. Ronneberger O, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, Springer, LNCS. 2015;9351

7. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2015. arXiv:1512.03385v1

8. Smith SM. Fast robust automated brain extraction. Human Brain Mapping. 2002;17(3)

9. Tustison N, et al. N4ITK: Improved N3 Bias Correction. IEEE Trans Med Imaging. 2010

10. Chollet F, et al. Keras. 2015. https://keras.io

11. Abadi M, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015. Software available from tensorflow.org

12. Ghafoorian M, et al. Location Sensitive Deep Convolutional Neural Networks for Segmentation of White Matter Hyperintensities. Nature Scientific Reports. 2017

13. Kamnitsas K, et al. Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation. International MICCAI Brainlesion Workshop. arXiv:1711.01468v1

Figures

Figure 1: An illustration of the modified 3D UNET architecture, with convolutional blocks, used in this work. There are four levels of resolution with Batch Normalization (B) and ReLU activation (R) after every convolution, with feature concatenation in the synthesis path. In training mode, the input to the architecture is a 3D patch of size 64x64x5. The convolutions are zero padded to ensure prediction size matches with input. A weighted binary cross entropy loss function is used where the weights are calculated from the ratio of pixels with and without lesions in the training data.
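As a rough illustration of the weighted loss mentioned in the Figure 1 caption, the sketch below up-weights the lesion class by the ratio of non-lesion to lesion voxels in the training labels. The caption only says that the weights come from the ratio of pixels with and without lesions, so the exact weighting scheme here is an assumption, and `lesion_weight` and `weighted_bce` are hypothetical names written in NumPy rather than as a Keras loss.

```python
import numpy as np


def lesion_weight(train_labels):
    """Weight for the lesion class: n_background / n_lesion (assumed form)."""
    train_labels = np.asarray(train_labels)
    n_lesion = np.count_nonzero(train_labels)
    n_background = train_labels.size - n_lesion
    return n_background / max(n_lesion, 1)  # guard against division by zero


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross entropy with the lesion term scaled by pos_weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_voxel = -(pos_weight * y_true * np.log(y_prob)
                  + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_voxel.mean()
```

With this form, a misclassified lesion voxel contributes `pos_weight` times as much to the loss as a misclassified background voxel, which counteracts the rarity of lesion voxels.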

Figure 2: A) An illustration of the Res-UNET3D, a multi-scale network architecture similar to UNET but consisting of residual blocks instead of convolutional blocks, as shown above. The synthesis path uses feature addition instead of concatenation. B) An illustration of the 3D ensemble setup. This setup contains two UNET3Ds (Figure 1) and two Res-UNET3Ds (Figure 2A) trained with random initializations. The posterior probabilities for WMH from individual networks are averaged to obtain a revised posterior probability for the ensemble.

Figure 3: The manual annotations (A), WMH predictions from the 3D networks (B), and the prediction from the 3D ensemble network (C) are shown overlaid on a FLAIR image for a sample slice for a test subject with large lesions. The table (D) shows the dice scores and precision metric calculated on the full 3D volume. It can be seen that there is variation in the performances of the individual 3D networks and using the ensemble network to combine the predictions improves the dice and precision scores.

Figure 4: The manual annotations (A), WMH predictions from the 3D networks (B), and the prediction from the 3D ensemble network (C) are shown overlaid on a FLAIR image for a sample slice for a test subject with fewer and smaller lesions. The table (D) shows the dice scores and precision metric calculated on the full 3D volume. Similar to Figure 3, there is an increase in dice and precision scores for the ensemble network when compared to individual 3D networks or their average.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)