2441

Microstructural White Matter Segmentation in Mild Traumatic Brain Injury Patients using DTI and a Deep 2D-UNet Ensemble
Brian McCrindle1,2, Nicholas Simard1,2, Ethan Samson2,3, Ethan Danielli2,3, Thomas E. Doyle1,3,4, and Michael D. Noseworthy1,2,3
1Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada, 2Imaging Research Center, St. Joseph's Healthcare, Hamilton, ON, Canada, 3School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada, 4Vector Institute, Toronto, ON, Canada

Synopsis

Patients who experience a mild traumatic brain injury often suffer microstructural white matter damage that even radiologists are unable to detect. By combining diffusion tensor imaging with a deep 2D-UNet ensemble network, we developed an image processing pipeline capable of detecting and segmenting damaged white matter regions. We show that the ensemble is more reliable than any single model across the range of prediction thresholds, including under test-time augmentation.

Introduction

In North America alone, over 1.7 million people are affected by mild traumatic brain injury (mTBI) each year1. Patients are typically left with a vague diagnosis, since there is no clinically established way to quantify the injury. An automated, quantitative technique is therefore needed to assist physicians in the diagnostic process. In this study, we developed a deep learning system capable of detecting and segmenting microstructural white matter (WM) damage in the brains of adults who have experienced an mTBI. We train multiple 2D-UNets, an architecture designed specifically for biomedical image segmentation, and ensemble their outputs to improve predictive performance.

Methods

Ten patients (6 male/4 female, aged 22 to 66) who had experienced an mTBI within 2 years of the study start date were recruited. Healthy control subjects (n = 88 age-matched controls per patient, totaling 880 normal scans) were sourced from public data repositories2,3. Patients were scanned on a GE MR750 Discovery 3T MRI scanner with a 32-channel RF receiver coil. Axial DTI was acquired using a dual-echo EPI sequence (TE/TR = 87/8800 ms, 122×122 matrix, 2 mm slice thickness, FOV = 244 mm). All brains were registered to the Juelich Histological atlas4 for 3D probabilistic mapping of WM regions. A deep 2D-UNet5,6 ensemble framework with various encoder backbones, random weight initialization, dropout, and a weighted binary cross-entropy loss function was implemented in PyTorch 1.5.0 with CUDA 10.1. Only fractional anisotropy (FA) maps were used as input, and system performance was evaluated with a 60/15/25% training/validation/test split. No patient's images appeared in both the training and test sets, ensuring an unbiased estimate of test performance. Voxel-wise labels of the damaged regions of interest were determined through age-matched, population-wise Z-scoring and Tract-Based Spatial Statistics7.
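The labeling and splitting steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes FA volumes already registered to a common space, a boolean WM mask, and a hypothetical cut-off of Z < -2 for reduced FA. The abstract states neither the threshold nor how the TBSS output was combined with the Z-scores, so TBSS is not modeled here. The helpers `zscore_labels` and `split_by_patient` are hypothetical names.

```python
import numpy as np


def zscore_labels(patient_fa, control_fa, wm_mask, z_thresh=-2.0, eps=1e-6):
    """Label voxels whose FA is abnormally low relative to age-matched controls.

    patient_fa : (X, Y, Z) FA volume of one patient, registered to atlas space
    control_fa : (N, X, Y, Z) FA volumes of the N age-matched controls
    wm_mask    : (X, Y, Z) boolean white-matter mask
    z_thresh   : hypothetical cut-off; voxels with Z below it are labeled damaged
    """
    mu = control_fa.mean(axis=0)             # voxel-wise control mean
    sigma = control_fa.std(axis=0, ddof=1)   # voxel-wise control standard deviation
    z = (patient_fa - mu) / (sigma + eps)    # voxel-wise Z-score of the patient
    labels = (z < z_thresh) & wm_mask        # restrict labels to white matter
    return z, labels.astype(np.uint8)


def split_by_patient(subject_ids, fractions=(0.60, 0.15, 0.25), seed=0):
    """Assign whole subjects (never individual slices) to train/val/test."""
    rng = np.random.default_rng(seed)
    ids = np.array(sorted(set(subject_ids)))
    rng.shuffle(ids)
    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    train = ids[:n_train].tolist()
    val = ids[n_train:n_train + n_val].tolist()
    test = ids[n_train + n_val:].tolist()  # remainder (about 25%) goes to test
    return train, val, test
```

Splitting on subject identifiers rather than on individual slices keeps every slice of a given brain in a single partition, which is what prevents leakage between the training and test sets.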

All FSL8-preprocessed DICOM images were converted to NIfTI using the dcm2niix toolkit9. Each axial slice of the NIfTI volume was then extracted, converted to TIFF for training, and subjected to a set of random affine transformations during each training epoch. Model predictions were ensembled using the method described by Lakshminarayanan et al.10 to improve generalizability and to report voxel-wise estimates of predictive uncertainty. Performance curves were generated with and without test-time augmentation (TTA).
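The ensembling step can be sketched in the spirit of Lakshminarayanan et al.10: the ensemble probability is the average of the members' predicted probabilities, and the voxel-wise variance across members serves as an uncertainty map like the one in Fig. 3f. This sketch assumes each member outputs per-voxel logits passed through a sigmoid (consistent with a binary cross-entropy loss); the function names are hypothetical, and the adversarial-training component of that paper is not shown.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def ensemble_predict(member_logits):
    """Combine per-voxel logits from M independently trained models.

    member_logits : array of shape (M, H, W), one logit map per ensemble member
    Returns (mean_prob, variance): the ensemble probability, i.e. the average of
    the member probabilities, and the voxel-wise variance across members, which
    is used here as the predictive-uncertainty map.
    """
    probs = sigmoid(np.asarray(member_logits, dtype=np.float64))
    mean_prob = probs.mean(axis=0)
    variance = probs.var(axis=0)
    return mean_prob, variance


def binarize(prob, threshold=0.5):
    """Turn a probability map into a binary segmentation at a given threshold."""
    return (prob >= threshold).astype(np.uint8)
```

Members that agree produce low variance; a voxel that only one of three backbones labels as damaged receives a mean probability near 1/3 and a high variance, which is how disagreement between the models would appear in the uncertainty map.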

Results

Given the design choices above, each model is expected to converge to a slightly different location in the loss landscape. Consequently, the optimal classification threshold differs from model to model as a result of stochastic gradient descent optimization. The performance of all models across thresholds is illustrated by the precision-recall and Dice score curves in Fig. 1 and Fig. 2, respectively. The ensemble is the most stable over the entire threshold range, even though it is not the best-performing model at every individual threshold. Performance drops under TTA because this is a form of "out-of-distribution" (OOD) testing. In Fig. 3, each model captures different nuances within the image, and combining them through the ensemble yields the best prediction. The corresponding variance map (Fig. 3f) conveys the ensemble's uncertainty at the center of the segmentation.
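The threshold sweep behind precision-recall and Dice-versus-threshold curves like those in Fig. 1 and Fig. 2 can be sketched with NumPy. This is a hypothetical helper, not the authors' evaluation code; the conventions used when a denominator is zero (precision and recall set to 0, Dice set to 1 when both masks are empty) are assumptions.

```python
import numpy as np


def threshold_curves(prob, target, thresholds):
    """Precision, recall and Dice of a probability map at each threshold.

    prob       : array of predicted foreground probabilities
    target     : binary ground-truth array of the same shape
    thresholds : 1-D iterable of classification thresholds in [0, 1]
    """
    prob = np.asarray(prob, dtype=np.float64).ravel()
    target = np.asarray(target).ravel().astype(bool)
    precision, recall, dice = [], [], []
    for t in thresholds:
        pred = prob >= t
        tp = np.sum(pred & target)    # damaged voxels correctly predicted
        fp = np.sum(pred & ~target)   # healthy voxels predicted as damaged
        fn = np.sum(~pred & target)   # damaged voxels that were missed
        # Empty-denominator conventions below are assumptions.
        precision.append(tp / (tp + fp) if tp + fp else 0.0)
        recall.append(tp / (tp + fn) if tp + fn else 0.0)
        dice.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0)
    return np.array(precision), np.array(recall), np.array(dice)
```

Calling it with, for example, `thresholds = np.linspace(0, 1, 101)` once per model and once on the ensemble probability, then comparing how much each Dice curve varies over that range, is one way to quantify the stability described above; `thresholds[np.argmax(dice)]` gives the Dice-maximizing threshold.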

Discussion

The ideal threshold for the ensemble is difficult to determine, since this value directly governs the trade-off between the model's precision and recall. When the threshold is simply chosen to maximize the Dice score, individual models achieve average Dice scores of approximately 0.61 and 0.50 without and with TTA, respectively. These metrics are expected to improve as the patient sample size increases, since the small number of patients is currently the major bottleneck on predictive performance. Finally, exposing the models to adversarial examples during training is expected to improve overall performance in the TTA/OOD case10.
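The precision-recall trade-off can be made explicit with the standard definitions (general facts, not results of this study). With TP, FP and FN counted at threshold τ, the Dice score equals the harmonic mean of precision P and recall R, and the Dice-maximizing threshold is τ*:

```latex
P(\tau) = \frac{TP(\tau)}{TP(\tau) + FP(\tau)}, \qquad
R(\tau) = \frac{TP(\tau)}{TP(\tau) + FN(\tau)}

\mathrm{Dice}(\tau) = \frac{2\,TP(\tau)}{2\,TP(\tau) + FP(\tau) + FN(\tau)}
                    = \frac{2\,P(\tau)\,R(\tau)}{P(\tau) + R(\tau)}

\tau^{*} = \arg\max_{\tau \in [0,1]} \mathrm{Dice}(\tau)
```

Raising τ can only remove predicted voxels, so R(τ) never increases while FP(τ) never increases, which typically raises P(τ). Because Dice is a harmonic mean, it is pulled toward the smaller of P and R, so a threshold that balances them for one model need not do so for another.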

Conclusion

Even though the ensemble's performance is limited by the small amount of available data, this work demonstrates the feasibility of using an ensemble of deep 2D-UNet architectures to segment damaged white matter regions within the brain. A platform such as this could give clinicians a quantitative tool for assessing patients who have experienced an mTBI. With increasing amounts of data, deep learning-based algorithms are expected to learn nuances within the data that traditional statistical approaches cannot capture.

Acknowledgements

We would like to thank NSERC for their financial support.

References

1. M. Faul, L. Xu, M. Wald, Traumatic Brain Injury in the United States: Emergency Department Visits, Hospitalizations, and Deaths 2002–2006, U.S. Centers for Disease Control and Prevention (2010).

2. Laboratory of Neuroimaging, International Consortium for Brain Mapping (2011). https://ida.loni.usc.edu/

3. D.C. Van Essen, S.M. Smith, D.M. Barch, T.E.J. Behrens, E. Yacoub, K. Ugurbil, The WU-Minn Human Connectome Project: An overview, Neuroimage. 80 (2013) 62–79. https://doi.org/10.1016/j.neuroimage.2013.05.041.

4. K. Amunts, H. Mohlberg, S. Bludau, K. Zilles, Julich-Brain: A 3D probabilistic atlas of the human brain's cytoarchitecture, Science. 369 (2020) 988–992. https://doi.org/10.1126/science.abb4588.

5. O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 9351 (2015) 234–241. https://doi.org/10.1007/978-3-319-24574-4_28.

6. P. Yakubovskiy, Segmentation Models Pytorch, GitHub (2020). https://github.com/qubvel/segmentation_models.pytorch

7. S.M. Smith, M. Jenkinson, H. Johansen-Berg, D. Rueckert, T.E. Nichols, C.E. Mackay, K.E. Watkins, O. Ciccarelli, M.Z. Cader, P.M. Matthews, T.E.J. Behrens, Tract-based spatial statistics: Voxelwise analysis of multi-subject diffusion data, NeuroImage. 31 (2006) 1487–1505.

8. S.M. Smith, M. Jenkinson, M.W. Woolrich, C.F. Beckmann, T.E.J. Behrens, H. Johansen-Berg, P.R. Bannister, M. De Luca, I. Drobnjak, D.E. Flitney, R. Niazy, J. Saunders, J. Vickers, Y. Zhang, N. De Stefano, J.M. Brady, P.M. Matthews, Advances in functional and structural MR image analysis and implementation as FSL, NeuroImage. 23 Suppl. 1 (2004) S208–S219.

9. X. Li, P.S. Morgan, J. Ashburner, J. Smith, C. Rorden, The first step for neuroimaging data analysis: DICOM to NIfTI conversion, J. Neurosci. Methods. 264 (2016) 47–56. https://doi.org/10.1016/j.jneumeth.2016.03.001.

10. B. Lakshminarayanan, A. Pritzel, C. Blundell, Simple and scalable predictive uncertainty estimation using deep ensembles, Adv. Neural Inf. Process. Syst. 30 (NIPS 2017) (2017).

Figures

Fig. 1. Precision-recall curves of three 2D-UNet architectures with ResNet101, VGG19, and InceptionV4 encoder backbones, with and without test-time augmentation (TTA). The ensemble (pink) performs most consistently of all models across the classification threshold range.

Fig. 2. Dice score versus classification threshold for all models with and without TTA. The ensemble performs most reliably across the threshold range.

Fig. 3. Example of an unseen slice fed into the ensemble model. (a) Normalized axial FA slice. (b) Z-scoring label. (c–e) Predictions of the 2D-UNets with ResNet101, VGG19, and InceptionV4 encoders, with Dice scores of 0.69, 0.69, and 0.70, respectively. (f) Ensemble predictive uncertainty. (g) Ensemble prediction. (h) Ensemble prediction with the "optimal" threshold, giving a Dice score of 0.71.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)