Improved Automated Hippocampus Segmentation using Deep Neural Networks
Maximilian Sackl1, Alina Dima2, Christian Payer2, Darko Štern3, Reinhold Schmidt1, and Stefan Ropele1
1Department of Neurology, Medical University of Graz, Graz, Austria, 2Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria, 3Department of Biophysics, Medical University of Graz, Graz, Austria

Synopsis

Segmentation of the hippocampal formation on T1-weighted structural MR scans is a prerequisite for most imaging studies in Alzheimer’s disease. In this work, we evaluated the performance and accuracy of deep learning-based hippocampus (HC) segmentation trained with manual ground truth (GT) data originating from high-resolution T2-weighted MR images. Results were evaluated against the GT-labels and compared to segmentation results obtained with FreeSurfer. All learning approaches outperformed FreeSurfer in terms of accuracy and speed, with the experiments utilizing the T2-based GT-labels yielding the best results. Thus, using T2-weighted images for training a deep learning model can improve automated HC segmentation.

Introduction

Hippocampal atrophy (volume loss) has become a secondary outcome parameter in recent clinical trials on Alzheimer’s disease (AD). Accurate estimation of hippocampal volume and its progression requires a robust and reliable hippocampus (HC) segmentation. Manual segmentation is considered the gold standard but is often replaced by automated software packages such as FreeSurfer (FS). More recently, deep learning (DL), a powerful tool for learning complex models in medical imaging, has been proposed for HC segmentation1,2. Commonly, HC segmentation is performed on T1-weighted (T1) MRI scans with 1mm³ isotropic resolution. However, distinct identification of the HC border on such T1 images is limited, especially near cerebrospinal fluid. A higher resolution, crucial for ground truth (GT) generation, is not feasible for whole-brain T1 scans because of scan time constraints and the higher susceptibility to motion-induced artifacts. To overcome this limitation and to generate a more accurate GT, we propose acquiring high-resolution T2-weighted (T2) scans for training our DL approach.
In this work, we investigated the performance and accuracy of deep learning-based HC segmentation, which relies on GT data from very high-resolution T2 scans, as depicted in Figure 1.

Methods

Dataset.
A unique dataset was acquired to create highly accurate GT segmentations for training a neural network and to establish reference data for evaluating the performance of FreeSurfer. Twenty-three healthy volunteers were scanned on a Siemens Prisma 3T MR scanner with a 20-channel head coil to obtain corresponding pairs of high-resolution T1 and T2 images (see Figure 2).
Whole-brain T1 images were acquired with a 3D magnetization-prepared rapid gradient-echo (MPRAGE) sequence (1mm³ isotropic resolution; 224x256x176mm³ field of view (FoV)). T2 scans were acquired using a 2D fast spin-echo (FSE) sequence with hyperechoes, with the oblique coronal plane oriented perpendicular to the long axes of the hippocampi. The T2 sequence provided 40 slices with a resolution of 0.47x0.47x1mm³ (FoV, 352x512x40mm³) and covered only the hippocampal formation. GT labeling was performed on the T2 scans to exploit their higher resolution and better contrast-to-noise ratio. To this end, the T2-based annotation protocol of Berron et al.3 was applied, and 28 hippocampi were manually labeled. In five subjects, both hippocampi were annotated, yielding 14 masks from each hemisphere.
To ensure agreement of the T2 image and the T2-based GT-mask with their T1 counterparts, slice-wise alignment of the T2 slab was followed by registration to the corresponding T1 scan (cf. Figure 1). The transformations were combined and applied in a single step to reduce interpolation artifacts. Moreover, only in-plane rigid transformations were used, optimized with a multi-scale conjugate gradient descent approach using the ANTs cross-correlation metric4.
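As an illustration, the following Python sketch shows how one such in-plane rigid registration step could be set up with SimpleITK. This is a minimal sketch, not the original pipeline: the function name and optimizer settings are assumptions, and SimpleITK's global correlation metric stands in for the ANTs neighborhood cross-correlation.

    # Minimal sketch of one slice-wise, in-plane rigid registration step.
    # Assumes SimpleITK; all parameters are illustrative, not the study's values.
    import SimpleITK as sitk

    def register_slice_rigid(fixed_slice, moving_slice):
        """Rigidly align a 2D T2 slice to its T1 counterpart."""
        reg = sitk.ImageRegistrationMethod()
        # Correlation metric (stand-in for the ANTs cross-correlation).
        reg.SetMetricAsCorrelation()
        # Multi-scale pyramid: coarse-to-fine with Gaussian smoothing.
        reg.SetShrinkFactorsPerLevel(shrinkFactors=[4, 2, 1])
        reg.SetSmoothingSigmasPerLevel(smoothingSigmas=[2.0, 1.0, 0.0])
        # Conjugate gradient descent, as mentioned in the text.
        reg.SetOptimizerAsConjugateGradientLineSearch(
            learningRate=1.0, numberOfIterations=200)
        # In-plane rigid transform only (2D rotation + translation).
        initial = sitk.CenteredTransformInitializer(
            fixed_slice, moving_slice, sitk.Euler2DTransform(),
            sitk.CenteredTransformInitializerFilter.GEOMETRY)
        reg.SetInitialTransform(initial, inPlace=False)
        reg.SetInterpolator(sitk.sitkLinear)
        return reg.Execute(fixed_slice, moving_slice)

The resulting slice transforms would then be composed with the slab-to-T1 registration and applied in one resampling pass, matching the single-interpolation strategy described above.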

Segmentation Setup.
FreeSurfer segmentations were computed with the hippocampal-subfields-T1 protocol5 of FS v6.0. To be comparable with the registered GT-labels, the FS-masks were registered back into each subject's T1 space. Deep learning-based segmentation was achieved with a modified U-Net6 architecture, as shown in Figure 3. For training the networks, the dataset was further preprocessed by applying slice-wise intensity normalization, flipping samples from the right to the left hemisphere, and performing extensive data augmentation (translations, rotations, scaling, elastic deformations) to account for MRI-related and anatomical variations. Finally, all input images were cropped around the HC to a patch size of 96x64x40 voxels. A sketch of these preprocessing steps is given below.
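The following minimal Python sketch illustrates the preprocessing and augmentation steps. Details such as the z-score normalization, the flip axis, and the augmentation ranges are assumptions for illustration; elastic deformations are omitted for brevity.

    # Minimal preprocessing/augmentation sketch; normalization type, flip
    # axis, and augmentation ranges are assumed, not the study's values.
    import numpy as np
    from scipy.ndimage import rotate, shift

    def preprocess(volume, is_right_hemisphere):
        """Slice-wise intensity normalization and hemisphere flipping."""
        out = volume.astype(np.float32)
        for k in range(out.shape[-1]):  # normalize each oblique-coronal slice
            s = out[..., k]
            out[..., k] = (s - s.mean()) / (s.std() + 1e-8)
        if is_right_hemisphere:         # mirror right HC onto the left side
            out = out[::-1, ...]
        return out

    def augment(volume, rng):
        """Random translation and in-plane rotation (elastic part omitted)."""
        vol = shift(volume, rng.uniform(-3, 3, size=3), order=1)
        return rotate(vol, rng.uniform(-10, 10), axes=(0, 1),
                      reshape=False, order=1)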

Experimental Setup.
To assess the generalizability of our models, we split the preprocessed dataset (28 samples) into training (18 samples) and evaluation (10 samples) sets. A 3-fold cross-validation was used to counteract the small dataset size. All sets were balanced with respect to the side of the HC, as sketched below.
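A minimal sketch of such a side-balanced fold assignment follows; the study's exact splitting procedure is not specified, so the names and the round-robin strategy are assumptions.

    # Hypothetical side-balanced fold assignment; the study's exact
    # splitting procedure is not specified in the text.
    import random

    def balanced_folds(sample_ids, sides, n_folds=3, seed=0):
        """Distribute samples over folds, keeping left/right counts even."""
        rng = random.Random(seed)
        folds = [[] for _ in range(n_folds)]
        for side in ("left", "right"):
            ids = [s for s, h in zip(sample_ids, sides) if h == side]
            rng.shuffle(ids)
            for i, s in enumerate(ids):
                folds[i % n_folds].append(s)  # round-robin keeps counts even
        return folds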
We trained our model with four different combinations of input images and HC-labels. First, T1 scans were used with either the previously computed FS-masks or the manual GT-labels. Another experiment, utilizing the T2 images and GT-labels, was performed to estimate the segmentation performance achievable when T2 images are also available at inference time. Finally, both modalities were used simultaneously (T1+T2).
All models were trained with a cross-entropy loss, the Adam optimizer, and exponential learning-rate decay for 60,000 iterations with a batch size of 2.
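In PyTorch, this training configuration could look like the sketch below. Only the loss, optimizer, decay schedule, iteration count, and batch size are stated above; the learning rate, decay factor, and batching helper are assumptions.

    # Hedged PyTorch sketch of the stated training configuration;
    # base_lr, decay, and next_batch() are assumed for illustration.
    import torch

    def train(model, next_batch, base_lr=1e-3, decay=0.9999):
        loss_fn = torch.nn.CrossEntropyLoss()            # voxel-wise CE loss
        opt = torch.optim.Adam(model.parameters(), lr=base_lr)
        sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=decay)
        for step in range(60000):                        # 60,000 iterations
            images, labels = next_batch(batch_size=2)    # batch size of 2
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
            sched.step()                                 # exponential LR decay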

Results

The performance of all approaches was compared via the average Dice similarity coefficient (DSC) against the manual GT. HC segmentation with FreeSurfer had the lowest DSC (78.06±3.99%), owing to numerous false positive (FP) and false negative (FN) voxels (see Figure 4). Using the FreeSurfer labels to train the CNN improved the DSC (80.85±3.61%) while markedly decreasing the overall computation time. Training with the GT-labels yielded a DSC of 85.62±3.86% for T1 images and 91.91±0.87% for T2 images. For both modalities, excellent agreement with the GT-mask was achieved, with only a few FP and FN voxels, primarily at regions without distinct borders and at the beginning of the HC head (see Figure 5). Using both modalities (T1+T2) simultaneously did not improve the segmentation outcome and resulted in a DSC of 91.64±0.94%.
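For reference, the DSC is twice the overlap between prediction and GT divided by the sum of their sizes; a minimal sketch on binary masks:

    # DSC = 2|P ∩ G| / (|P| + |G|), computed on binary masks.
    import numpy as np

    def dice(pred, gt):
        pred, gt = pred.astype(bool), gt.astype(bool)
        denom = pred.sum() + gt.sum()
        return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0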

Discussion and Conclusion

This explorative study demonstrates that deep learning can outperform FreeSurfer’s HC segmentation on T1 images in terms of both accuracy and speed (a few seconds vs. ~4.5h). Learning from high-resolution T2-weighted scans can achieve even more accurate segmentation results, similar to using both modalities simultaneously. Since the T2 images and GT-labels are only required as training data, the proposed method can be applied to T1 scans of already existing studies.

Acknowledgements

No acknowledgement found.

References

  1. Thyreau B, Sato K, Fukuda H, Taki Y. Segmentation of the hippocampus by transferring algorithmic knowledge for large cohort processing. Med Image Anal. 2018 Jan;43:214–28.

  2. Goubran M, Ntiri EE, Akhavein H, Holmes M, Nestor S, Ramirez J, et al. Hippocampal segmentation for brains with extensive atrophy using three-dimensional convolutional neural networks. Hum Brain Mapp. 2020 Feb 1;41(2):291–308.

  3. Berron D, Vieweg P, Hochkeppler A, Pluta JB, Ding S-L, Maass A, et al. A protocol for manual segmentation of medial temporal lobe subregions in 7 Tesla MRI. Neuroimage Clin. 2017 May 26;15:466–82.

  4. Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011 Feb 1;54(3):2033–44.

  5. Iglesias JE, Augustinack JC, Nguyen K, Player CM, Player A, Wright M, et al. A computational atlas of the hippocampal formation using ex vivo, ultra-high resolution MRI: application to adaptive segmentation of in vivo MRI. Neuroimage. 2015;115:117–37.

  6. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science. 2015. p. 234–41.

Figures

Figure 1: Methodic scheme. Utilizing the steps visualized in the left block, a high-resolution T2-based ground truth dataset for 28 hippocampi was generated. This dataset was subsequently used for training the deep learning models, schematically depicted in the right block.

Figure 2: Visualization of the MR dataset. Subfigures (a)/(b) show the axial/sagittal view of a T1 scan overlaid with the corresponding T2 slab. (c) depicts a coronal slice of an original T1 image. (d) shows an oblique coronal slice of the T2 slab. In all subfigures, the ground truth mask of the left hippocampus is displayed in green.

Figure 3: Schematic visualization of the stacked convolutional neural network architecture utilized in this work.

Figure 4: FreeSurfer segmentation. The left column shows the unprocessed T1 patch; the remaining columns depict the corresponding T2 patch. The FS-mask is shown in orange (column 2); the comparison of the FS-mask (yellow+green) vs. the GT (green+blue) is visualized in the third column. Top four rows depict coronal slices of the HC-head, HC-body, and HC-tail. Bottom two rows show sagittal slices with the annotated HC. Legend: green = True Positives, yellow = False Positives, blue = False Negatives

Figure 5: CNN-based segmentation results of the same subject for T1 input patches (left two columns) and T2 input patches (right two columns). CNN-masks (yellow+green) were evaluated against the manual GT (green+blue). Legend: green = True Positives, yellow = False Positives, blue = False Negatives

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021) 3492