4264

CAST: A multi-scale Convolutional neural network based Automated hippocampal subfield Segmentation Toolbox
Zhengshi Yang1, Xiaowei Zhuang1, Karthik Sreenivasan1, Virendra Mishra1, and Dietmar Cordes1,2
1Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States, 2Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, United States

Synopsis

The segmentation of human hippocampal subfields on in vivo MRI has gained great interest in the last decade because recent studies found these anatomic subregions to be highly specialized and potentially affected differentially by normal aging, Alzheimer’s disease, schizophrenia, epilepsy, major depressive disorder, and posttraumatic stress disorder. However, manually segmenting hippocampal subfields is labor-intensive and time-consuming, which limits studies to small sample sizes. We developed a multi-scale Convolutional neural network based Automated hippocampal subfield Segmentation Toolbox (CAST) for automated segmentation, which can be easily trained and outputs segmented images in about one minute.

Introduction

The segmentation of human hippocampal subfields on in vivo MRI has gained great interest in the last decade because recent studies found these anatomic subregions to be highly specialized [1-3]. However, manual delineation of hippocampal subregions is extremely labor-intensive and time-consuming, which limits studies to small sample sizes. In addition, inter- and intra-rater reliability is another factor that may influence the statistical power of a study. In this study, we present a multi-scale Convolutional neural network (CNN) based Automated hippocampal subfield Segmentation Toolbox (CAST) for automatically segmenting the hippocampus and other subregions of the medial temporal lobe, which can segment a new subject in about one minute.

Methods

Datasets: The CAST segmentation method was applied to a 7T imaging dataset downloaded from the ASHS data repository (https://www.nitrc.org/projects/ashs). The 3D T2-weighted TSE images from 26 subjects [4], named the UMC dataset, were collected on a 7T Philips MR imaging scanner with 0.7 x 0.7 x 0.7 mm3 isotropic voxel size and interpolated to 0.35 x 0.35 x 0.35 mm3 isotropic voxel size by zero-filling during reconstruction. The manual delineation of hippocampal subregions was performed by the corresponding investigators.
Network architecture in CAST: The CNN shown in Fig.1 takes the original-resolution image (blue) and two images down-sampled by factors of 3 and 5 (green and purple, respectively) as input. The green and purple boxes indicate the size of the down-sampled images in the original resolution, and the cropped input patches have dimensions of 37³, 23³ and 21³ voxels. These three images are fed to three separate pathways with the same network architecture but independent parameters. Each pathway consists of eight sequential convolutional layers with filter size 3³ and 30, 30, 40, 40, 40, 40, 50 and 50 filters, in that order. Residual connections are implemented at the 4th, 6th, and 8th layers to overcome the vanishing-gradient problem in deeper neural networks. The outputs of the two down-sampled pathways are up-sampled to match the dimension of the output from the original-resolution pathway and concatenated, yielding a 150 x 21³ feature map for the following concatenated convolutional block. This multi-scale CNN has about 2 million parameters and was developed based on the TensorFlow package (https://www.tensorflow.org) and the DeepMedic project (https://github.com/deepmedic/deepmedic) [5]. TensorFlow is an open-source platform for machine learning, particularly deep learning; DeepMedic provides a general framework for multi-scale convolutional neural networks. The training and segmentation pipeline is shown in Fig.2.
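The pathway dimensions above can be checked with a short sketch, assuming valid (unpadded) 3³ convolutions as used in DeepMedic. The channel progression is taken from the abstract; the per-pathway parameter count is our own arithmetic for illustration, not a reported number, and excludes the concatenated convolutional block.

```python
# Channel progression through the 8 convolutional layers of one pathway
CHANNELS = [1, 30, 30, 40, 40, 40, 40, 50, 50]
KERNEL = 3  # 3x3x3 filters

def pathway_output_size(input_size, n_layers=8, kernel=KERNEL):
    """Each valid (unpadded) 3^3 convolution shrinks every spatial dim by 2."""
    return input_size - n_layers * (kernel - 1)

def pathway_params(chans=CHANNELS, kernel=KERNEL):
    """Weights + biases for the 8 convolutional layers of one pathway."""
    return sum(kernel**3 * cin * cout + cout
               for cin, cout in zip(chans[:-1], chans[1:]))

# Original-resolution patch: 37^3 -> 21^3 after 8 valid convolutions,
# matching the 21^3 grid of the concatenated 150 x 21^3 feature map
print(pathway_output_size(37))   # 21
# Factor-3 down-sampled pathway fed 23^3 patches: 23 -> 7, up-sampled x3 -> 21
print(pathway_output_size(23))   # 7
# One pathway holds ~0.3 M parameters; three pathways plus the concatenated
# block bring the total near the ~2 M quoted above
print(pathway_params())          # 308930
```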
It takes about three days to train a model on a personal desktop with a Tesla K40c GPU card and less than one minute to segment a new subject with an optimized model. The Dice similarity coefficient (DSC) was calculated for each subfield separately, and a generalized DSC score was also computed with all subfields considered jointly. The reliability of automated segmentation was measured with the intraclass correlation coefficient (ICC), which measures absolute agreement under a two-way random-effects model from a single measurement.
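As a concrete reference for the two metrics, the following is a minimal sketch (not the CAST implementation) of the per-label Dice coefficient and of ICC(2,1), the two-way random-effects, absolute-agreement, single-measurement form used above.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice similarity coefficient for one subfield label in two label maps."""
    a = (np.asarray(seg_a) == label)
    b = (np.asarray(seg_b) == label)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    ratings: (n_subjects, n_raters) array, e.g. a subfield volume per subject
    from automated and manual segmentation in the two columns."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # subjects mean square
    msc = ss_cols / (k - 1)             # raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Toy example: two label maps sharing one of two foreground voxels
print(dice([1, 1, 0, 0], [1, 0, 0, 0], label=1))  # 0.666...
# Perfectly agreeing volumes give ICC = 1
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))          # 1.0
```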

Results

By running CAST with a leave-one-out technique, the DSC and ICC for the 26 subjects in the UMC dataset were obtained, as shown in Fig.3. Notably, for both CAST and ASHS, the mean generalized DSC across all subfields is 0.80 ± 0.03. Compared to ASHS segmentation, CAST substantially improved the ICC for CA2, CA3, SUB and ERC by 15%, 42%, 7% and 51%, respectively. However, CAST had an 11% worse ICC for CA1 than ASHS. A 3D rendering of the CAST and manual segmentations of a single subject with a generalized DSC of 0.80 is shown in Fig.4.

Discussion

Although the segmentation method in FreeSurfer 6.0 was not applied to this dataset because of its distinct manual segmentation protocol, a summary of the comparison between FreeSurfer 6.0, ASHS and CAST is listed in Table 1. When CAST is applied to a subject with an optimized model, the toolbox requires only the raw image as input and outputs the segmented image in less than one minute. This computational efficiency makes CAST applicable to large sample sizes and near-instantaneous segmentation. As with ASHS, the best segmentation is achieved with a customized, population-specific atlas from the same magnet; however, CAST can easily be trained on a personal desktop, instead of a computer cluster, to generate the corresponding segmentation model. The automated segmentation is overall very similar to the manual segmentation, with small localized differences observed at the boundaries among subfields or between subfields and background. To further investigate the lower ICC value for the CA1 subfield, we visually inspected the manual segmentations and observed that the boundary between CA1 and SUB was defined inconsistently across subjects. Because CAST is designed to learn a consistent rule and apply it to all subjects, this variation within the manual segmentation might be related to the inferior performance in segmenting CA1.

Conclusion

In this study, we present a fast automated hippocampal subfield segmentation method based on a multi-scale deep convolutional neural network, which can segment a new subject in about one minute. Compared to the current state-of-the-art method, this method achieves comparable accuracy in terms of Dice coefficient and is more reliable in terms of intraclass correlation coefficient for most subfields.

Acknowledgements

This research project was supported by the NIH (grant 1R01EB014284 and COBRE grant 5P20GM109025), a Young Investigator award from Cleveland Clinic, a private grant from Peter and Angela Dal Pezzo, a private grant from Lynn and William Weidner, and a private grant from Stacie and Chuck Matthewson. The atlas and datasets for this project were generously shared by the corresponding investigators and are publicly available on the ASHS data repository.

References

1. Inhoff, M.C. and C. Ranganath, Significance of objects in the perirhinal cortex. Trends in cognitive sciences, 2015. 19(6): p. 302-303.

2. Chadwick, M.J., H.M. Bonnici, and E.A. Maguire, CA3 size predicts the precision of memory recall. Proceedings of the National Academy of Sciences, 2014. 111(29): p. 10720-10725.

3. Leutgeb, J.K., et al., Pattern separation in the dentate gyrus and CA3 of the hippocampus. Science, 2007. 315(5814): p. 961-966.

4. Wisse, L.E., et al., Automated hippocampal subfield segmentation at 7T MRI. American Journal of Neuroradiology, 2016. 37(6): p. 1050-1057.

5. Kamnitsas, K., et al., Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis, 2017. 36: p. 61-78.

Figures

Figure 1. Architecture of the multi-scale 3D convolutional neural network. The original and two down-sampled image batches are fed to three pathways. Each pathway consists of 8 sequential 3D convolutional layers with the same architecture but independent parameters; the 4th, 6th, and 8th layers have residual connections. The outputs of the down-sampled pathways are up-sampled to match the size of the original-resolution pathway, and the three outputs are then concatenated for the following three layers in the concatenated convolutional block.

Figure 2. CAST training and segmentation pipeline.

Figure 3. Dice similarity coefficients (DSC) and intraclass correlation coefficients (ICC) for all subfields in the UMC dataset. The DSC and ICC values for ASHS-automated versus manual rater (named ASHS) and for 2 independent raters (named inter-rater) are taken directly from [4] and are based on a subset of subjects rather than the entire dataset. Both T1- and T2-weighted images are used in ASHS, but only T2-weighted images are used in CAST.

Figure 4. 3D rendering of manual and CAST segmentation of one subject in UMC dataset.

Table 1. A summary of the comparison between FreeSurfer 6.0, ASHS and CAST automated segmentation methods.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)