1200

3D Hybrid Deep Learning Solution for Subcortical Segmentation

Aaron Cao¹, Vishwanatha Rao², Xinru Liu³, and Jia Guo⁴,⁵
¹Valley Christian High School, San Jose, CA, United States, ²Department of Biomedical Imaging, Columbia University, New York City, NY, United States, ³The Village School, Houston, TX, United States, ⁴Department of Psychiatry, Columbia University, New York City, NY, United States, ⁵Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York City, NY, United States

Synopsis

Keywords: Analysis/Processing, Neuro

Motivation: For subcortical brain segmentation, the most widely accepted tools like FreeSurfer are slow and inefficient for large datasets, while faster methods often sacrifice accuracy and reliability.

Goal(s): We propose a novel deep-learning-based alternative that achieves consistent state-of-the-art performance within reasonable processing times.

Approach: Our model, TABSurfer, utilizes a 3D patch-based approach with a hybrid CNN-Transformer architecture.

Results: We evaluated TABSurfer against FreeSurfer-generated ground truths across various T1w MRI datasets, where it consistently outperformed a leading deep learning benchmark, FastSurferVINN. We then validated TABSurfer against a manual reference, where it outperformed both FreeSurfer and FastSurferVINN on this gold standard.

Impact: Our proposed deep learning model, TABSurfer, demonstrated state-of-the-art subcortical segmentation performance and utility. TABSurfer was reliable across numerous datasets and outperformed the well-established traditional and deep learning tools FreeSurfer and FastSurferVINN, respectively.

Introduction

Subcortical segmentation is the semantic labeling of voxels in brain MRI scans into subcortical regions, with important applications in the quantitative structural analysis of morphological deficits associated with certain neuropsychiatric diseases¹⁻³. While manual segmentation is the most accurate method, it is tedious and difficult even for experts.
Therefore, automated tools like FreeSurfer⁴ have been developed. Although FreeSurfer is the most widely accepted standard, its automatic subcortical segmentation can take over 11 hours to complete for a single scan. Convolutional Neural Network (CNN) based alternatives like FastSurferVINN⁵ have recently been proposed, but they often sacrifice performance and generality for speed. Such deep learning methods are often limited by their 2D slice-based approach and CNN architecture, which do not fully capture the surrounding context needed to reliably generate precise segmentations.
In contrast, 3D patch-based solutions are better suited to capturing 3D anatomical geometry, and the Transformer architecture has recently shown promising results when combined with CNNs⁶,⁷. Building on these insights, we propose TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning architecture that achieves superior segmentation accuracy and consistency across various datasets compared to cutting-edge traditional and deep learning tools.

Methods

TABSurfer is inspired by the TABS architecture previously introduced for brain tissue segmentation⁸. The pipeline and model are visualized in Figure 1. Patches are extracted from the volume and fed into our model, which resembles a UNet but with a Vision Transformer encoder module replacing the standard CNN bottleneck. The model outputs probability maps for the 31 targeted subcortical regions; these patch outputs are then reassembled to the shape of the input image, where overlapping predictions vote on the class of each voxel.
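The patch extraction and voting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TABSurfer implementation: it assumes soft voting (summing per-patch probability maps, then taking the argmax per voxel), uses the step size of 16 given in Figure 1a, and the patch size, function names, and data layout are hypothetical.

```python
import numpy as np


def patch_starts(dim: int, patch: int, step: int = 16) -> list:
    """Start indices along one axis so that windows of size `patch` cover [0, dim)."""
    if patch > dim:
        raise ValueError("patch size must not exceed the volume size")
    starts = list(range(0, dim - patch + 1, step))
    # Add a final window flush with the far edge if the stride leaves voxels uncovered.
    if starts[-1] + patch < dim:
        starts.append(dim - patch)
    return starts


def reconstruct(volume_shape, patch_outputs, num_classes: int) -> np.ndarray:
    """Soft voting: accumulate per-patch probability maps, then argmax per voxel.

    patch_outputs yields ((z, y, x), probs), where probs has shape (C, pz, py, px)
    and (z, y, x) is the patch's corner in the full volume.
    """
    votes = np.zeros((num_classes, *volume_shape), dtype=np.float64)
    for (z, y, x), probs in patch_outputs:
        _, pz, py, px = probs.shape
        votes[:, z:z + pz, y:y + py, x:x + px] += probs
    return votes.argmax(axis=0)


# Example: a 256^3 volume with hypothetical 96^3 patches and step 16.
starts = patch_starts(256, 96, step=16)
print(len(starts) ** 3, "patches per volume")  # 11 starts per axis
```

Summing rather than averaging does not change the argmax, because every class at a given voxel receives the same number of contributions; a voxel covered by no patch would default to label 0. A hard majority vote over per-patch argmax labels is an equally plausible reading of "vote".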
We trained this model on a 24 GB NVIDIA Quadro 6000 graphics processing unit using the AdamW optimizer with a learning rate of 1e-6 and a weight decay of 1e-4. We used the Dice loss as the loss function and applied three forms of augmentation: affine, noise, and blur.
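The Dice loss named above is commonly implemented as a "soft" Dice that compares predicted probabilities with one-hot labels. The NumPy sketch below rests on assumptions the abstract does not state: per-class Dice averaged over all classes (background included) and a small smoothing constant `eps`. The function name is hypothetical.

```python
import numpy as np


def soft_dice_loss(probs: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss averaged over classes.

    probs:  (C, D, H, W) softmax probabilities for one patch.
    target: (D, H, W) integer labels in [0, C).
    """
    num_classes = probs.shape[0]
    # One-hot encode the labels and move the class axis to the front: (C, D, H, W).
    onehot = np.moveaxis(np.eye(num_classes)[target], -1, 0)
    axes = tuple(range(1, probs.ndim))
    intersection = (probs * onehot).sum(axis=axes)
    denominator = probs.sum(axis=axes) + onehot.sum(axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice_per_class.mean())
```

In a PyTorch training loop, the stated optimizer settings would correspond to `torch.optim.AdamW(model.parameters(), lr=1e-6, weight_decay=1e-4)`, with the same loss written in tensor operations so that it can be backpropagated.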
For our data, we collected 1,788 T1w MRI scans at 1 mm resolution from 10 publicly available datasets; their age and gender distributions are shown in Figure 2. After preprocessing with FreeSurfer, we divided the subjects into training, validation, and test sets with a 3:1:1 split. We also obtained 20 manually segmented scans from the MindBoggle101 dataset⁹, 5 of which were added to the training set; the remaining 15 were used for testing.
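A subject-level 3:1:1 split like the one described above can be sketched as follows. Splitting on subject IDs rather than on individual scans keeps all scans of one subject in the same set. The seed, the rounding rule, and the function name are illustrative assumptions, not the authors' procedure.

```python
import random


def split_subjects(subject_ids, ratios=(3, 1, 1), seed=0):
    """Shuffle unique subject IDs and split them into train/val/test by `ratios`."""
    ids = sorted(set(subject_ids))
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    total = sum(ratios)
    n_train = round(len(ids) * ratios[0] / total)
    n_val = round(len(ids) * ratios[1] / total)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder absorbs any rounding
    return train, val, test
```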
For evaluation, we first tested TABSurfer against FastSurferVINN using 364 FreeSurfer segmentations as ground truths. We then validated TABSurfer against both FastSurferVINN and FreeSurfer on the 15 manual segmentations. We used the Dice Similarity Coefficient (DSC) to evaluate the overall overlap of each segmentation with the ground truth, and the Average Symmetric Surface Distance (ASSD) to evaluate the quality of the contours.
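For reference, both metrics can be computed per structure on binary masks as shown below. This NumPy sketch assumes a 6-connected definition of surface voxels and Euclidean distances scaled by the voxel spacing (1 mm by default). Published toolkits differ in such details, so its values need not match the numbers reported in this abstract exactly. The function names are hypothetical.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def surface_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of mask voxels with at least one 6-neighbor outside the mask."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Padding guarantees the roll never wraps real voxels into the crop.
            interior &= np.roll(m, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior)


def assd(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average Symmetric Surface Distance (in units of `spacing`)."""
    sa = surface_points(a) * np.asarray(spacing)
    sb = surface_points(b) * np.asarray(spacing)
    if len(sa) == 0 or len(sb) == 0:
        raise ValueError("ASSD is undefined for an empty mask")
    # Brute-force pairwise Euclidean distances between the two surfaces.
    d = np.sqrt(((sa[:, None, :] - sb[None, :, :]) ** 2).sum(axis=-1))
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(sa) + len(sb)))


# Two 4x4x4 cubes offset by one voxel along the first axis.
a = np.zeros((10, 10, 10), dtype=bool)
a[2:6, 2:6, 2:6] = True
b = np.roll(a, 1, axis=0)
print(dice(a, b))  # 0.75 (48 shared voxels out of 64 + 64)
```

The brute-force distance matrix is adequate for small structures but grows quadratically with surface size; a distance transform such as `scipy.ndimage.distance_transform_edt` is the usual choice for full-resolution volumes. Per-structure scores would then be averaged over the labels of interest.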

Results

Against the FreeSurfer ground truths, TABSurfer demonstrated consistently high metrics across every dataset, as shown in Figure 3, whereas the benchmark, FastSurferVINN, struggled to perform reliably. Qualitative evaluation in Figure 4 also reveals TABSurfer's improved segmentation quality compared to both FreeSurfer and FastSurferVINN. Figure 5 shows the results of evaluating TABSurfer, FastSurferVINN, and FreeSurfer against the manual reference. TABSurfer again outperformed FastSurferVINN, and FreeSurfer performed the worst. Although the manual segmentations proved difficult to replicate, TABSurfer outperformed both alternatives.

Discussion

This study investigates TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning model, for subcortical segmentation. TABSurfer demonstrated reliable performance across multiple datasets with reasonable processing times compared to FreeSurfer, and outperformed both competitors on the gold-standard manual segmentations. These tests showcase the advantages of both the CNN-Transformer hybrid architecture and our 3D patch-based approach. While FreeSurfer and FastSurferVINN trade off speed against consistency of performance, TABSurfer balances both qualities, which are crucial for a reliable segmentation tool. Future studies should further investigate the generalizability and reliability of our model by testing on unseen datasets, conducting test-retest experiments, and validating on scans of varying resolution and quality. Additionally, future work should make the model more computationally efficient and explore the capabilities of this architecture for segmenting additional classes in the cortical region.

Conclusion

We propose TABSurfer, a novel deep learning model achieving state-of-the-art subcortical segmentation performance and reliability through a hybrid CNN-Transformer architecture and a 3D patch-based approach. This study exhibits TABSurfer's potential as a capable tool for fully automated subcortical segmentation with high fidelity, and as a superior alternative to existing traditional and deep learning methods.

Acknowledgements

No funding was received for conducting this study. The authors have no relevant financial or non-financial interests to disclose.

References

1. Gutman BA, van Erp TGM, Alpert K, Ching CRK, Isaev D, Ragothaman A, et al. A meta-analysis of deep brain structural shape and asymmetry abnormalities in 2,833 individuals with schizophrenia compared with 3,929 healthy volunteers via the ENIGMA Consortium. Hum Brain Mapp. 2022;43:352–72.
2. Ho TC, Gutman B, Pozzi E, Grabe HJ, Hosten N, Wittfeld K, et al. Subcortical shape alterations in major depressive disorder: findings from the ENIGMA major depressive disorder working group. Hum Brain Mapp. 2022;43:341–51.
3. van der Velpen IF, Vlasov V, Evans TE, Ikram MK, Gutman BA, Roshchupkin GV, Adams HH, Vernooij MW, Ikram MA. Subcortical brain structures and the risk of dementia in the Rotterdam Study. Alzheimer's & Dementia. 2023 Feb;19(2):646–57.
4. Fischl B. FreeSurfer. NeuroImage. 2012 Aug 15;62(2):774–81.
5. Henschel L, Kügler D, Reuter M. FastSurferVINN: Building resolution-independence into deep learning segmentation methods - A solution for HighRes brain MRI. NeuroImage. 2022 May 1;251:118933.
6. Hatamizadeh A, Yang D, Roth HR, Xu D. UNETR: Transformers for 3D Medical Image Segmentation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 2021;1748–1758.
7. Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille AL, Zhou Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. 2021 Feb 8.
8. Rao VM, Wan Z, Arabshahi S, Ma DJ, Lee PY, Tian Y, Zhang X, Laine AF, Guo J. Improving across-dataset brain tissue segmentation for MRI imaging using transformer. Frontiers in Neuroimaging. 2022 Nov 21;1:1023481.
9. Klein A, Tourville J. 101 labeled brain images and a consistent human cortical labeling protocol. Frontiers in Neuroscience. 2012 Dec 5;6:171.
10. Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, Lautenschlager NT, Lenzo N, Martins RN, Maruff P, Masters C. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer's disease. International Psychogeriatrics. 2009 Aug;21(4):672–87.
11. Orhaug T, Forssell G. Information extraction from images. In: Outer Space: A New Dimension of the Arms Race. Routledge; 2021 Jan 26. p. 215–227.
12. Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience. 2007 Sep 1;19(9):1498–507.
13. Marcus DS, Fotenos AF, Csernansky JG, Morris JC, Buckner RL. Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults. Journal of Cognitive Neuroscience. 2010 Dec 1;22(12):2677–84.
14. Wei D, Zhuang K, Chen Q, Yang W, Liu W, Wang K, Sun J, Qiu J. Structural and functional MRI from a cross-sectional Southwest University Adult lifespan Dataset (SALD). bioRxiv. 2017 Aug 17:177279.
15. Liu W, Wei D, Chen Q, Yang W, Meng J, Wu G, Bi T, Zhang Q, Zuo XN, Qiu J. Longitudinal test-retest neuroimaging data from healthy young adults in southwest China. Scientific Data. 2017 Feb 14;4(1):1–9.
16. Marek K, Jennings D, Lasch S, Siderowf A, Tanner C, Simuni T, Coffey C, Kieburtz K, Flagg E, Chowdhury S, Poewe W. The Parkinson progression marker initiative (PPMI). Progress in Neurobiology. 2011 Dec 1;95(4):629–35.
17. Wang L, Alpert KI, Calhoun VD, Cobia DJ, Keator DB, King MD, Kogan A, Landis D, Tallis M, Turner MD, Potkin SG. SchizConnect: Mediating neuroimaging databases on schizophrenia and related disorders for large-scale integration. NeuroImage. 2016 Jan 1;124:1155–67.
18. Zuo XN, Anderson JS, Bellec P, Birn RM, Biswal BB, Blautzik J, Breitner J, Buckner RL, Calhoun VD, Castellanos FX, Chen A. An open science resource for establishing reliability and reproducibility in functional connectomics. Scientific Data. 2014 Dec 9;1(1):1–3.

Figures

Figure 1. (a) In our pipeline, 3D patches extracted from the input volume are fed into the model sequentially with a step size of 16. The outputs are then reconstructed to generate the full segmentation.

(b) The model architecture consists of 4 CNN encoder and decoder layers with skip connections, and a Vision Transformer encoder in between. Each convolutional block consists of a 3D convolution, Group Normalization, and a Rectified Linear Unit (ReLU), in that order. The Transformer module has 16 heads and 8 layers.
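The convolutional block and Transformer bottleneck described in Figure 1b can be sketched in PyTorch as follows. Only the layer order (3D convolution, Group Normalization, ReLU) and the 16 heads and 8 layers come from the caption. The kernel size, channel widths, number of normalization groups, learnable positional embedding, and the choice of treating each bottleneck voxel as one token are assumptions, and the class names are hypothetical.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """3D convolution -> Group Normalization -> ReLU (kernel size and groups assumed)."""

    def __init__(self, in_ch: int, out_ch: int, groups: int = 8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(num_groups=groups, num_channels=out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class TransformerBottleneck(nn.Module):
    """Transformer encoder over the bottleneck feature map (16 heads, 8 layers)."""

    def __init__(self, channels: int = 128, grid=(4, 4, 4), heads: int = 16, layers: int = 8):
        super().__init__()
        num_tokens = grid[0] * grid[1] * grid[2]
        # Hypothetical learnable positional embedding, one vector per bottleneck voxel.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, channels))
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, D*H*W, C): one token per voxel
        tokens = self.encoder(tokens + self.pos_embed)
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)  # back to a 3D feature map
```

With 16 heads, the embedding width must be divisible by 16; the assumed 128 channels give 8 dimensions per head. Because the bottleneck output keeps the input shape, it can sit between the last encoder block and the first decoder block of a UNet-style network without changing the skip connections.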

Figure 2. Age and gender distributions for selected subjects from the Australian Imaging Biomarkers and Lifestyle Study of Ageing¹⁰, Frontotemporal Lobar Degeneration Neuroimaging Initiative, Information eXtraction from Images¹¹, Open Access Series of Imaging Studies-1¹², Open Access Series of Imaging Studies-2¹³, Southwest University Adult life-span Dataset¹⁴, Southwest University Longitudinal Imaging Multimodal Brain Data Repository¹⁵, Parkinson’s Progression Markers Initiative¹⁶, SchizConnect¹⁷, and Consortium for Reliability and Reproducibility¹⁸.

Figure 3. Box plots of metrics from evaluating TABSurfer and FastSurferVINN against the FreeSurfer-generated ground truth. TABSurfer demonstrated consistently high metrics with low variability across each dataset, achieving a mean Dice Similarity Coefficient (DSC) of 0.872 and a mean Average Symmetric Surface Distance (ASSD) of 0.374. In contrast, the deep learning benchmark, FastSurferVINN, struggled to perform reliably, with a mean DSC of 0.854 and a mean ASSD of 0.436 (for ASSD, lower is better).

Figure 4. Sample slices and volumes are shown. TABSurfer captures each structure more fully compared to FastSurferVINN while also obtaining smoother contours of each region compared to FreeSurfer. Yellow arrows on the sample slices indicate mistakes in the FastSurferVINN segmentation compared to TABSurfer and FreeSurfer.

Figure 5. Box plots of metrics from evaluating TABSurfer, FastSurferVINN, and FreeSurfer against a manual reference. On the 15 randomly selected scans, FreeSurfer performed the worst and FastSurferVINN slightly better, with average Dice Similarity Coefficients of 0.740 and 0.758, respectively. TABSurfer outperformed both by a considerable margin, with an average Dice Similarity Coefficient of 0.792.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)


DOI: https://doi.org/10.58530/2024/1200