3000

Evaluate the effects of software on repeatability and reproducibility in brain volume measurements

Ruifeng Dong¹, Amritha Nayak¹,²,³, and Carlo Pierpaoli¹
¹Laboratory on Quantitative Medical Imaging, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD, United States, ²Henry Jackson Foundation for Advancement of Military Medicine, Bethesda, MD, United States, ³The Military Traumatic Brain Injury Initiative (MTBI2), Uniformed Services University of the Health Sciences, Bethesda, MD, United States

Synopsis

Keywords: Segmentation

Motivation: MRI-based brain volumetry is a valuable tool for assessing human brain development and brain disorders. However, ensuring repeatability and reproducibility is essential for its wider adoption.

Goal(s): Our goal was to compare the repeatability and reproducibility of volume measurements obtained with different popular software tools.

Approach: We acquired T2w scans and repeated T1w scans of 82 subjects on two 3T scanners and computed the within- and between-scanner variability of the measured volumes.

Results: Improved masking helps reduce variability in Freesurfer’s volume measurements. Synthseg and vol2Brain give better volume repeatability/reproducibility compared to Freesurfer.

Impact: Our results quantify how the choice of software affects repeatability and reproducibility, enabling clinicians and researchers to make an informed choice for data processing.

Introduction

Volumetry is helpful for assessing normal human brain development and brain disorders. However, volume measurements from MRI vary across scanners, sequences, subjects, and measurement tools, and little information is available in the literature on the relative importance of these sources of variability. Only a few studies¹⁻⁵ have examined the contribution of software to the reproducibility of volume measurements. We therefore estimated and compared the effects of three automated software tools, Freesurfer 6.0.0 (FS6) and 7.4.1 (FS741)⁶, Synthseg 2.0⁷, and vol2Brain⁸, on within- and between-scanner variability in the volume measurements of 32 brain regions.

Methods

Data Acquisition
We analyzed MRI data collected on 82 subjects aged 23 to 64 years (32 females, 50 males), with the goal of selecting the most reproducible strategy to derive structural volumetric measurements. Each subject was scanned on a Siemens Prisma 3T scanner (one T1w and one T2w scan) and on a Biograph 3T scanner (one T2w scan and two repeated T1w scans acquired in the same session). This yielded 246 T1w (82 × 3) and 164 T2w (82 × 2) images for this study.
On the Biograph scanner, T1w scans used an MPRAGE sequence with sagittal slices, 1×1×1 mm³ resolution, TE/TR = 3.03/2530 ms, FA = 7°, and TI = 1100 ms. T2w scans used a 3D SPACE sequence with 0.49×0.49×1 mm³ resolution, TE/TR = 280/3200 ms, and FA = 120°.
On the Prisma scanner, T1w scans used an MPRAGE sequence with sagittal slices, 1×1×1 mm³ resolution, TE/TR = 3.3/2530 ms, and FA = 7°. T2w scans used a fat-suppressed TSE sequence with axial slices, 1.25×1.25×1.7 mm³ resolution, TE/TR = 72/8810 ms, and FA = 120°.
Brain volumetry
By default, Freesurfer, Synthseg, and vol2Brain all use T1w images as input. Previous work has shown that intracranial volume (ICV) estimation improves when the software MONSTR is used with a proper atlas reference and combines information from T1w and T2w images⁹. Because Freesurfer accepts an external ICV mask, we evaluated it both with and without an ICV mask computed independently with MONSTR. For all software, each ROI volume was normalized by ICV.
Repeatability/reproducibility measures
Both within- and between-scanner variability were measured with a signed coefficient of variation (CV), defined as the difference between two measurements divided by their average. For within-scanner variability, the two measurements come from the two repeated Biograph scans; for between-scanner variability, they are the Prisma measurement and the average of the two Biograph measurements.
We also computed a bias-corrected unsigned CV: unsigned CV = |signed CV − population average of the signed CV|. For each ROI and each software, the median unsigned CV across subjects represents its within- or between-scanner variability.
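The CV measures above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes per-subject arrays of ICV-normalized volumes for one ROI and one software, takes the sign convention as first minus second (the abstract does not state it), and all function names are hypothetical.

```python
import numpy as np


def normalize_by_icv(roi_volume, icv):
    """ICV-normalized ROI volume, per scan (both inputs in the same units)."""
    return np.asarray(roi_volume, dtype=float) / np.asarray(icv, dtype=float)


def signed_cv(a, b):
    """Signed CV of paired measurements: (a - b) divided by their average.

    Assumption: the sign convention (a minus b) is not given in the abstract.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / ((a + b) / 2.0)


def bias_corrected_unsigned_cv(cv):
    """|signed CV - population average of the signed CV|, one value per subject."""
    cv = np.asarray(cv, dtype=float)
    return np.abs(cv - cv.mean())


def median_variability(biograph_1, biograph_2, prisma):
    """Median bias-corrected unsigned CV (within, between) for one ROI and one software."""
    biograph_1 = np.asarray(biograph_1, dtype=float)
    biograph_2 = np.asarray(biograph_2, dtype=float)
    prisma = np.asarray(prisma, dtype=float)
    # Within-scanner: the two repeated T1w scans on the Biograph scanner.
    within = signed_cv(biograph_1, biograph_2)
    # Between-scanner: the Prisma scan vs. the average of the two Biograph repeats.
    between = signed_cv(prisma, (biograph_1 + biograph_2) / 2.0)
    return (float(np.median(bias_corrected_unsigned_cv(within))),
            float(np.median(bias_corrected_unsigned_cv(between))))
```

Calling `median_variability` once per ROI and per software gives one within- and one between-scanner value per cell. Subtracting the population average before taking the absolute value removes a systematic offset (for example, a consistent shift between scanners), so the metric reflects subject-level scatter rather than bias.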

Results and Discussion

Intracranial volume
Figure 1: Compared with MONSTR, default Freesurfer, Synthseg, and vol2Brain systematically overestimate ICV by 0.15, 0.10, and 0.02 liters, respectively.
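One way to express such a systematic offset is the mean paired difference from MONSTR over all scans. The abstract does not state how the offsets were computed, so this is an assumed definition; the conversion constant reflects that volumes are commonly reported in mm³ (1 L = 10⁶ mm³), and the function name is hypothetical.

```python
import numpy as np

MM3_PER_LITER = 1e6  # 1 liter = 1,000,000 mm^3


def mean_icv_offset_liters(icv_tool_mm3, icv_monstr_mm3):
    """Mean paired ICV difference (tool minus MONSTR) in liters; positive means overestimation.

    Inputs are per-scan ICVs in mm^3 for the same scans, in the same order.
    """
    tool = np.asarray(icv_tool_mm3, dtype=float)
    monstr = np.asarray(icv_monstr_mm3, dtype=float)
    if tool.shape != monstr.shape:
        raise ValueError("inputs must be paired scan by scan")
    return float(np.mean(tool - monstr) / MM3_PER_LITER)
```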
Figure 2: Across subjects, default Freesurfer yields the largest ICV variability, while Synthseg produces the best within-scanner repeatability.
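The ranges in Figure 2 are bounded by the 5th and 95th percentiles of the subject-specific signed CVs, per the figure caption. A minimal NumPy sketch, assuming one array of signed ICV CVs per software; NumPy's default linear interpolation between order statistics is an assumption, since the abstract does not specify the percentile method.

```python
import numpy as np


def cv_percentile_range(signed_cvs, lower=5.0, upper=95.0):
    """Return the (lower, upper) percentiles of subject-specific signed CVs."""
    cv = np.asarray(signed_cvs, dtype=float)
    low, high = np.percentile(cv, [lower, upper])
    return float(low), float(high)
```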
Brain regional volumes
Figure 3: For all software, within- and between-scanner variability follow similar patterns across ROIs. The Pearson correlation coefficients between the two are 0.92, 0.95, 0.89, 0.83, and 0.96 for FS741 with MONSTR, FS6 with MONSTR, FS741 default, Synthseg, and vol2Brain, respectively.
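A sketch of that correlation, assuming it is computed across the 32 ROIs between each software's median within-scanner and median between-scanner CVs (the abstract reports the coefficients but not the exact inputs). `np.corrcoef` returns the Pearson correlation matrix, and the function name is hypothetical.

```python
import numpy as np


def within_between_pearson(within_by_roi, between_by_roi):
    """Pearson r across ROIs between within- and between-scanner variability for one software."""
    within = np.asarray(within_by_roi, dtype=float)
    between = np.asarray(between_by_roi, dtype=float)
    if within.shape != between.shape or within.ndim != 1:
        raise ValueError("expected two 1-D arrays with one value per ROI")
    return float(np.corrcoef(within, between)[0, 1])
```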
MONSTR’s mask reduces Freesurfer’s variability in the volume measurements of most ROIs. Freesurfer’s default masking may cause imperfect brain registration, leading to unstable segmentation; MONSTR’s mask alleviates this.
With MONSTR’s mask, FS741 does not consistently improve repeatability/reproducibility over FS6 across ROIs: FS741 shows better results for caudate volume, whereas FS6 shows less variability for amygdala and cerebellar white matter.
Freesurfer with MONSTR’s mask shows better repeatability/reproducibility than vol2Brain for cortex and cerebral white matter. This may be due to the surface-based refinement of the cortical segmentation performed by Freesurfer.
Synthseg gives the best repeatability/reproducibility for many ROIs, including cortical and subcortical gray matter, cerebral and cerebellar white matter, accumbens, amygdala, and ventricles. Although Synthseg overestimates ICV, this has little effect on the variability of its regional volume measurements. Moreover, Synthseg was trained on synthetic images subjected to random rotation, shifting, and shearing, which helps reduce the volume variability caused by such spatial variations in human scans.
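To illustrate the kind of spatial augmentation described above, the sketch below draws a random 3D affine that combines rotation, shearing, and shifting. It is not Synthseg’s implementation: the function name and parameter ranges are hypothetical. Because the shear has a unit diagonal, the transform has determinant 1, so it moves and deforms structures without changing their volume; this is consistent with the idea that training on such transforms encourages stable volume estimates.

```python
import numpy as np


def random_rigid_shear_affine(rng, max_rotation_deg=15.0, max_shear=0.05, max_shift_mm=10.0):
    """Hypothetical 4x4 affine: random rotation, shear, and shift (illustrative ranges only)."""
    ax, ay, az = np.deg2rad(rng.uniform(-max_rotation_deg, max_rotation_deg, size=3))
    rot_x = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    rot_y = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    rot_z = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    # Upper-triangular shear with a unit diagonal: determinant 1, volume preserving.
    shear = np.eye(3)
    shear[np.triu_indices(3, k=1)] = rng.uniform(-max_shear, max_shear, size=3)
    affine = np.eye(4)
    affine[:3, :3] = rot_z @ rot_y @ rot_x @ shear
    affine[:3, 3] = rng.uniform(-max_shift_mm, max_shift_mm, size=3)  # translation in mm
    return affine
```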
vol2Brain performs best for ventral DC, thalamus, pallidum, and putamen. Synthseg and vol2Brain have comparable repeatability/reproducibility for hippocampus, caudate, and brain stem.

Conclusions

Using MONSTR’s mask reduces Freesurfer's variability in brain regional volume measurements.
Compared with FS6, FS741 does not consistently improve the repeatability/reproducibility of volume measurements across brain regions.
Freesurfer with MONSTR produces more repeatable/reproducible measurements of cortex and cerebral white matter volumes than vol2Brain.
vol2Brain gives the most repeatable/reproducible volume measurements of ventral DC, thalamus, pallidum, and putamen, while Synthseg excels in many other brain regions, although it overestimates ICV compared with MONSTR.

References

[1] Huppertz HJ, et al. Intra- and interscanner variability of automated voxel-based volumetry based on a 3D probabilistic atlas of human cerebral structures. NeuroImage. 2010;49(3):2216-2224.

[2] Manjón JV, Coupé P. VolBrain: an online MRI brain volumetry system. Front Neuroinform. 2016;10.

[3] Wolz R, et al. Robustness of automated hippocampal volumetry across magnetic resonance field strengths and repeat images. Alzheimers Dement. 2014;10(4):430.

[4] Wittens MMJ, et al. Inter- and intra-scanner variability of automated brain volumetry on three magnetic resonance imaging systems in Alzheimer’s disease and controls. Front Aging Neurosci. 2021;13:746982.

[5] Cavedo E, et al. Fully automatic MRI-based hippocampus volumetry using FSL-FIRST: intra-scanner test-retest stability, inter-field strength variability, and performance as enrichment biomarker for clinical trials using prodromal target populations at risk for Alzheimer’s disease. J Alzheimers Dis. 2017;60(1):151-164.

[6] Fischl B, et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33(3):341-355.

[7] Billot B, Greve DN, Puonti O, Thielscher A, Van Leemput K, Fischl B, Dalca AV, Iglesias JE; ADNI. SynthSeg: segmentation of brain MRI scans of any contrast and resolution without retraining. Med Image Anal. 2023;86:102789.

[8] Manjón JV, Romero JE, Vivo-Hernando R, Rubio G, Aparici F, de la Iglesia-Vaya M, Coupé P. vol2Brain: a new online pipeline for whole brain MRI analysis. Front Neuroinform. 2022;16:862805.

[9] Roy S, et al. Robust skull stripping using multiple MR image contrasts insensitive to pathology. NeuroImage. 2017;146:132-147.

Figures

Figure 1. ICV measured by Freesurfer 7.4.1, Synthseg, and vol2Brain vs. ICV measured by MONSTR for each of the 82 subjects, using all scans (the two repeated scans on the Biograph scanner and the one scan on the Prisma scanner).

Figure 2. Range of the subject-specific within- and between-scanner signed coefficients of variation (CV) of intracranial volume (ICV) across all subjects, for each software. The lower and upper limits are the 5th and 95th percentiles. The within-scanner CV is calculated from the ICVs measured on the two repeated Biograph scans; the between-scanner CV is calculated from the ICV measured on the Prisma scan and the average of the ICVs measured on the two repeated Biograph scans.

Figure 3. Comparison of segmentation software in within-scanner repeatability and between-scanner reproducibility. The metric is the median bias-corrected unsigned coefficient of variation. The within-scanner CV is calculated from the volumes measured on the two repeated Biograph scans; the between-scanner CV is calculated from the volumes measured on the Prisma scan and the average of the volumes from the two repeated Biograph scans.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)

DOI: https://doi.org/10.58530/2024/3000