The impact of data analysis method, scanner type and scan session on volume measurements of brain structures
Michael Amann1,2, Pavel Falkovskiy3,4,5, Alain Thoeni1, Tobias Kober3,4,5, Alexis Roche3,4,5, Bénédicte Maréchal3,4,5, Philippe Cattin6, Tobias Heye2, Oliver Bieri2, Till Sprenger7, Christoph Stippich2, Gunnar Krueger4,5,8, Ernst-Wilhelm Radue1, and Jens Wuerfel1

1Medical Image Analysis Center (MIAC), Basel, Switzerland, 2Department of Radiology, University Hospital of Basel, Basel, Switzerland, 3Advanced Clinical Imaging Technology, Siemens Healthcare AG, Lausanne, Switzerland, 4Department of Radiology, University Hospital (CHUV), Lausanne, Switzerland, 5École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 6Department of Biomedical Engineering, University of Basel, Basel, Switzerland, 7Department of Neurology, DKD Helios Klinik, Wiesbaden, Germany, 8Siemens Medical Solutions USA, Boston, MA, United States

Synopsis

The performance of FreeSurfer and FSL was compared on T1-weighted 3D MRI data of 22 controls as a function of scan session, scanner type and segmentation pipeline. Intra-class correlation coefficients and percentage volume differences were calculated for the segmentation results of both pipelines. Strong agreement was found for whole brain, white matter (WM) and cortex. For each pipeline, the impact of experimental factors was assessed by linear mixed-effects analysis. We found a significant scanner effect on the results of both segmentation pipelines. For subcortical structures, segmentation reliability was higher in FSL than in FreeSurfer, whereas for cortex and WM, FreeSurfer was more stable.

Introduction

In many neurological disorders, atrophy of the brain and/or its substructures has been identified as an important marker of pathological processes. As these changes are often subtle, it is important to investigate the effects of hardware setup and data post-processing on the volume measurements. In this study, we compared the performance of the freely available software packages FreeSurfer (FS) [1] and FSL [2] on quantitative volumetric measurements, using T1-weighted 3D MRI data acquired on different scanners, as a function of scan session and segmentation pipeline.

Materials and Methods

All MRI scans were performed at a single site on four different clinical scanners (all Siemens Healthcare, Germany):

a.) 1.5T MAGNETOM Avanto (12-channel head coil),

b.) 1.5T MAGNETOM Espree (12-channel head coil),

c.) 3T MAGNETOM Prisma (64-channel head-neck coil),

d.) 3T MAGNETOM Skyra (20-channel head-neck coil).

On all scanners, the scanning protocols encompassed three-dimensional T1w MPRAGE sequences. At 3T, the spatial resolution was 1 mm³ isotropic (TR/TI/BW/α/TA = 2.3 s / 0.9 s / 240 Hz/px / 9° / 5:12 min); at 1.5T, the resolution was 1.25 × 1.25 × 1.20 mm³ (TR/TI/BW/α/TA = 2.4 s / 1.0 s / 180 Hz/px / 8° / 4:42 min). Sequence parameters were based on the ADNI protocol [3] and were adjusted for similar signal-to-noise ratio at both field strengths.

On scanners (a) and (c), four MPRAGE scans were performed:

R0: baseline scan

R1: back-to-back scan – best case scenario, used to calculate scan/rescan-reliability

R2: scan after repositioning and new shim

R3: scan performed two to four weeks after baseline

On scanners (b) and (d), only R0 and R3 were performed.

Twenty-two healthy subjects underwent this study protocol (13 women, median age 25.0y, range 20.6y-39.4y). Before each scan session, each subject’s hydration status and arterial pressure were checked. Brain segmentation was performed with two software pipelines: FreeSurfer v5.3.0 and FSL v5.0. In the latter, SIENAX [4] was used for whole brain, grey matter (GM) and white matter (WM) segmentation, and FIRST [5] was used for subcortical GM segmentation. Differences in the anatomical structure definitions of the two pipelines were taken into account in the statistical comparisons.

IBM SPSS Statistics v22 and R v3.2.2 were used for statistical assessment. To test for the comparability of the two segmentation methods, we calculated intra-class correlation coefficients (ICC; two-way mixed-model) for each scan scenario as well as percentage differences in volume (Equation 1):

$$\Delta V = 100\,\% \cdot \frac{V(\mathrm{FSL}) - V(\mathrm{FS})}{0.5 \cdot \left(V(\mathrm{FSL}) + V(\mathrm{FS})\right)} \qquad (1)$$
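
As a minimal sketch of the two agreement measures described above, the following Python code computes the percentage volume difference of Equation 1 and a single-measure, two-way mixed-model ICC. The consistency form (Shrout-Fleiss ICC(3,1)) is an assumption made here for illustration; the text does not state which ICC variant was computed in SPSS. The function names and the example volumes are hypothetical and are not taken from the study.

```python
import numpy as np


def percent_volume_difference(v_fsl, v_fs):
    """Equation 1: FSL minus FS volume, relative to the mean of both, in percent."""
    v_fsl = np.asarray(v_fsl, dtype=float)
    v_fs = np.asarray(v_fs, dtype=float)
    return 100.0 * (v_fsl - v_fs) / (0.5 * (v_fsl + v_fs))


def icc_consistency(ratings):
    """Single-measure, two-way mixed-model ICC in its consistency form.

    ratings: array of shape (n_subjects, k_methods), e.g. one column per pipeline.
    ASSUMPTION: the consistency form is chosen for illustration only.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand_mean = y.mean()
    # Sums of squares of a two-way ANOVA without replication
    ss_subjects = k * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_methods = n * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_methods
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical volumes in ml for three subjects (columns: FSL, FS)
    volumes = np.array([[1150.0, 1120.0],
                        [1230.0, 1215.0],
                        [1080.0, 1060.0]])
    print(percent_volume_difference(volumes[:, 0], volumes[:, 1]))
    print(icc_consistency(volumes))
```

Note that in the consistency form a constant offset between the two methods does not lower the ICC, so it is worth reading this measure together with the percentage difference of Equation 1.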

For each software package, the impact of experimental factors on the segmentation results was assessed by linear mixed-effects analysis [6]. In the respective statistical model, scanner type and the subject’s age and gender were included as fixed effects, whereas “subject” was considered as a random effect with both intercept and slope varying with the scanning session. The significance of fixed effects was assessed by Bonferroni-corrected t-tests against the null hypothesis (with Satterthwaite-approximated degrees of freedom). The reliability of segmentation was calculated by test-retest ICCs according to

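$$\mathrm{ICC} = \frac{\mathrm{variance}(\mathrm{subject})}{\mathrm{variance}(\mathrm{subject}) + \mathrm{variance}(\mathrm{scan/rescan}) + \mathrm{variance}(\mathrm{R2}) + \mathrm{variance}(\mathrm{R3})} \qquad (2)$$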
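
Equation 2 can be illustrated with a short Python sketch. It assumes that the four variance components have already been estimated by the mixed-effects model; the function name and the numerical values are hypothetical.

```python
def retest_icc(var_subject, var_scan_rescan, var_r2, var_r3):
    """Equation 2: share of the total variance attributable to between-subject differences."""
    components = (var_subject, var_scan_rescan, var_r2, var_r3)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_subject / total


if __name__ == "__main__":
    # Hypothetical variance components (arbitrary units), not study results
    print(retest_icc(var_subject=9.0, var_scan_rescan=0.5, var_r2=0.25, var_r3=0.25))
```

The ICC approaches 1 when the between-subject variance dominates the session-related components, and it decreases as scan/rescan, R2 or R3 variance grows.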

Results

Table 1 summarizes the ICCs (adjusted for absolute volume) between FS and FSL for different brain structures. Strong agreement between segmentation results (ICC>0.9) was found on all scanner types for whole brain (including brainstem), WM and cortex. For subcortical GM, strong agreement was found on all scanners except the Prisma. For smaller subcortical structures, agreement was substantially lower. The volume differences between FS and FSL were small in larger brain structures; however, for most scanner-structure combinations they reached statistical significance (Table 2).

In general, scanner type had a relevant effect on the segmentation results of both pipelines (1-4% for the larger structures, up to 9% for smaller structures such as the caudate). Interestingly, the impact of experimental factors differed between the pipelines: in WM, cortex and subcortical structures, scanner effects were about a factor of two higher in FS than in FSL. In addition, for the subcortical structures, the segmentation reliability reflected by the test-retest ICC was higher in FSL than in FS, whereas for cortex and WM, FS was more stable (Table 3). Compared to scan-rescan variability, the effects of R2 and R3 (repositioning, re-shim, physiological variances) were minor in all software/scanner combinations (Figure 1).

Discussion

The segmentation results of FS and FSL are similar for larger brain compartments, including the whole brain, WM and cortex. However, systematic and significant differences can be observed. In subcortical structures, the performance of the two pipelines differs considerably. In general, FSL segmentation seems to be more stable against hardware effects; however, these effects are still significant. In conclusion, we demonstrated that experimental factors have a significant impact on the segmentation results, independently of the applied post-processing software. In multi-site and multi-scanner studies, these effects have to be considered and compared both with scan-rescan variability and with the expected size of the effects under investigation.

Acknowledgements

We thank the local MR technologists’ team for their support with the MR scans.

References

1. Fischl B et al, Neuron 33:341-355, 2002.

2. Smith SM et al, NeuroImage 23(S1):208-219, 2004.

3. Jack CR et al, JMRI 27:685-691, 2008.

4. Smith SM et al, NeuroImage 17:479-489, 2002.

5. Patenaude B et al, NeuroImage 56:907-922, 2011.

6. Winter B, arXiv:1308.5499, 2013.

Figures

Table 1. Adjusted intra-class correlation coefficients (ICC) between the segmentation results of FreeSurfer and FSL for different brain structures and MR scanners. Only data from scan R0 are shown here; the results for the other scans are comparable.

Table 2. Percentage differences between FreeSurfer (FS) and FSL segmentation (SD: standard deviation; GM: grey matter). Asterisks indicate significant differences (p<0.05, Wilcoxon signed-rank test). Only data from session R0 are shown here. Subcortical GM substructures were not considered, as the definition of these structures is different in FS and FSL.

Table 3. Linear mixed effects analysis: Segmentation reliability by means of test-retest ICC.

Figure 1. Standard deviation (SD) of the random effects for FreeSurfer (FS; blue) and for FSL (orange), normalized to the mean volume of the respective structure. R2 is a scan after repositioning and re-shim, R3 is a scan two to four weeks later than baseline.



Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)
1927