Evaluating the variability of multicenter and longitudinal hippocampal volume measurements.
Stephanie Bogaert1, Michiel de Ruiter2, Sabine Deprez3,4, Ronald Peeters3, Pim Pullens5,6, Frank De Belder6, José Belderbos7, Sanne Schagen2, Dirk De Ruysscher8,9, Stefan Sunaert3,4, and Eric Achten1

1Radiology, Ghent University Hospital, Ghent, Belgium, 2Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, Netherlands, 3Radiology, Leuven University Hospital, Belgium, 4Imaging and Pathology, KU Leuven, Belgium, 5Radiology, University of Antwerp, Belgium, 6Radiology, Antwerp University Hospital, Belgium, 7Radiation Oncology, Netherlands Cancer Institute, Amsterdam, Netherlands, 8Radiation Oncology, MAASTRO clinic Maastricht, Netherlands, 9Respiratory Oncology, Maastricht University Medical Center, Netherlands


Longitudinal multicenter MRI studies require stable and comparable measurements. We scanned two subjects on six different scanners (two vendors) at different time points and assessed hippocampal volumes manually and automatically.

Intrascanner CV was at most 2.62% for both techniques; FreeSurfer is a good alternative to manual delineation for longitudinal studies.

Intervendor variability was sometimes lower than intrascanner variability for the manual technique, which suggests only a modest effect of hardware differences across vendors. FreeSurfer results were systematically higher for vendor B than for vendor A; comparing cross-sectional FreeSurfer results between vendors is therefore not recommended in this multicenter study.


This quality assurance study is part of a phase III longitudinal multicenter project investigating whether Hippocampus Avoidance Prophylactic Cranial Irradiation (HA-PCI) prevents hippocampal atrophy, compared to conventional PCI. Such a project requires measurements that are stable across sites and over time. In the present study, we assessed the multicenter reproducibility and longitudinal consistency of 3D-T1 acquisitions by scanning human volunteers in the participating centers on different scanners and at different time points. Both a manual and an automatic segmentation method were used to assess hippocampal volumes.


Acquisition - Two healthy volunteers were scanned on six different 3.0T MRI systems from two vendors (Philips Achieva/Siemens Trio), using the clinically used head coil with the highest number of channels (Table 1). Subject 1 was scanned twice in centers 1, 2, and 3, with an interval of one month. To further test intrascanner consistency, subject 2 was scanned five times in a row in center 1, with repositioning and reangulation between scans. A 3D-T1 weighted image developed for multicenter intervendor acquisitions, with parameters optimized for contrast between white matter, gray matter, and cerebrospinal fluid, was acquired (ref. 1).

Processing - (1) Manual: A trained radiographer, supervised by an experienced neuroradiologist, traced the entire dataset manually on a high-resolution screen, using a sagittal ray-tracing protocol (ref. 2) and dedicated software that allowed interpolation (syngo.MR General v.VB10A, Siemens AG, Erlangen, Germany). The fimbria was excluded; internal white matter between the cornu ammonis regions was included. (2) Automatic: Images were processed with the FreeSurfer v5.3 (Martinos Center for Biomedical Imaging, Harvard-MIT, Boston, USA) longitudinal stream (ref. 3) and visually checked for accuracy.
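
As an illustration of the automatic processing step, the FreeSurfer longitudinal stream is normally run as three stages of recon-all: cross-sectional processing of every time point, creation of a within-subject base template, and longitudinal processing of each time point against that template. The sketch below only builds these command lines and does not run them. The subject IDs, the base ID, and the image paths are hypothetical, and the abstract does not state how scans were grouped into time points, so this should be read as one possible arrangement rather than the procedure actually used.

```python
def longitudinal_commands(timepoint_images, base_id):
    """Build the recon-all calls of the three-stage longitudinal stream.

    timepoint_images maps a time point ID to its input image path.
    All IDs and paths used here are hypothetical placeholders.
    """
    commands = []

    # Stage 1: independent cross-sectional run for every time point.
    for tp_id, image_path in timepoint_images.items():
        commands.append(["recon-all", "-s", tp_id, "-i", image_path, "-all"])

    # Stage 2: unbiased within-subject template (base) from all time points.
    base_cmd = ["recon-all", "-base", base_id]
    for tp_id in timepoint_images:
        base_cmd += ["-tp", tp_id]
    commands.append(base_cmd + ["-all"])

    # Stage 3: longitudinal run of every time point against the base.
    for tp_id in timepoint_images:
        commands.append(["recon-all", "-long", tp_id, base_id, "-all"])

    return commands


# Hypothetical example: one subject scanned twice on the same scanner.
scans = {
    "subj1_tp1": "/data/subj1_tp1/T1.nii.gz",
    "subj1_tp2": "/data/subj1_tp2/T1.nii.gz",
}
commands = longitudinal_commands(scans, "subj1_base")
for cmd in commands:
    print(" ".join(cmd))
```

The commands are returned as argument lists so that they could be passed to a job scheduler or to subprocess.run once the paths point at real data.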

Analysis - Coefficients of variation (CV = standard deviation / mean × 100%) were calculated (ref. 4). Calculations for vendor A were based on data from only two centers (same MR type a, both with a 32-channel head coil). To achieve comparable configurations within vendors and to prevent sampling bias, variabilities for vendor B were therefore calculated from two of the four participating centers (centers 5 and 6: same MR type b, both with an 8-channel head coil) (Table 1).
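
The CV definition above can be sketched in a few lines of Python. The abstract does not say whether the sample or the population standard deviation was used; the sample standard deviation (n - 1 denominator) is assumed here. The two volumes are made-up numbers for illustration and are not study data.

```python
from statistics import mean, stdev


def coefficient_of_variation(volumes):
    """Return the CV in percent: standard deviation / mean * 100.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(volumes) < 2:
        raise ValueError("at least two measurements are needed")
    return stdev(volumes) / mean(volumes) * 100.0


# Hypothetical test/retest volumes in mm^3 (illustrative only, not study data).
example_volumes = [3000.0, 3060.0]
cv_percent = coefficient_of_variation(example_volumes)
print(f"CV = {cv_percent:.2f}%")
```

With only two measurements, as in a test/retest pair, the sample standard deviation reduces to the absolute difference divided by the square root of two, so the CV is driven entirely by that single difference.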



(1) Test/retest - subject 1 - Both the manual and the automatic technique resulted in acceptable CVs, ranging from 0.18 to 2.62%. FreeSurfer results were more consistent, with 5 out of 6 CVs <2.0% compared to 3 out of 6 for manual delineation (Table 2).

(2) Five repeated scans - subject 2 - The intrascanner consistency was confirmed by five repeated scans of subject 2 in center 1, which resulted in a FreeSurfer CV of 0.72% for the right and 1.16% for the left hippocampus (Table 2).


(1) Intravendor variability - Manual delineation resulted in both the lowest and the highest intravendor variability (CV=0.00-5.03%), whereas FreeSurfer CVs fell in a narrower range (CV=0.03-2.10%) (Table 2). FreeSurfer volumes were systematically higher than manual volumes (Fig. 1); this overestimation is well described (ref. 5).

(2) Intervendor variability - For the manual technique, the intervendor variability (CV=1.35-3.69%) lay between the intravendor variabilities of vendors A and B. FreeSurfer volumes were systematically higher for vendor B than for vendor A (Fig. 1), which caused the intervendor variability (CV=3.80-5.83%) to be systematically higher than the intravendor variability of both vendors (Table 2).
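
The effect of a systematic vendor offset on the intervendor CV can be illustrated with a small sketch. It assumes that the intravendor CV is computed over the scans of one vendor and the intervendor CV over the pooled scans of both vendors; the abstract does not spell out the pooling, so this is an assumption. The site labels and volumes are hypothetical and are not study data.

```python
from statistics import mean, stdev


def cv_percent(values):
    """CV in percent, assuming the sample standard deviation."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical volumes (mm^3) of one hippocampus of one subject,
# keyed by (vendor, site). Vendor B is offset upward on purpose.
volumes = {
    ("A", "site_a1"): 3000.0,
    ("A", "site_a2"): 3030.0,
    ("B", "site_b1"): 3200.0,
    ("B", "site_b2"): 3240.0,
}


def intravendor_cv(volumes, vendor):
    """CV over the scans acquired on one vendor's systems."""
    return cv_percent([v for (vend, _), v in volumes.items() if vend == vendor])


def intervendor_cv(volumes):
    """CV over the pooled scans of all vendors (assumed definition)."""
    return cv_percent(list(volumes.values()))


cv_a = intravendor_cv(volumes, "A")
cv_b = intravendor_cv(volumes, "B")
cv_ab = intervendor_cv(volumes)
print(f"intravendor A: {cv_a:.2f}%  B: {cv_b:.2f}%  intervendor: {cv_ab:.2f}%")
```

In this constructed example each vendor is internally consistent, yet the pooled CV exceeds both intravendor CVs because the offset between vendors dominates the pooled standard deviation. This is the pattern reported above for FreeSurfer.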



Intrascanner consistency is satisfactory for this phase III study, in which hippocampal volumes before and after therapy will be compared. With CVs of 0.72% and 1.16% for five repeated scans of the right and left hippocampus, one would not expect FreeSurfer measurement variability to mask the expected small hippocampal volume differences before and after PCI. We consider the automatic technique to be a good alternative to the time-consuming manual delineation for longitudinal evaluations within centers.


In the case of manual delineation, the intervendor variability is in some cases lower than the intravendor and even the intrascanner variability. This suggests only a modest effect of hardware differences across sites and vendors relative to intrascanner variability across time points. FreeSurfer results are systematically higher for vendor B than for vendor A due to intensity differences; comparing cross-sectional FreeSurfer results between centers with different vendors is therefore not recommended in this multicenter study.

In the future, the QA procedure could be strengthened by adding a phantom that contains an irregularly shaped object of known volume.




(1) Jack CR, Bernstein MA, Fox NC, et al. The Alzheimer's Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging. 2008;27(4):685-91.

(2) Achten E, Deblaere K, De Wagter C, et al. Intra- and interobserver variability of MRI-based volume measurements of the hippocampus and amygdala using the manual ray-tracing method. Neuroradiology. 1998;40(9):558-66.

(3) Reuter M, Schmansky NJ, Rosas HD, Fischl B. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage. 2012;61(4):1402-18.

(4) Jones R, Payne B. Clinical investigation and statistics in laboratory medicine. London: ACB Venture Publications; 1997.

(5) Wenger E, Mårtensson J, Noack H, et al. Comparing manual and automatic segmentation of hippocampal volumes: reliability and validity issues in younger and older brains. Hum Brain Mapp. 2014;35(8):4236-48.


Table 1 - Description of hardware, software and sequences of participating centers.

Table 2 - Coefficients of variation for intrascanner, intravendor and intervendor variability of HC volumes of subject 1 and 2.

Figure 1 - Bar graphs of HC volumes acquired by manual and automatic segmentation for subject 1 and 2.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)