Evaluating the variability of multicenter and longitudinal hippocampal volume measurements.

Stephanie Bogaert¹, Michiel de Ruiter², Sabine Deprez³,⁴, Ronald Peeters³, Pim Pullens⁵,⁶, Frank De Belder⁶, José Belderbos⁷, Sanne Schagen², Dirk De Ruysscher⁸,⁹, Stefan Sunaert³,⁴, and Eric Achten¹

¹Radiology, Ghent University Hospital, Ghent, Belgium, ²Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, Netherlands, ³Radiology, Leuven University Hospital, Belgium, ⁴Imaging and Pathology, KU Leuven, Belgium, ⁵Radiology, University of Antwerp, Belgium, ⁶Radiology, Antwerp University Hospital, Belgium, ⁷Radiation Oncology, Netherlands Cancer Institute, Amsterdam, Netherlands, ⁸Radiation Oncology, MAASTRO clinic Maastricht, Netherlands, ⁹Respiratory Oncology, Maastricht University Medical Center, Netherlands

Synopsis

Longitudinal multicenter MRI studies require stable and comparable measurements. We scanned two subjects in six different scanners (two vendors) at different time points and assessed hippocampal volumes manually and automatically.

Intrascanner CVs were at most 2.62% for both techniques; FreeSurfer is a good alternative to manual delineation for longitudinal studies.

For the manual technique, intervendor variability was sometimes lower than intrascanner variability, which suggests only a modest effect of hardware differences across vendors. FreeSurfer volumes were systematically higher for vendor B than for vendor A; comparing cross-sectional FreeSurfer results between vendors is therefore not recommended in this multicenter study.

INTRODUCTION / PURPOSE

This quality assurance study is part of a phase III longitudinal multicenter project investigating whether Hippocampus Avoidance Prophylactic Cranial Irradiation (HA-PCI) prevents hippocampal atrophy, compared to conventional PCI. Stable measurements across sites and over time are required. In the present study, we assessed the multicenter reproducibility and longitudinal consistency of 3D-T1 acquisitions by scanning human volunteers in the participating centers in different scanners and at different time points. Both a manual and an automatic segmentation method were used to assess hippocampal volumes.

METHODS

Acquisition - Two healthy volunteers were scanned on six different 3.0T MRI systems from two vendors (Philips Achieva/Siemens Trio), using the clinically used head coil with the highest number of channels (Table 1). Subject 1 was scanned twice in centers 1, 2, and 3, with an interval of one month. To further test intrascanner consistency, subject 2 was scanned five times in a row in center 1, with repositioning and reangulation between scans. The sequence was a 3D T1-weighted acquisition developed for multicenter intervendor studies, with parameters optimized for contrast between white matter, gray matter, and cerebrospinal fluid.¹

Processing - (1) Manual: A trained radiographer, supervised by an experienced neuroradiologist, traced the entire dataset manually on a high-resolution screen, using dedicated software that allowed interpolation (syngo.MR General v.VB10A, Siemens AG, Erlangen, Germany) and a sagittal ray-tracing protocol.² The fimbria was excluded; internal white matter between the cornu ammonis regions was included. (2) Automatic: Images were processed with the FreeSurfer v5.3 (Martinos Center for Biomedical Imaging, Harvard-MIT, Boston, USA) longitudinal stream and were visually checked for accuracy.³

Analysis - Coefficients of variation (CV = standard deviation / mean × 100%) were calculated.⁴ Calculations for vendor A were based on data from only two centers (same MR type a, both with a 32-channel head coil). To obtain comparable configurations within vendors and to prevent sampling bias, variabilities for vendor B were likewise calculated from two of the four participating centers (centers 5 and 6: same MR type b, both with an 8-channel head coil) (Table 1).
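The CV definition above can be illustrated with a minimal Python sketch. The abstract does not state whether the sample or the population standard deviation was used, so the sample form (n-1 denominator) is an assumption here, and the two volumes are hypothetical values chosen for illustration, not measurements from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(volumes):
    """Return the CV in percent: standard deviation / mean * 100.

    Assumption: sample standard deviation (n-1 denominator); the
    abstract does not specify which form was used.
    """
    if len(volumes) < 2:
        raise ValueError("need at least two measurements")
    return stdev(volumes) / mean(volumes) * 100.0


# Hypothetical test/retest hippocampal volumes in mm^3 (not study data).
test_retest = [3000.0, 3060.0]
cv = coefficient_of_variation(test_retest)
print(f"CV = {cv:.2f}%")
```

With only two scans per scanner, the sample and population forms differ by a factor of √2, so the choice matters when comparing CVs with other reports.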

RESULTS

INTRASCANNER VARIABILITY: LONGITUDINAL TEST

(1) Test/retest - subject 1 - Both the manual and the automatic technique resulted in acceptable CVs, ranging from 0.18 to 2.62%. FreeSurfer results were more consistent, with 5 out of 6 CVs <2.0% compared to 3 out of 6 after manual delineation (Table 2).

(2) Five repeated scans - subject 2 - The intrascanner consistency was confirmed by five repeated scans of subject 2 in center 1, which resulted in CVs of 0.72% for the right and 1.16% for the left hippocampus when analyzed with FreeSurfer (Table 2).

INTERSCANNER VARIABILITY: MULTICENTER TEST

(1) Intravendor variability - Manual delineation resulted in both the lowest and the highest intravendor variability (CV=0.00-5.03%), whereas FreeSurfer CVs fell in a narrower range (CV=0.03-2.10%) (Table 2). FreeSurfer volumes were systematically higher than manual volumes (Fig. 1); this overestimation is well described.⁵

(2) Intervendor variability - For the manual technique, the intervendor variability (CV=1.35-3.69%) fell between the intravendor variabilities of vendors A and B. FreeSurfer volumes were systematically higher for vendor B than for vendor A (Fig. 1), which caused the intervendor variability (CV=3.80-5.83%) to be systematically higher than the intravendor variability of both vendors (Table 2).
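The effect of a systematic vendor offset on the intervendor CV can be sketched as follows. This is an illustration only: the volumes are hypothetical, and pooling all measurements from both vendors to obtain the intervendor CV is an assumption, since the abstract does not spell out that calculation.

```python
from statistics import mean, stdev


def cv_percent(values):
    """CV in percent, using the sample standard deviation (assumption)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical volumes in mm^3 for one subject (not study data).
# Vendor B is offset upward to mimic a systematic vendor difference.
vendor_a = [3500.0, 3520.0]
vendor_b = [3800.0, 3830.0]

intravendor_a = cv_percent(vendor_a)
intravendor_b = cv_percent(vendor_b)
# Assumed pooling: all scans from both vendors in one sample.
intervendor = cv_percent(vendor_a + vendor_b)

print(f"intravendor A: {intravendor_a:.2f}%")
print(f"intravendor B: {intravendor_b:.2f}%")
print(f"intervendor:   {intervendor:.2f}%")
```

In this sketch each vendor is internally consistent, yet the pooled CV is dominated by the offset between the two groups, which is the pattern described for the FreeSurfer volumes.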

DISCUSSION/CONCLUSION

INTRASCANNER VARIABILITY: LONGITUDINAL TEST

Intrascanner consistency is satisfactory for this phase III study, in which hippocampal volumes before and after therapy will be compared. With CVs of 0.72% and 1.16% for five repeated scans of the right and left hippocampus, FreeSurfer would not be expected to mask the small hippocampal volume differences anticipated before and after PCI. We consider the automatic technique a good alternative to the time-consuming manual delineation for longitudinal evaluations within centers.

INTERSCANNER VARIABILITY: MULTICENTER TEST

For manual delineation, the intervendor variability is in some cases lower than the intravendor and even the intrascanner variability. This suggests only a modest effect of hardware differences across sites and vendors relative to intrascanner variability across time points. FreeSurfer results are systematically higher for vendor B than for vendor A due to intensity differences; comparing cross-sectional FreeSurfer results between centers with different vendors is therefore not recommended in this multicenter study.

In the future, the QA procedure could be strengthened by adding a phantom containing an irregularly shaped object of known volume.

References

(1) Jack CR, Bernstein MA, Fox NC, et al. The Alzheimer's Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging. 2008;27(4):685-91.

(2) Achten E, Deblaere K, De Wagter C, et al. Intra- and interobserver variability of MRI-based volume measurements of the hippocampus and amygdala using the manual ray-tracing method. Neuroradiology. 1998;40(9):558-66.

(3) Reuter M, Schmansky NJ, Rosas HD, Fischl B. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage. 2012;61(4):1402-18.

(4) Jones R, Payne B. Clinical investigation and statistics in laboratory medicine. London: ACB Venture Publications; 1997.

(5) Wenger E, Mårtensson J, Noack H, et al. Comparing manual and automatic segmentation of hippocampal volumes: reliability and validity issues in younger and older brains. Hum Brain Mapp. 2014;35(8):4236-48.

Figures

Table 1 - Description of hardware, software and sequences of participating centers.

Table 2 - Coefficients of variation for intrascanner, intravendor and intervendor variability of HC volumes of subject 1 and 2.

Figure 1 - Bar graphs of HC volumes acquired by manual and automatic segmentation for subject 1 and 2.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016) 1166