1984

Reproducibility Study of a Longitudinal Pipeline for Brain Volumetry based on Partial Volume Estimation

Ricardo A. Corredor-Jerez¹,²,³, Mário João Fartaria¹,²,³, Adrian Tsang⁴, Robert Bermel⁵, Stephen E. Jones⁵, Izlem Izbudak⁶, Ellen M. Mowry⁶, Yvonne W. Lui⁷, Lauren Krupp⁷, Elizabeth Fisher⁴, Tobias Kober¹,²,³, and Bénédicte Maréchal¹,²,³

¹Advanced Clinical Imaging Technology, Siemens Healthcare AG, Lausanne, Switzerland, ²Department of Radiology, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland, ³Signal Processing Laboratory (LTS 5), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, ⁴Biogen, Cambridge, MA, United States, ⁵Cleveland Clinic, Cleveland, OH, United States, ⁶Johns Hopkins University, Baltimore, MD, United States, ⁷New York University, New York, NY, United States

Synopsis

A reliable and accurate quantification of brain tissue loss is important to measure progressive atrophy caused by neurological diseases such as multiple sclerosis. However, accuracy and reproducibility of current methods are often limited by partial volume effects, especially at tissue interfaces where subtle atrophy patterns are likely to occur. We propose a longitudinal pipeline for brain tissue segmentation incorporating partial volume estimation to increase longitudinal robustness. Results show an increase in reproducibility of 44% compared to methods not including partial volume effects in volume estimation, suggesting that these effects should be taken into account for longitudinal atrophy measurements.

Introduction

Annualized MR-based brain atrophy quantification has become an important marker for many neurodegenerative diseases. In particular, multiple sclerosis (MS) requires precise longitudinal brain volume estimation to routinely monitor disease severity and progression. Several segmentation methods have been proposed to accurately detect brain volume changes¹,²; however, meeting such performance criteria remains challenging³. Brain parenchymal fraction (BPF), defined as the ratio between brain volume (BV) and a head-size normalization factor (usually total intracranial volume or brain outer contour), has proven to be a reliable metric for the analysis of brain tissue loss⁴. Nevertheless, atrophy quantification is prone to errors due to partial volume effects or tissue misclassifications, e.g. scalp or dura mater counted as brain due to skull-stripping errors. This work introduces an automated algorithm for longitudinal brain atrophy quantification incorporating partial volume estimation and compares its performance against two other segmentation approaches in a scan-rescan reproducibility study.

Materials and Methods

Thirty MS patients from three institutions provided written informed consent to participate in a scan-rescan study. Each patient was scanned four times over two days (two scans per day) within one week. A 3D MPRAGE sequence (TR = 2300 ms, TI = 900 ms, matrix size = 240×256×176, voxel size = 1×1×1 mm³) was acquired in each session on different 3T scanners (MAGNETOM Verio, Skyra or Prisma fit, all Siemens Healthcare, Erlangen, Germany).

First, we applied an affine registration⁵ to the four MPRAGE images (first scan as reference), followed by N4 bias-field correction⁶ and an in-house skull-stripping algorithm⁷ that provides the normalization factor for the BPF computation. Using the resulting skull-stripped images as input, we compared the reproducibility of the following algorithms:

a) [MODEL-BASED] 5-class Gaussian mixture model-based algorithm⁷

b) [PV] partial volume estimation algorithm⁸

c) [LONG] time-invariant skull-stripped mask generation (see details below) followed by PV

Brain and cerebrospinal fluid (CSF) volumes are estimated by summing up probabilities (a) or concentrations (b and c).
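The volume and BPF computation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, tissue maps are flattened to plain lists, and brain volume is assumed to be the sum of gray- and white-matter fractions, which the abstract does not state explicitly.

```python
def tissue_volume_mm3(fractions, voxel_volume_mm3=1.0):
    """Sum per-voxel tissue fractions (probabilities or concentrations) into a volume."""
    return sum(fractions) * voxel_volume_mm3


def brain_parenchymal_fraction(gm, wm, normalization_volume_mm3, voxel_volume_mm3=1.0):
    """BPF = brain volume / head-size normalization factor.

    Assumption: brain volume is taken as gray matter plus white matter.
    """
    brain = tissue_volume_mm3(gm, voxel_volume_mm3) + tissue_volume_mm3(wm, voxel_volume_mm3)
    return brain / normalization_volume_mm3


# Toy example with four 1 mm^3 voxels (hypothetical values, not study data).
gm = [1.0, 0.5, 0.0, 0.0]    # gray-matter fraction per voxel
wm = [0.0, 0.25, 1.0, 0.0]   # white-matter fraction per voxel
mask_volume_mm3 = 4 * 1.0    # volume of the skull-stripped mask (4 voxels)

bpf = brain_parenchymal_fraction(gm, wm, mask_volume_mm3)
print(f"BPF = {bpf:.4f}")
```

Summing fractional values instead of counting hard-labelled voxels lets a voxel that straddles a tissue boundary contribute only its estimated share to each tissue volume.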

The longitudinal approach (c) consisted of creating a patient-specific, time-invariant intracranial mask as the intersection of the four skull-stripped masks, followed by an erosion with a spherical kernel (7×7×7 voxels) to mitigate volume variability induced by skull-stripping errors. The volume of the resulting eroded region was used as the BPF normalization factor. The new mask was applied to extract the eroded intracranial region TIV_e at each time point. Bias-field correction and partial volume estimation were then run on these TIV_e regions.
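The mask construction can be illustrated with a small pure-Python sketch. It makes several assumptions not stated in the abstract: masks are represented as sets of voxel coordinates, the 7×7×7 spherical kernel is taken to be a discrete ball of radius 3 voxels, and all function names are hypothetical.

```python
from itertools import product


def ball_offsets(size=7):
    """Offsets of a discrete ball inscribed in a size x size x size cube (assumed kernel shape)."""
    r = size // 2
    return [(dx, dy, dz)
            for dx, dy, dz in product(range(-r, r + 1), repeat=3)
            if dx * dx + dy * dy + dz * dz <= r * r]


def time_invariant_mask(masks, kernel_size=7):
    """Intersect the per-time-point masks, then erode the result with a spherical kernel.

    Each mask is a set of (x, y, z) voxel coordinates inside the skull-stripped region.
    """
    common = set.intersection(*masks)
    offsets = ball_offsets(kernel_size)
    # A voxel survives erosion only if the whole kernel centred on it fits inside the mask.
    return {(x, y, z) for (x, y, z) in common
            if all((x + dx, y + dy, z + dz) in common for dx, dy, dz in offsets)}


def cube(lo, hi):
    """Toy mask: all voxels with coordinates in [lo, hi) along each axis."""
    return set(product(range(lo, hi), repeat=3))


# Four toy masks that disagree slightly at the border, mimicking skull-stripping variability.
masks = [cube(0, 12), cube(1, 12), cube(0, 11), cube(1, 11)]
eroded = time_invariant_mask(masks)
normalization_volume_mm3 = len(eroded) * 1.0  # 1 mm^3 voxels, as in the acquisition
print(len(eroded))
```

Because the intersection and erosion are computed once per patient, every time point shares the same normalization volume, so border disagreements between individual skull-stripping results no longer enter the BPF denominator.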

Reproducibility was evaluated as the absolute difference in BPF, as well as fuzzy Dice⁹, for each pairwise comparison under the following scenarios: same-day, same-scanner (SDSS, N=38); same-day, different-scanner (SDDS, N=12); different-day, same-scanner (DDSS, N=38); different-day, different-scanner (DDDS, N=52).
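A minimal sketch of this pairwise evaluation is given below. The record layout and function names are hypothetical, and the fuzzy Dice is written in one common form, 2·Σmin(a, b) / (Σa + Σb); the exact definition used in the study is the one in reference 9 and may differ.

```python
from itertools import combinations


def scenario(scan_a, scan_b):
    """Label a pair of scans as SDSS, SDDS, DDSS or DDDS."""
    day = "SD" if scan_a["day"] == scan_b["day"] else "DD"
    scanner = "SS" if scan_a["scanner"] == scan_b["scanner"] else "DS"
    return day + scanner


def fuzzy_dice(a, b):
    """Overlap of two fractional tissue maps (assumed form: 2*sum(min)/(sum(a)+sum(b)))."""
    overlap = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * overlap / total if total else 1.0


def pairwise_bpf_differences(scans):
    """Absolute BPF difference for every pair of scans of one patient, with its scenario label."""
    return [(scenario(a, b), abs(a["bpf"] - b["bpf"]))
            for a, b in combinations(scans, 2)]


# Four scans of one hypothetical patient; BPF given in percent (made-up values).
scans = [
    {"day": 1, "scanner": "A", "bpf": 80.10},
    {"day": 1, "scanner": "B", "bpf": 80.45},
    {"day": 2, "scanner": "A", "bpf": 80.30},
    {"day": 2, "scanner": "B", "bpf": 80.20},
]
for label, diff in pairwise_bpf_differences(scans):
    print(label, f"{diff:.2f}")
```

Four scans per patient yield six pairs, each falling into exactly one of the four scenarios depending on which days and scanners were involved.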

Results

Images from twenty-five subjects were included in the analysis; five cases failed the quality assessment (no distortion-corrected images available). The overall variability of the three algorithms is presented in Figure 1. All three show higher variability when different scanners are used (DDDS and SDDS) than when only the time point changes (DDSS and SDSS). The MODEL-BASED algorithm produced the highest absolute BPF differences in all scenarios (medians ranging from 0.43% to 0.57%). Both methods using partial volume estimation outperformed it, with a significant improvement in DDSS and SDSS. We also observed an overall reduction of the spread across all scenarios for the PV and LONG methods. The LONG algorithm achieved the best results (median DDDS: 0.39%, DDSS: 0.23%, SDDS: 0.33%, SDSS: 0.16%), supported by fuzzy Dice values above 99% (Figure 2).
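The abstract does not state how the 44% reproducibility gain quoted in the Synopsis was computed. One plausible reading, shown here purely as an assumption, is the relative reduction of the mean of the per-scenario median BPF differences between MODEL-BASED and LONG. The sketch below uses the medians reported in this abstract, with the LONG values as quoted in the Results text.

```python
# Median absolute BPF differences (%) as reported in this abstract.
model_based = {"DDDS": 0.52, "DDSS": 0.47, "SDDS": 0.57, "SDSS": 0.43}
long_method = {"DDDS": 0.39, "DDSS": 0.23, "SDDS": 0.33, "SDSS": 0.16}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Assumed definition: relative reduction of the mean median difference.
relative_reduction = 100.0 * (1.0 - mean(long_method.values()) / mean(model_based.values()))

# The same reduction computed separately for each scenario.
per_scenario = {name: 100.0 * (1.0 - long_method[name] / model_based[name])
                for name in model_based}

print(f"overall: {relative_reduction:.1f}%")  # about 44%
for name, value in per_scenario.items():
    print(f"{name}: {value:.1f}%")
```

Under this reading the overall figure comes out close to 44%, while the per-scenario reductions are larger for same-scanner pairs than for different-scanner pairs.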

Discussion and Conclusions

Previous studies showed that ignoring partial volume effects in volume quantification can lead to significant errors¹⁰. Our results confirm this observation and indicate that partial volume estimation helps improve the reproducibility of volume estimates. Volumes are thus estimated more consistently over regions susceptible to partial volume effects, particularly at GM/CSF interfaces. Nevertheless, improvements in reproducibility are valuable only if accuracy is not compromised. Ongoing work aims to compare the accuracy of our algorithm against other methods.

Moreover, using TIV_e brings a further improvement for longitudinal analyses. The performance of skull-stripping methods has a non-negligible effect on brain tissue classification; the erosion helps mitigate the effect of misclassified extracerebral tissue, under the assumption that brain tissue loss correlates with an increase in CSF volume, both in the ventricles and in cortical areas. This method still needs additional validation, particularly by applying it to longitudinal datasets.

Our results confirm prior reports that the use of different scanner hardware has a considerable impact on longitudinal variability¹¹ (see SDDS and DDDS values), which suggests a need for additional calibration strategies to account for these hardware effects¹². Such strategies would further help translate brain atrophy measurement into clinical routine at the individual patient level.


References

[1] Zivadinov, R., Jakimovski, D., Gandhi, et al. Clinical relevance of brain atrophy assessment in multiple sclerosis. Implications for its use in a clinical routine. Expert Review of Neurotherapeutics. 2016; 16(7), 777–793. https://doi.org/10.1080/14737175.2016.1181543

[2] Reuter, M., Schmansky, N. J., Rosas, H. D., & Fischl, B. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage. 2012; 61(4), 1402–1418. https://doi.org/10.1016/j.neuroimage.2012.02.084

[3] Sastre-Garriga, J., Pareto, D., & Rovira, À. Brain Atrophy in Multiple Sclerosis: Clinical Relevance and Technical Aspects. Neuroimaging Clinics of North America. 2017; 27(2), 289–300. https://doi.org/10.1016/j.nic.2017.01.002

[4] Vågberg, M., Granåsen, G., & Svenningsson, A. Brain Parenchymal Fraction in Healthy Adults—A Systematic Review of the Literature. PLoS ONE. 2017. https://doi.org/10.1371/journal.pone.0170018

[5] Klein, S., Staring, M., Murphy, et al. elastix: a toolbox for intensity-based medical image registration. IEEE Transactions on Medical Imaging. 2010; 29(1), 196–205. https://doi.org/10.1109/TMI.2009.2035616

[6] Tustison, N. J., Avants, B. B., Cook, P. A., et al. N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging. 2010; 29(6), 1310–1320. https://doi.org/10.1109/TMI.2010.2046908

[7] Schmitter, D., Roche, A., Maréchal, B., et al. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and Alzheimer’s disease. NeuroImage: Clinical. 2015; 7, 7–17. https://doi.org/10.1016/j.nicl.2014.11.001

[8] Roche, A., & Forbes, F. Partial volume estimation in brain MRI revisited. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer. 2014; 771–778.

[9] Roche, A., Ribes, D., Bach-Cuadra, M., & Kruger, G. On the convergence of EM-like algorithms for image segmentation using Markov random fields. Medical Image Analysis. 2011; 15(6), 830–839. https://doi.org/10.1016/j.media.2011.05.002

[10] Tohka, J. Partial volume effect modeling for segmentation and tissue classification of brain magnetic resonance images: A review. World Journal of Radiology. 2014; 6(11), 855–864. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4241492/

[11] Kruggel, F., Turner, J., Muftuler, L. T., & The Alzheimer's Disease Neuroimaging Initiative. Impact of scanner hardware and imaging protocol on image quality and compartment volume precision in the ADNI cohort. NeuroImage. 2010; 49(3), 2123–2133. https://doi.org/10.1016/j.neuroimage.2009.11.006

[12] Amann, M., Falkovskiy, P., Thoeni, A., et al. The impact of data analysis method, scanner type and scan session on volume measurements of brain structures. In ISMRM, 24th Annual Meeting. 2016.

Figures

Figure 1. Brain Parenchymal Fraction (BPF) absolute difference per scenario. A marked reduction in variability and spread is observed with methods that estimate partial volume. MODEL-BASED - median DDDS: 0.52%, DDSS: 0.47%, SDDS: 0.57%, SDSS: 0.43%. PV - median DDDS: 0.44%, DDSS: 0.23%, SDDS: 0.38%, SDSS: 0.18%. LONG - median DDDS: 0.38%, DDSS: 0.23%, SDDS: 0.33%, SDSS: 0.16%.

Figure 2. Fuzzy Dice of spatial overlap evaluated for the LONG method across each of the four scenarios. Median DDDS: 99.51%, DDSS: 99.70%, SDDS: 99.49%, SDSS: 99.73%. Mean DDDS: 99.49%, DDSS: 99.67%, SDDS: 99.50%, SDSS: 99.70%.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)
