3239

Test-Retest and Between-Site Reliability in a Multisite Diffusion Tensor Imaging Study
Ikbeom Jang1, Sumra Bari1, Yukai Zou2,3, Nicole L. Vike3,4, Pratik Kashyap1, and Thomas M. Talavage1,2

1Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States, 2Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, United States, 3College of Veterinary Medicine, Purdue University, West Lafayette, IN, United States, 4Department of Basic Medical Sciences, Purdue University, West Lafayette, IN, United States

Synopsis

Diffusion tensor imaging (DTI) is frequently used to identify brain biomarkers for neurodevelopmental and neurodegenerative disorders because it can measure the spatial organization of brain tissue. Because larger sample sizes are often needed to address substantive questions, many studies pool data from several scanners; ideally, a reliability study should come first. In this study, we assess the reliability of DTI measures across two systems using the intraclass correlation coefficient, so that data may be pooled in future multi-site DTI studies.

INTRODUCTION

Diffusion tensor imaging (DTI) is frequently used to identify brain biomarkers for neurodevelopmental and neurodegenerative disorders because it can measure the spatial organization of brain tissue [1-2]. Because larger sample sizes are often needed to address substantive questions, many studies pool data from several scanners; ideally, a reliability study should come first. In this study, we assess the reliability of DTI measures across two systems using the intraclass correlation coefficient (ICC) [3-4], so that data may be pooled in future multi-site DTI studies.

METHODS

1) Participants: 24 healthy volunteers (13 males and 11 females; ages 22-41) each underwent four MR imaging sessions at two sites. Participants comprised a variety of ethnicities. To minimize potential changes of white matter microstructure over time, all four scans were conducted within a short period of time (median=same day, mean difference=4.75 days).

2) Data Acquisition: Two imaging sessions were conducted at Site 1 using a 16-channel Nova Medical brain array on a 3T GE Signa HDxt. The other two imaging sessions were conducted at Site 2 using a 32-channel Nova Medical head coil on a 3T GE Discovery MR750. Diffusion-weighted images were acquired using a spin-echo echo-planar imaging sequence (TR/TE=12,500/100 ms; flip angle=90°; matrix=96×96; FOV=240×240 mm²; 40 axial slices; slice thickness=2.5 mm; slice gap=0 mm) with 30 encoding directions at b=1000 s/mm² and four volumes acquired at b=0 s/mm².

3) Processing: Image processing was performed primarily using the FSL 5.0 toolbox and TBSS [5-6]. After correcting for motion and eddy current distortions [7], brain segmentation was performed. The diffusion tensor model was then fitted to estimate tensor maps, including the three eigenvalues, from which fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) were calculated. After excluding images that failed visual quality inspection, FA images were co-registered to the FMRIB58_FA template. The population mean FA image was thresholded (FA>0.2) to create a mean WM skeleton. The aligned FA image of each subject was projected onto the mean WM skeleton. The same nonlinear registration and skeleton projection were applied to MD, AD, and RD.
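
For reference, the four scalar measures follow from the tensor eigenvalues by the standard definitions. The sketch below is illustrative only and is not the FSL implementation used in this study; the function name `dti_scalars` and the example eigenvalues are hypothetical.

```python
import math


def dti_scalars(l1, l2, l3):
    """Return (FA, MD, AD, RD) from tensor eigenvalues, assuming l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0            # mean diffusivity: average eigenvalue
    ad = l1                              # axial diffusivity: largest eigenvalue
    rd = (l2 + l3) / 2.0                 # radial diffusivity: mean of the two smaller
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy; defined as 0 here when all eigenvalues are 0.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return fa, md, ad, rd


# Hypothetical eigenvalues in mm^2/s, chosen only to illustrate the call.
fa, md, ad, rd = dti_scalars(1.7e-3, 0.4e-3, 0.3e-3)
```

Under these definitions FA is 0 for a perfectly isotropic tensor and approaches 1 as diffusion becomes confined to a single direction.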

4) Analysis: Mean FA, MD, AD, and RD across the WM skeleton were computed for each subject, and variance component analyses were conducted. The Shapiro-Wilk normality test was performed to assess within-group normality for ICC analysis. Then, reliability for each DTI measure was estimated between the imaging sessions conducted at each site (inter-session) using ICC (type 2, 1) [4] for the degree of absolute agreement. Both subjects and sessions were considered random effects in this two-way random-effects model. The reliability of the pairs of measures at the two sites (inter-site) was calculated using ICC (type 3, 1) [4] for the degree of consistency. In this two-way mixed-effects model, subjects were considered random effects and sites were considered fixed effects. An additional inter-site correlation analysis was performed using ICC (type 2, 1) while regarding site/scanner as a random effect to test reliability of the four typical DTI measures.
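
As a minimal sketch of the two single-measurement ICC forms from Shrout and Fleiss [4], the function below computes both from a subjects-by-measurements table using two-way ANOVA mean squares. The name `icc_two_way` and the input layout (one row per subject, one column per session or site) are assumptions for illustration; this is not the analysis code used in the study.

```python
def icc_two_way(x):
    """Return (ICC(2,1), ICC(3,1)) for a table x[subject][measurement].

    Assumes a complete table with at least 2 subjects and 2 measurements.
    """
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-measurements mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # ICC(2,1): absolute agreement, measurements treated as random effects.
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # ICC(3,1): consistency, measurements treated as fixed effects.
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


# Hypothetical data: the second column equals the first plus a constant offset.
icc21, icc31 = icc_two_way([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

In the toy table, the two columns rank subjects identically but differ by a fixed offset, so ICC(3,1) is 1 while ICC(2,1) is lower. This is the general reason a systematic between-scanner shift penalizes the absolute-agreement form but not the consistency form.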

RESULTS

Results of the within-group normality test are summarized in Table 1. Nineteen of 24 group measures were approximately normal. Table 2 presents results of variance component analyses with associated ICCs for both test-retest (inter-session) reliability and between-site (inter-site) reliability. Inter-session ICCs from Site 1 and Site 2 were higher than 0.4 and 0.95, respectively, for all DTI measures. ICCs from Site 2 (newer MRI) were always higher than those from Site 1. The inter-site ICCs were higher than 0.8 for all DTI measures (Table 2). The last analysis, which treated both subject and site/scanner as random effects, yielded ICC (type 2, 1) values of 0.866 for FA, 0.569 for RD, and 0.100 for AD.

DISCUSSION

Based on guidelines for ICC interpretation [8], inter-session and inter-site reliability were always at least fair and were typically excellent for the site with the newer MRI (Site 2). The inter-site analyses showed excellent reliability in FA, AD, and RD, with FA generally more reliable than RD and AD. This is an encouraging finding in support of multi-site diffusion-based studies, pending confirmation of stable distributions of diffusion measures.
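
The interpretation bands in [8] are commonly summarized as poor (below 0.40), fair (0.40-0.59), good (0.60-0.74), and excellent (0.75-1.00). A small helper, with the hypothetical name `cicchetti_label`, makes that mapping explicit; it is a convenience sketch rather than part of the study's analysis.

```python
def cicchetti_label(icc):
    """Map an ICC value to the qualitative bands commonly attributed to [8]."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Label a few hypothetical ICC values.
labels = [cicchetti_label(v) for v in (0.35, 0.45, 0.70, 0.96)]
```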

CONCLUSION

The inter-session analyses suggest fair-to-excellent reliability for DTI measures, with the age/quality of MRI hardware apparently being the predominant factor. Findings from inter-site analyses are supportive of traditional DTI measures being used in multi-site studies.

Acknowledgements

The authors would like to thank Dr. Gregory Tamer, Jr., Xianglun Mao, Liesl Krause, and Jana Vincent for assistance in data collection.

References

1. Basser PJ, Mattiello J, LeBihan D. MR diffusion tensor spectroscopy and imaging. Biophysical journal. 1994;66(1):259-67.

2. Le Bihan D, Mangin JF, Poupon C, Clark CA, Pappata S, Molko N, Chabriat H. Diffusion tensor imaging: concepts and applications. Journal of magnetic resonance imaging. 2001;13(4):534-46.

3. Cronbach L. The dependability of behavioral measurements: theory of generalizability for scores and profiles. 1972.

4. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological bulletin. 1979;86(2):420.

5. Smith SM, Jenkinson M, Johansen-Berg H, Rueckert D, Nichols TE, Mackay CE, Watkins KE, Ciccarelli O, Cader MZ, Matthews PM, Behrens TE. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. Neuroimage. 2006;31(4):1487-505.

6. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage. 2004;23:S208-19.

7. Andersson JL, Sotiropoulos SN. An integrated approach to correction for off-resonance effects and subject movement in diffusion MR imaging. Neuroimage. 2016;125:1063-78.

8. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological assessment. 1994;6(4):284.

Figures

Table 1. Shapiro-Wilk within-group normality test for each DTI measure.

Table 2. Inter-session and inter-site ICCs for DTI measures.

Proc. Intl. Soc. Mag. Reson. Med. 26 (2018)