Evaluation of cluster-size inference with random field and permutation methods for group-level MRI analysis

Huanjie Li¹, Lisa D. Nickerson², Yang Fan³, Thomas E. Nichols⁴, and Jia-Hong Gao⁵

¹Department of Biomedical Engineering, Dalian University of Technology, Dalian, China, People's Republic of, ²McLean Imaging Center, McLean Hospital/Harvard Medical School, Belmont, MA, United States, ³GE Healthcare, MR Research China, Beijing, China, People's Republic of, ⁴Department of Statistics and Warwick Manufacturing Group, University of Warwick, Coventry, United Kingdom, ⁵Center for MRI Research, Peking University, Beijing, China, People's Republic of

Synopsis

Threshold-free cluster enhancement (TFCE) outperforms the cluster-size test (CST) based on random field theory, and our recent papers proposed two voxelation-corrected CSTs (v-CST and vn-CST) that also show a clear advantage over other CSTs. However, it is not clear which approach performs better for MRI data analysis. This work provides a careful, fair and thorough evaluation of these methods, which may be particularly useful for group-level MRI data analysis.

Purpose

Threshold-free cluster enhancement (TFCE) based on permutation testing outperforms the original cluster-size test (CST) based on random field theory^{1, 2}. Our recent papers proposed two voxelation-corrected CSTs (v-CST and vn-CST) that also show a clear advantage over other CSTs^{3, 4}. However, it is not clear which approach performs better for MRI data analysis. The purpose of this work was to provide a careful, fair and thorough evaluation of these methods by investigating the effectiveness of v-CST, vn-CST and TFCE under different degrees of freedom (dfs), smoothness levels and signal-to-noise ratios (SNRs) for both stationary and non-stationary images in group-level analysis.

Methods

Simulated null data: Monte Carlo simulations were used to generate both stationary and non-stationary null data, following a strategy similar to that of Li et al. (2015)⁴. For each realization, three sets of two-group data consisting of 64 × 64 × 32 Gaussian noise images were generated, and a two-sample t-test was used to calculate statistic images with df = 18, 38 and 58, respectively. For the stationary null data, the full width at half maximum (FWHM) of the applied Gaussian kernel was 0, 3, 6 or 9 voxels. For the non-stationary null data, each white noise image was smoothed with three different 3D Gaussian kernels, producing three images with low, medium and high smoothness; six different smoothness settings were used to simulate different levels of non-stationarity. For each sample size, 2000 realizations were generated. Each test's rejection rate was calculated as the number of realizations containing at least one detected cluster divided by the total number of realizations.
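One stationary null realization can be sketched as follows. This is a minimal sketch, not the authors' simulation code: it assumes unit-variance Gaussian white noise, equal group sizes, the standard conversion σ = FWHM / (2√(2 ln 2)), and scipy's `gaussian_filter` for smoothing; the names `smooth` and `simulate_null_t` are hypothetical. With n₁ + n₂ subjects the pooled two-sample t-test has df = n₁ + n₂ − 2, so df = 18, 38 and 58 correspond to, for example, 10, 20 and 30 subjects per group.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Conversion factor from FWHM to the Gaussian standard deviation (~0.4247)
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def smooth(img, fwhm_vox):
    """Gaussian smoothing with a kernel given by its FWHM in voxels (0 = no smoothing)."""
    if fwhm_vox <= 0:
        return img
    return gaussian_filter(img, sigma=fwhm_vox * FWHM_TO_SIGMA)


def simulate_null_t(n_per_group, shape=(64, 64, 32), fwhm_vox=3.0, rng=None):
    """One null realization: two groups of smoothed white-noise images -> (t image, df)."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.stack([smooth(rng.standard_normal(shape), fwhm_vox) for _ in range(n_per_group)])
    b = np.stack([smooth(rng.standard_normal(shape), fwhm_vox) for _ in range(n_per_group)])
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled-variance two-sample t statistic computed independently at every voxel
    pooled = ((n1 - 1) * a.var(axis=0, ddof=1) + (n2 - 1) * b.var(axis=0, ddof=1)) / df
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, df


# Small images keep the demo fast; 10, 20 and 30 subjects per group give df = 18, 38, 58
for n in (10, 20, 30):
    t, df = simulate_null_t(n, shape=(16, 16, 8), fwhm_vox=3.0, rng=np.random.default_rng(0))
    print(df, t.shape)
```

Because the t statistic is invariant to a common rescaling of the images, the variance reduction caused by smoothing does not need to be corrected before the test.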

Simulated activation data: A template of the medial visual resting-state network⁵ was used as the ground-truth spatial pattern of activation. Ground-truth signals were assigned a value of 0 in background voxels and a peak value of 1 in network voxels. The signal was scaled by 1, 3 or 5 and added to the unsmoothed simulated non-stationary data, giving peak SNRs of 1, 3 and 5, respectively. For each sample size (df = 18, 38 and 58), 20 realizations were generated. The smoothness levels were the same as in the null data simulation. Receiver operating characteristic (ROC) curves were used to compare the performance of each method on the non-stationary activation data.
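The ROC comparison can be sketched by treating each voxel as a detection problem: voxels inside the ground-truth network are positives, background voxels are negatives, and sweeping a threshold over a method's output map (for example, a TFCE score or 1 − p) traces the curve. This is a minimal sketch under stated assumptions, not the authors' exact ROC construction: a synthetic sphere stands in for the medial visual network template (which is not reproduced here), the noise has unit variance so the signal scale equals the peak SNR, and the smoothing that the abstract applies after adding the signal is omitted. The functions `make_activation` and `roc_auc` are hypothetical.

```python
import numpy as np


def make_activation(pattern, snr, noise):
    """Scale a ground-truth pattern (0 in background, peak 1) by the peak SNR and add noise."""
    pattern = pattern / pattern.max()
    return snr * pattern + noise


def roc_auc(score, truth):
    """Voxelwise ROC curve and its area; truth marks the true-signal voxels."""
    score = np.ravel(score)
    truth = np.ravel(truth).astype(bool)
    order = np.argsort(-score, kind="mergesort")  # visit voxels from highest to lowest score
    tp = np.cumsum(truth[order])                   # true positives as the threshold is lowered
    fp = np.cumsum(~truth[order])                  # false positives as the threshold is lowered
    tpr = np.concatenate(([0.0], tp / truth.sum()))
    fpr = np.concatenate(([0.0], fp / (~truth).sum()))
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)  # trapezoidal rule
    return fpr, tpr, auc


# Hypothetical stand-in for the network template: a sphere of radius 6 voxels
i, j, k = np.mgrid[:32, :32, :16]
truth = (i - 16) ** 2 + (j - 16) ** 2 + (k - 8) ** 2 <= 36
rng = np.random.default_rng(0)
img = make_activation(truth.astype(float), snr=3, noise=rng.standard_normal(truth.shape))
fpr, tpr, auc = roc_auc(img, truth)
print(f"voxelwise AUC of the raw image: {auc:.3f}")
```

In a real comparison, `img` would be replaced by each method's corrected output map so that the curves reflect the inference procedure rather than the raw data.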

VBM data: Structural images from the ADNI database were used for the VBM analysis at two group sizes: 65 images (small group: 34 normal control (NC) subjects and 31 patients with Alzheimer’s disease (AD)) and 82 images (larger group: 42 NC subjects and 40 AD patients). An optimized VBM protocol was implemented using FSL-VBM, and two smoothing kernels with δ = 3 and 4 mm were applied.
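The kernel sizes and degrees of freedom of the VBM comparisons can be checked with a little arithmetic. This sketch assumes that δ denotes the standard deviation σ of the Gaussian kernel (FSL-VBM specifies its smoothing kernels by σ in mm) and that each comparison is a plain two-group model without covariates; under those assumptions the kernels have FWHM = 2√(2 ln 2)·σ ≈ 7.1 and 9.4 mm, and the small and larger groups give df = 63 and 80. The helper `two_group_df` is hypothetical.

```python
import math

# FWHM of a Gaussian is 2*sqrt(2*ln 2) (~2.3548) times its standard deviation
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))


def two_group_df(n1, n2):
    """Residual degrees of freedom of a two-group model without covariates."""
    return n1 + n2 - 2


for sigma_mm in (3.0, 4.0):
    print(f"sigma = {sigma_mm} mm -> FWHM = {SIGMA_TO_FWHM * sigma_mm:.1f} mm")
print("small group df:", two_group_df(34, 31))   # 63
print("larger group df:", two_group_df(42, 40))  # 80
```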

For CST inference, two commonly used cluster-defining thresholds (t = 2.5 and 3.5) were applied. For the TFCE test, the number of permutations was set to 5000 with the default connectivity. All tests used a significance level of 0.05.
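The two inference families can be contrasted on a single statistic image. In a minimal sketch (not the authors' or FSL's implementation), CST first thresholds the t image at a cluster-defining threshold such as 2.5 or 3.5 and measures the extent of each connected suprathreshold cluster; that extent would then be compared with a critical size from random field theory (for v-CST/vn-CST, not shown). TFCE instead gives every voxel v the score TFCE(v) = ∫ e(h)^E h^H dh over heights h up to its own statistic value, where e(h) is the extent of the cluster containing v at threshold h, using the defaults E = 0.5 and H = 2 recommended by Smith and Nichols¹. Face (6-) connectivity, which is the default of `scipy.ndimage.label` for 3D arrays, is assumed here, and the step size `dh` is an arbitrary choice; the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def cluster_sizes(stat, cdt):
    """Label clusters of voxels with stat > cdt; return (labels, cluster sizes in voxels)."""
    labels, n_clusters = ndimage.label(stat > cdt)  # default structure: face connectivity
    sizes = np.bincount(labels.ravel(), minlength=n_clusters + 1)[1:]  # drop background
    return labels, sizes


def tfce(stat, dh=0.1, E=0.5, H=2.0):
    """Riemann-sum approximation of TFCE for the positive tail of a statistic image."""
    out = np.zeros(stat.shape, dtype=float)
    for h in np.arange(dh, stat.max() + dh, dh):
        labels, n_clusters = ndimage.label(stat >= h)
        if n_clusters == 0:
            continue
        extent = np.bincount(labels.ravel()).astype(float)
        extent[0] = 0.0  # background voxels receive no enhancement
        out += extent[labels] ** E * h ** H * dh
    return out


t = np.zeros((20, 20, 10))
t[2:6, 2:6, 2:4] = 3.0  # broad 32-voxel blob of moderate height
t[12, 12, 5] = 4.0      # isolated single-voxel peak
_, sizes = cluster_sizes(t, cdt=2.5)
print(sorted(sizes.tolist()))  # [1, 32]
scores = tfce(t)
print(scores[3, 3, 3] > scores[12, 12, 5])  # True: extent outweighs the higher single peak
```

The example illustrates the threshold dependence discussed in the results: with t = 3.5 the blob would disappear from the CST analysis entirely, whereas TFCE scores both features without choosing a threshold.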

Results and Discussion

Figs. 1 and 2 show the FWE-corrected rejection rates on the simulated stationary and non-stationary null data, respectively. The performance of the CST methods depends on the cluster-defining (intensity) threshold. Compared with the CST methods, TFCE controls the false positive rate more stably. With a suitable intensity threshold, CST inference and TFCE perform comparably.
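Because each rejection rate is estimated from 2000 realizations, it carries Monte Carlo error, which matters when judging whether a method controls the FWE rate at 0.05. A minimal sketch, assuming independent realizations and the normal approximation to the binomial: the standard error at a nominal α = 0.05 is √(0.05 × 0.95 / 2000) ≈ 0.0049, so observed rates between roughly 0.040 and 0.060 are consistent with exact control. The helper names are hypothetical.

```python
import math


def rejection_rate(detected):
    """Fraction of realizations in which at least one cluster was declared significant."""
    detected = list(detected)
    return sum(bool(d) for d in detected) / len(detected)


def mc_interval(alpha=0.05, n_realizations=2000, z=1.96):
    """Approximate 95% range of rejection rates expected from an exact level-alpha test."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_realizations)
    return alpha - z * se, alpha + z * se


lo, hi = mc_interval()
print(f"{lo:.4f} - {hi:.4f}")  # 0.0404 - 0.0596
```

A rate clearly above the upper bound indicates an inflated false positive rate, while one clearly below the lower bound indicates a conservative test.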

Fig. 3 shows the AUC results on the simulated stationary activation data. For high SNR (SNR ≥ 3), TFCE shows slightly better sensitivity at all dfs and smoothness levels. For low SNR (SNR = 1), the CST methods perform better than TFCE at the lowest smoothness level (FWHM = 0 voxels) and lowest df (df = 18); as the smoothness level increases, the performance of TFCE improves and it shows better sensitivity. The results for non-stationary data are similar to those for stationary data and are therefore not shown.

Figs. 4 and 5 show the VBM results using the vn-CST and TFCE methods for the small and larger groups, respectively. For both group sizes, the vn-CST results were similar to or better than TFCE at t = 2.5, while at t = 3.5 they were similar to or worse than TFCE.

Conclusion

In summary, both vn-CST and TFCE are robust inference approaches for group-level analysis that require neither high degrees of spatial smoothness nor uniform smoothness. TFCE is more reliable because it does not require a cluster-forming intensity threshold, but it is not available for individual subject-level analysis because its permutation-based inference relies on the assumption of exchangeability. Thus, the most suitable inference approach may ultimately depend on whether the analysis is at the single-subject or the group level.
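The exchangeability requirement can be made concrete with a group comparison. Under the null hypothesis of a two-group design, the group labels are exchangeable across subjects, so they can be shuffled and the maximum of the statistic image recorded for each shuffle; the 95th percentile of these maxima is an FWE-corrected threshold. A single subject's scans typically offer no such exchangeable units (fMRI time series, for instance, are temporally autocorrelated), which is why this route does not carry over directly to subject-level analysis. This is a minimal sketch, assuming a voxelwise t statistic for the positive tail (a TFCE image would be plugged in the same way) and a modest number of permutations; the function names are hypothetical.

```python
import numpy as np


def t_image(a, b):
    """Pooled-variance two-sample t statistic per voxel; subjects are on axis 0."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * a.var(axis=0, ddof=1) + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def max_stat_permutation(a, b, n_perm=1000, alpha=0.05, rng=None):
    """Observed t image, FWE-corrected p-values and threshold from the permuted maxima."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.concatenate([a, b])
    n1 = len(a)
    observed = t_image(a, b)
    max_null = np.empty(n_perm)
    max_null[0] = observed.max()  # the original labelling counts as one permutation
    for i in range(1, n_perm):
        idx = rng.permutation(len(data))  # shuffle group labels across subjects
        max_null[i] = t_image(data[idx[:n1]], data[idx[n1:]]).max()
    threshold = np.quantile(max_null, 1.0 - alpha)
    # p_FWE(v) = fraction of permutation maxima >= the observed statistic at v
    sorted_max = np.sort(max_null)
    p_fwe = (n_perm - np.searchsorted(sorted_max, observed, side="left")) / n_perm
    return observed, p_fwe, threshold


rng = np.random.default_rng(0)
a = rng.standard_normal((10, 50))  # 10 subjects x 50 voxels per group
b = rng.standard_normal((10, 50))
a[:, 0] += 3.0                     # true group difference at voxel 0 only
obs, p, thr = max_stat_permutation(a, b, n_perm=200, rng=np.random.default_rng(1))
print(p[0], obs[0] > thr)
```

The study itself used 5000 permutations; including the unpermuted labelling ensures the smallest attainable p-value is 1 / n_perm rather than 0.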

Acknowledgements

This work was supported by “the Fundamental Research Funds for the Central Universities”.

References

1. Smith, S.M., Nichols, T.E. Threshold-free cluster enhancement: Addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage. 2009; 44: 83-98.

2. Salimi-Khorshidi, G., Smith, S.M., Nichols, T.E. Adjusting the effect of nonstationarity in cluster-based and TFCE inference. NeuroImage. 2011; 54: 2006-2019.

3. Li, H., Nickerson, L.D., Xiong, J., et al. A high performance 3D cluster-based test of unsmoothed fMRI data. NeuroImage. 2014; 98: 537-546.

4. Li, H., Nickerson, L.D., Zhao, X., et al. A voxelation-corrected non-stationary 3D cluster-size test based on random field theory. NeuroImage. 2015; 118: 676-682.

5. Beckmann, C.F., DeLuca, M., Devlin, J.T., et al. Investigations into resting-state connectivity using independent component analysis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2005; 360: 1001-1013.

Figures

Fig. 1 Results for the simulated stationary null data. All methods were compared for df = 18, 38 and 58, and FWHM = 0, 3, 6 and 9 voxels. For CST inference, the applied cluster-defining thresholds were t = 2.5 and 3.5. The nominal FWE-corrected significance level was 0.05.

Fig. 2 Results for the simulated non-stationary null data. Six different non-stationarity settings were used. v-CST is not applicable to non-stationary data [3], so its results are not displayed.

Fig. 3 AUC results for the simulated stationary activation data. All methods were compared for three sample sizes (df = 18, 38 and 58) and four smoothness levels (FWHM = 0, 3, 6 and 9 voxels). The applied SNRs were 1, 3 and 5.

Fig. 4 VBM results (NC > AD) for the small group. Rows 1 and 5 show the spatial variation in local image smoothness, estimated with SPM8. The local FWHM (in voxels) ranges from 0 to 7.5 in row 1 and from 0 to 8.5 in row 5.

Fig. 5 VBM results (NC > AD) for the larger group. The local FWHM (in voxels) ranges from 0 to 7.5 in row 1 and from 0 to 8.5 in row 5. Because the smoothness of the VBM data is non-stationary, v-CST results are not shown.

Proc. Intl. Soc. Mag. Reson. Med. 24 (2016)

1901