4289

On false positive control in Fixel-Based Analysis

Robert Smith¹,², Daan Christiaens³,⁴, Ben Jeurissen⁵, Maximillian Pietsch³, David Vaughan¹,²,⁶, Graeme Jackson¹,²,⁶, and J-Donald Tournier³
¹Florey Institute of Neuroscience & Mental Health, Melbourne, Australia, ²Florey Department of Neuroscience and Mental Health, The University of Melbourne, Melbourne, Australia, ³School of Biomedical Engineering & Imaging Sciences, King's College London, London, United Kingdom, ⁴Department of Electrical Engineering, KU Leuven, Leuven, Belgium, ⁵Department of Physics, University of Antwerp, Antwerp, Belgium, ⁶Department of Neurology, Austin Health, Melbourne, Australia

Synopsis

While the Fixel-Based Analysis (FBA) framework provides familywise error control across a whole-brain template while accounting for crossing fibres in the white matter, its typical usage does not correct for the multiple hypothesis tests that arise from the use of multiple quantitative metrics. We demonstrate different methods that can be employed to provide more comprehensive false positive control in this context.

Introduction

Fixel-Based Analysis (FBA) enables whole-brain statistical inference of white matter quantitative measures in a manner tailored to the complex fibre geometry of the brain white matter¹. Familywise error (FWE) corrected p-values are provided using nonparametric permutation testing². This multiple comparison correction, however, occurs only across fixels within the brain template ("weak" FWE), not across multiple hypothesis tests ("strong" FWE)³. This is especially problematic for FBA, given that the recommended pipeline tests three separate quantitative measures¹. The likelihood of reporting at least one false positive across the experiment is therefore elevated above the intended 5% level, even in the most basic experimental case. Here, we evaluate options for providing holistic experimental false positive control specifically in the context of FBA.
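To give a rough sense of the size of this inflation, the following minimal sketch computes the chance of at least one false positive across several tests that are each controlled at 5%. It assumes, purely for illustration, that the tests are independent; the FBA metrics are not independent, so the true rate will differ, and the union bound is included as the limit that holds under any dependence. The function names are hypothetical and not part of any FBA software.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    tests, each run at level alpha, ASSUMING the tests are independent."""
    return 1.0 - (1.0 - alpha) ** n_tests


def union_bound(alpha: float, n_tests: int) -> float:
    """Upper bound on the same probability under arbitrary dependence."""
    return min(1.0, alpha * n_tests)


# 1 test: a single metric and direction;
# 3 tests: three metrics, one direction;
# 6 tests: three metrics, two directions.
for n_tests in (1, 3, 6):
    independent = familywise_error_rate(0.05, n_tests)
    bound = union_bound(0.05, n_tests)
    print(f"{n_tests} tests: independent = {independent:.3f}, union bound = {bound:.3f}")
```

Under the independence assumption, three tests give roughly a 14% chance of at least one false positive, and six tests roughly 26%.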

Methods

Within the recommended FBA pipeline, there are three quantitative measures tested for statistical significance¹ for any given effect of interest:

Fibre Density (FD), reflecting microscopic intra-axonal volume;
Fibre bundle Cross-section (FC), reflecting macroscopic morphological changes orthogonal to bundle orientation;
Fibre Density and bundle Cross-section (FDC), reflecting the overall capacity of the WM to relay information.

These parameters are not independent: not only is FDC defined as the product of FD and FC, but FD and FC can covary in some scenarios¹.
We consider the following multiple hypothesis correction strategies within the context of a basic unpaired t-test (e.g., a group comparison between patients and controls), testing for significant group differences in either direction (see Figure 1 for contrasts in attributes of these methods):

1. Historical advice for FBA, i.e. independent testing of each variable for each direction, with independent false positive control not corrected across hypotheses;
2. Bonferroni correction across both the three metrics and the two directions of effect (i.e. thresholding at p < 0.05/6);
3. Non-parametric strong familywise error control by using a single null distribution across all three metrics (with requisite restricted exchangeability and heteroscedasticity considerations²) and both directions of effect³;
4. Statistical inference performed on FDC only, with strong FWE control across the two directions of testing³, with post hoc quantification of relative contributions from FD and FC toward the result⁴;
5. F-test² across FD and (log-transformed¹) FC (with requisite restricted exchangeability and heteroscedasticity considerations²), with post hoc investigation of both responsible metrics and directions of effects.
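The Bonferroni threshold and the single-null-distribution idea can be sketched on synthetic data as follows. This is a simplified illustration rather than the FBA implementation: it uses a plain Welch t-statistic per fixel with no fixel-wise statistical enhancement, shuffles group labels freely (ignoring the restricted exchangeability considerations noted above), and takes the absolute value of the statistic to cover both directions of effect. All names, data sizes and effect sizes are hypothetical.

```python
import numpy as np


def welch_t(data, is_patient):
    """Per-fixel two-sample t-statistic (patients minus controls),
    allowing unequal group variances."""
    a, b = data[is_patient], data[~is_patient]
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0]
                 + b.var(axis=0, ddof=1) / b.shape[0])
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def max_stat_fwe(metrics, is_patient, n_perm=200, seed=0):
    """FWE-corrected p-values from ONE null distribution of the maximum
    |t| taken over all fixels, all metrics and both directions."""
    rng = np.random.default_rng(seed)
    observed = {k: np.abs(welch_t(x, is_patient)) for k, x in metrics.items()}
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        # The same shuffled labels are applied to every metric, which
        # preserves the dependence between metrics under the null.
        shuffled = rng.permutation(is_patient)
        null_max[i] = max(np.abs(welch_t(x, shuffled)).max()
                          for x in metrics.values())
    # The unpermuted labelling counts as one member of the null distribution.
    return {k: (1 + (null_max[None, :] >= t[:, None]).sum(axis=1)) / (n_perm + 1)
            for k, t in observed.items()}


# Synthetic cohort: 20 patients, 20 controls, 50 fixels (illustrative only).
rng = np.random.default_rng(42)
n_per_group, n_fixels = 20, 50
is_patient = np.repeat([True, False], n_per_group)
fd = rng.normal(0.5, 0.05, size=(2 * n_per_group, n_fixels))
log_fc = rng.normal(0.0, 0.1, size=(2 * n_per_group, n_fixels))
fd[is_patient, :5] -= 0.25  # simulated FD reduction in the first five fixels
metrics = {"FD": fd, "log_FC": log_fc, "FDC": fd * np.exp(log_fc)}

bonferroni_threshold = 0.05 / (3 * 2)  # Method 2: three metrics, two directions
p_fwe = max_stat_fwe(metrics, is_patient)  # Method 3 style correction
for name, p in p_fwe.items():
    print(name, "significant fixels:", int((p < 0.05).sum()))
```

Because every fixel in every metric is compared against the same distribution of maxima, a single p < 0.05 threshold on the returned values controls false positives across the whole experiment in this simplified setting, whereas the Bonferroni route keeps separate null distributions and tightens the threshold instead.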

Results

Figure 2 demonstrates the implications of the a priori choice of multiple comparison control method on a previously published Temporal Lobe Epilepsy (TLE) cohort⁵ (only results for TLE < control are shown). Methods 2-5 all exhibit reduced spatial extent of statistical significance compared to Method 1, owing to the more rigorous false positive control mechanisms employed. Of the methods providing strong FWE control, Methods 4 and 5 yield the greatest extent of statistical significance, though the results differ considerably between the two (Figure 3).

Discussion

While it is possible to condense a basic group comparison FBA to a single hypothesis test (per direction in the case of Method 4), with post hoc interrogation of contributions toward statistically significant results, both methods suggested for doing so present complications:

Method 4 intrinsically mixes sources of variance from FD and FC in the calculation of FDC, and may therefore suffer from reduced sensitivity to genuine effects present in one metric if, within a particular dataset, the other metric contributes substantial variance;
Method 5 is not directional with respect to the group effect, nor does it require contributions from FD and FC to be in the same direction. In cases where FD and FC exhibit effects in opposing directions, Method 5 alone may yield a statistically significant result, reflecting "a change in the white matter" (i.e. F-test) that would not necessarily affect "the white matter's ability to relay information" (i.e. the FDC metric).
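Both complications can be seen from the definition of FDC as the product of FD and FC. The following is an idealised illustration in log space, not a description of how the tests themselves are computed; Δ denotes the between-group difference in the mean of a log-transformed metric, and δ is a hypothetical effect size.

```latex
\mathrm{FDC} = \mathrm{FD} \cdot \mathrm{FC}
\quad\Rightarrow\quad
\log \mathrm{FDC} = \log \mathrm{FD} + \log \mathrm{FC}

% Method 4: the variances of the two metrics mix in FDC
\operatorname{Var}(\log \mathrm{FDC})
  = \operatorname{Var}(\log \mathrm{FD})
  + \operatorname{Var}(\log \mathrm{FC})
  + 2 \operatorname{Cov}(\log \mathrm{FD}, \log \mathrm{FC})

% Method 5: opposing effects can cancel in FDC
\Delta \log \mathrm{FDC} = \Delta \log \mathrm{FD} + \Delta \log \mathrm{FC},
\qquad
\Delta \log \mathrm{FD} = +\delta,\;
\Delta \log \mathrm{FC} = -\delta
\;\Rightarrow\;
\Delta \log \mathrm{FDC} = 0
```

In the first case, a large variance in one metric inflates the variance of FDC and can mask a genuine effect in the other metric. In the second, a joint test on FD and FC can detect the two opposing effects even though they cancel exactly in FDC.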

An added benefit of Methods 4 and 5 is a reduced risk of misleading interpretation regarding contributions from microstructural vs. morphological effects (Figure 4). If such interpretation is based on the presence or absence of fixels when p<0.05 thresholds are applied independently to each metric (as is the current recommendation for FBA, i.e. Method 1), then thresholding effects across metrics (both the presence or absence of bundles and their relative spatial extents) limit the precision of interpretation. We suggest that evaluating such fractional contributions post hoc⁴ gives more intuitive access to the relative microstructural vs. morphological contributions.

Conclusion

Good scientific practice requires holistic experimental false positive control. We have shown that in the context of FBA there are two viable alternatives to address the conflict between this expectation and the presence of three standard FBA quantitative metrics; moreover, utilisation of one of these methods improves not only robustness against experimental false positives but also the interpretability of results. While we tentatively suggest use of Method 4 for FBA going forward, because it implicated a greater total number of WM bundles in this instance (despite Method 5 implicating a greater volume of the ipsilateral temporal lobe), further validation on a wider range of retrospective FBA cohorts is necessary to ensure generality of this recommendation.

Acknowledgements

RS, DV and GJ are grateful to the National Health and Medical Research Council (NHMRC) of Australia, and the Victorian Government's Operational Infrastructure Support Program for their support.

RS is supported by fellowship funding from the National Imaging Facility (NIF), an Australian Government National Collaborative Research Infrastructure Strategy (NCRIS) capability.

DC is supported by the Flemish Research Foundation (FWO; grant number 12ZV420N).

MP is funded in part by the Bill & Melinda Gates Foundation (INV-005774).

JDT is supported by ERC grant agreement no. 319456 (dHCP project), by core funding from the Wellcome/EPSRC Centre for Medical Engineering [WT203148/Z/16/Z] and by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London and/or the NIHR Clinical Research Facility. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.

References

1. Raffelt, David A., J.-Donald Tournier, Robert E. Smith, David N. Vaughan, Graeme Jackson, Gerard R. Ridgway, and Alan Connelly. “Investigating White Matter Fibre Density and Morphology Using Fixel-Based Analysis.” NeuroImage 144 (January 2017): 58–73.
2. Winkler, Anderson M., Gerard R. Ridgway, Matthew A. Webster, Stephen M. Smith, and Thomas E. Nichols. “Permutation Inference for the General Linear Model.” NeuroImage 92 (May 2014): 381–397.
3. Alberton, Bianca A. V., Thomas E. Nichols, Humberto R. Gamba, and Anderson M. Winkler. “Multiple Testing Correction over Contrasts for Brain Imaging.” NeuroImage 216 (August 1, 2020): 116760. https://doi.org/10.1016/j.neuroimage.2020.116760.
4. Arun, Arush Honnedevasthana, Alan Connelly, Fernando Calamante, and Robert E. Smith. “Characterization of White Matter Asymmetries in the Healthy Human Brain Using Diffusion MRI Fixel-Based Analysis.” NeuroImage, November 2, 2020, 117505. https://doi.org/10.1016/j.neuroimage.2020.117505.
5. Vaughan, David N., David Raffelt, Evan Curwood, Meng-Han Tsai, Jacques-Donald Tournier, Alan Connelly, and Graeme D. Jackson. “Tract-Specific Atrophy in Focal Epilepsy: Disease, Genetics, or Seizures?” Annals of Neurology 81, no. 2 (February 2017): 240–50. https://doi.org/10.1002/ana.24848.

Figures

Figure 1. Attributes of different methods for false positive control in Fixel-Based Analysis (FBA). “Mixed sources of variance” here refers specifically to the fact that in calculation of FDC as the product of FD and FC, the variances of those two variables mix to form the variance of FDC. * Due to non-independence of FDC from FD and FC.

Figure 2. Results of statistical inference for exemplar Temporal Lobe Epilepsy cohort for the five described false positive control methods. Projections show all statistically significant fixels within template volume. For methods 1-3, colours of significant fixels correspond to orientation (red: left-right; green: anterior-posterior; blue: inferior-superior); for methods 4 and 5, colours of significant fixels correspond to value of parameter α, which encodes the proportion of total observed effect contributed by microstructure (i.e. FD) rather than morphology (i.e. FC).

Figure 3. Comparison between results of statistical inference when using Method 4 (only the TLE < Control result is shown) and Method 5. Only statistically significant fixels within exemplar slices are shown, coloured according to the value of parameter α, which encodes the proportion of total observed effect contributed by microstructure (i.e. FD) rather than morphology (i.e. FC). Arrows indicate bundles identified as statistically significant only when using one of the two methods.

Figure 4. Possible outcomes of statistical inference, and their corresponding interpretations, for the five false positive control methods under consideration, when accepting or rejecting the null hypothesis (H0) for comparison between groups G1 and G2. Note that for Methods 1-3, for all results other than no effect, an equivalent interpretation for the opposite group effect is also possible, but these are omitted for brevity.

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
