2096

A-Eye: quality control and deep learning segmentation of the complete eye in MRI

Jaime Barranco^1,2,3, Hamza Kebiri^1,2,3, Óscar Esteban², Raphael Sznitman⁴, Sönke Langner^5,6, Oliver Stachs⁷, Adrian Konstantin Luyken⁷, Philipp Stachs⁸, Benedetta Franceschiello^2,3,9,10,11, and Meritxell Bach Cuadra^3,11
¹Center for Biomedical Imaging (CIBM), Lausanne, Switzerland, ²Lausanne University Hospital (CHUV), Lausanne, Switzerland, ³University of Lausanne (UNIL), Lausanne, Switzerland, ⁴ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland, ⁵Institute for Diagnostic and Interventional Radiology, Pediatric and Neuroradiology, Rostock University Medical Center, Rostock, Germany, ⁶Department of Diagnostic Radiology and Neuroradiology, University of Greifswald, Greifswald, Germany, ⁷Department of Ophthalmology, Rostock University Medical Center, Rostock, Germany, ⁸Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, ⁹HES-SO Valais-Wallis, Sion, Switzerland, ¹⁰The Sense Innovation and Research Center, Sion and Lausanne, Switzerland, ¹¹These authors provided equal last-authorship contribution, Lausanne, Switzerland

Synopsis

Keywords: Analysis/Processing, Segmentation, Quality Assessment and Control, Eye, MREye, Ophthalmology, Ocular

Motivation: Reliable large-scale MREye segmentation.

Goal(s): Quality control of eye MRI and deep learning segmentation validation.

Approach: We automatically extract Image Quality Metrics (IQMs) and use them as features to train a model in a supervised framework with expert rating annotations as target. Multi-class 3D MREye segmentation is done for the first time using the deep-learning-based approach nnUNet.

Results: None of the QC models achieved the levels of sensitivity and specificity required for our MREye application. nnUNet yielded promising outcomes for MREye segmentation and was robust across a range of MRI quality levels.

Impact: MREye is no exception to the evidence that insufficient data quality threatens the reliability of analysis outcomes. We pioneer manual and automated quality control for MREye and benchmark deep learning eye segmentation.

Introduction

MRI of the human eye (MREye) is gaining interest due to its comprehensive 3D anatomical view¹. Our previous work^2,3,4 introduced A-Eye, an automated atlas-based segmentation technique for eye structures in T1w MRI data, aiming for large-scale, reliable segmentation.
Because low-quality data can introduce biases in the results and lead to erroneous conclusions, it is critical to establish a quality assessment and control (QA/QC) protocol. However, setting exclusion criteria is challenging due to variability across applications and researchers. Automated QC protocols, such as MRIQC⁵, help in the early identification of subpar images by automatically computing several image quality metrics (IQMs), and generate standardized visual reports for manual assessment of different quality-related aspects.
While these techniques have been extensively studied and implemented for the adult brain⁶, their application to the eyes remains unexplored, leaving a gap in understanding effective QA/QC protocols for MREye.
Our work contributes a deep learning segmentation model, compared against manual annotations and previous techniques, and a QA/QC protocol within the A-Eye pipeline. While we advocate for both pre- and post-segmentation QA/QC checkpoints, as depicted in Figure 1, this work focuses on pre-segmentation QC, that is, the QC of images before they are segmented.
We believe having a tailored MREye QC would significantly enhance the integrity, reliability, and clinical applicability of the segmented large-scale data, rendering it an essential component of MREye analysis workflows.

Methods

Data
1. SHIP (Study of Health in Pomerania, Germany) dataset^7,8: 1245 T1w scans acquired on a 1.5T Magnetom Avanto, with manual annotations for 68 subjects of the lens, globe, optic nerve, intraconal and extraconal fats, and rectus muscles.
2. MRIQC datasets⁵: ABIDE, comprising 1102 T1w scans from 17 sites (19 scanners); and DS030, with 265 images from 2 sites, selected for their heterogeneity.
MREye segmentation method
A deep learning-based approach (nnUNet⁹) was trained for 3D MREye segmentation on the manually annotated SHIP subjects, which were split into 31 subjects for training (4 of them reserved for validation) and 37 held-out subjects for evaluation.
Manual quality control
Subjective eye-quality assessment was performed on 183 SHIP subjects using adapted MRIQC reports. The adaptations included a changed field of view for the thumbnails and eye-oriented items such as whether the eyes were open or closed (see Figure 2). Two expert raters assessed 83 subjects with scores from 0 (exclude) to 4 (excellent), which were then averaged and normalized to obtain a binary score (exclude/include). Another rater rated an additional 100 subjects directly with binary scoring.
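The averaging and binarization of the two raters' scores can be sketched as follows. This is a minimal illustration only: the abstract does not state the cut-off used, so the threshold of 0.5 on the normalized score, the function name, and the example scores are all assumptions.

```python
from statistics import mean

MAX_SCORE = 4            # top of the 0 (exclude) to 4 (excellent) scale
INCLUDE_THRESHOLD = 0.5  # hypothetical cut-off, not given in the abstract


def binarize_rating(scores, threshold=INCLUDE_THRESHOLD):
    """Average the raters' scores, rescale to [0, 1], and threshold."""
    normalized = mean(scores) / MAX_SCORE
    return "include" if normalized >= threshold else "exclude"


# Hypothetical scores from the two raters for three subjects.
ratings = {"sub-01": (4, 3), "sub-02": (1, 0), "sub-03": (2, 2)}
decisions = {sub: binarize_rating(s) for sub, s in ratings.items()}
print(decisions)
```

With this assumed threshold, a subject rated 2 by both raters sits exactly on the boundary and is included; a different cut-off would change which borderline subjects are kept.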
Automated quality control
Five eye QC strategies were explored for automatic exclusion/inclusion classification:

Baseline. Retrained the MRIQC classifier using ABIDE for training and DS030 for testing, with updated scikit-learn and NumPy Python libraries.
Adapted model. Retrained baseline model using binary-rated SHIP subset (N=183).
Non-brain models. Re-implemented both previous models (baseline and adapted) after filtering out brain-related IQMs (relying on 10 of the 68 IQMs).
Custom non-brain model. Trained a custom classifier on the SHIP subset (N=183) with an 80/20 train-test split, omitting brain-based IQMs.
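The data preparation behind the non-brain strategies can be sketched as below: brain-related IQM columns are dropped and the subjects are split 80/20. This is a hedged, standard-library sketch, not the authors' code. Which IQMs count as brain-related, the feature values, the stratification by class, and both helper names are assumptions made for illustration.

```python
import random


def drop_brain_iqms(rows, brain_iqms):
    """Return copies of the feature rows without the brain-related columns."""
    return [{k: v for k, v in row.items() if k not in brain_iqms} for row in rows]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, keeping the class ratio in each part."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Hypothetical feature table: 20 subjects, three IQMs each.
rows = [{"snr_wm": 10.0 + i, "efc": 0.5, "fber": 100.0 + i} for i in range(20)]
labels = [1] * 15 + [0] * 5  # 1 = include, 0 = exclude (made-up ratings)

# Assumed here: white-matter SNR is treated as a brain-related metric.
features = drop_brain_iqms(rows, brain_iqms={"snr_wm"})
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(sorted(features[0]), len(train_idx), len(test_idx))
```

The resulting index lists can be passed to any classifier. Stratifying keeps excluded subjects represented in the test set, which matters when exclusions are the minority class.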

Results

nnUNet yielded promising outcomes for MREye segmentation. Compared to manual annotations for 37 subjects, it achieved a median Dice similarity coefficient (DSC) of 0.82 across 9 structures, outperforming the atlas-based method's 0.68 (see Figure 3), with slightly greater difficulty in more variable areas such as the extraconal fat. nnUNet also performed well on low-quality images (see Figure 4). Note that only subjects whose quality was rated as sufficient for inclusion were manually segmented, so the DSC is always computed within that subset of subjects.
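For reference, the DSC between a predicted and a manual label map is twice the overlap divided by the sum of the two region sizes, computed per structure. The sketch below works on toy flattened label maps; the label values and their mapping to structures are hypothetical, and a real evaluation would run on the full 3D volumes.

```python
from statistics import median


def dice(pred, truth, label):
    """Dice similarity coefficient for one label in two flat label maps."""
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    overlap = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    return 2 * overlap / total if total else float("nan")


# Toy flattened label maps: 0 = background, 1 = lens, 2 = globe (assumed labels).
truth = [0, 1, 1, 2, 2, 2, 0, 0]
pred = [0, 1, 0, 2, 2, 2, 2, 0]

scores = {label: dice(pred, truth, label) for label in (1, 2)}
summary = median(scores.values())  # median across structures
print(scores, summary)
```

Here the small structure (label 1) is penalized more for a single missed voxel than the large one is for a single extra voxel, which is one reason DSC tends to be lower for small or highly variable structures.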
A significant gap remains between automated and manual QC decisions. Across the 426 subjects reviewed, the non-brain baseline model achieved the highest overlap, yet it agreed with human raters on only 30% of image exclusions. The automatic models did detect overall poor image quality. However, in a substantial number of those exclusions, the human raters deemed the local quality of the eyes sufficient for the application. Figure 5 shows examples.
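One way to quantify this gap is to compare the binary decisions directly. The sketch below treats "exclude" as the positive class and reports sensitivity, specificity, and the share of automated exclusions that the raters also excluded. That last quantity is only one possible reading of the agreement figure above; the abstract does not define it formally, and the decisions used here are made up.

```python
def qc_agreement(manual, automated):
    """Compare binary QC decisions, treating 'exclude' as the positive class."""
    pairs = list(zip(manual, automated))
    tp = sum(m == "exclude" and a == "exclude" for m, a in pairs)
    fn = sum(m == "exclude" and a == "include" for m, a in pairs)
    fp = sum(m == "include" and a == "exclude" for m, a in pairs)
    tn = sum(m == "include" and a == "include" for m, a in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # Assumed definition: fraction of model exclusions confirmed by raters.
        "exclusion_agreement": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Hypothetical decisions for ten subjects.
manual = ["exclude"] + ["include"] * 4 + ["exclude"] + ["include"] * 4
automated = ["exclude", "exclude", "exclude", "include", "include",
             "include", "exclude", "include", "include", "include"]

metrics = qc_agreement(manual, automated)
print(metrics)
```

In this toy case the model excludes four subjects but the raters agree on only one of them, mirroring the pattern described above, where globally poor images still had eyes of usable quality.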

Discussion

Deep learning segmentation of eye structures surpassed the atlas-based method in 37 subjects. Segmentation performance was not linked to manual quality assessment. Automated global brain- or background-based quality control did not meet the needs of our eye segmentation application. Our findings emphasize the need for QA/QC protocols tailored to MREye, including both eye-specific and non-tissue metrics.

Conclusion

To accurately evaluate eye quality in MRI, it is imperative to develop novel IQMs specifically tailored to eye tissues. Additionally, incorporating non-brain-related IQMs and extending scrutiny to the periorbital region is crucial.

Acknowledgements

This work was supported by the Gelbert Foundation and the Swiss National Science Foundation (project 205321-182602). We acknowledge the CIBM Center for Biomedical Imaging, a Swiss research center of excellence founded and supported by CHUV, UNIL, EPFL, UNIGE, HUG and the Leenaards and Jeantet Foundations.

References

1. T. Niendorf, J.-W. M. Beenakker, S. Langner, K. Erb-Eigner, M. Bach Cuadra, E. Beller, J. M. Millward, T. M. Niendorf, O. Stachs, Ophthalmic magnetic resonance imaging: where are we (heading to)?, Current Eye Research (2021) 1–20.
2. Barranco J., Kebiri H., Esteban O., Sznitman R., Stachs O., Stachs P., Langner S., Franceschiello B., Bach Cuadra M., A-Eye: Towards a large-scale MRI-based model of the complete eye, ISMRM abstract (2022).
3. Barranco J., Kebiri H., Esteban O., Sznitman R., Stachs O., Stachs P., Langner S., Franceschiello B., Bach Cuadra M., A-Eye: Towards large-scale MRI automated segmentation of the eye, ARVO Imaging Conference (2023).
4. Barranco J., Kebiri H., Esteban O., Sznitman R., Stachs O., Stachs P., Langner S., Franceschiello B., Bach Cuadra M., A-Eye: Large-scale MRI automatic biomarkers extraction, ARVO Conference (2023).
5. Esteban O, Birman D, Schaer M, Koyejo OO, Poldrack RA, Gorgolewski KJ; MRIQC: Advancing the Automatic Prediction of Image Quality in MRI from Unseen Sites; PLOS ONE 12(9):e0184661; doi:10.1371/journal.pone.0184661. Documentation: https://mriqc.readthedocs.io/en/latest/about.html. GitHub: https://github.com/nipreps/mriqc. mriqc-learn GitHub: https://github.com/nipreps/mriqc-learn
6. Provins, Céline, et al. ‘Quality Control in Functional MRI Studies with MRIQC and fMRIPrep’. Frontiers in Neuroimaging, vol. 1, 2023. Frontiers, https://www.frontiersin.org/articles/10.3389/fnimg.2022.1073734.
7. P. Schmidt, R. Kempin, S. Langner, A. Beule, S. Kindler, T. Koppe, H. Völzke, T. Ittermann, C. Jürgens, F. Tost, Association of anthropometric markers with globe position: A population-based MRI study, PloS one 14 (2019) e0211817.
8. Völzke, H., Alte, D., Schmidt, C. O., Radke, D., Lorbeer, R., Friedrich, N., et al. (2011). Cohort Profile: The Study of Health in Pomerania. Int. J. Epidemiol. 40, 294–307. doi: 10.1093/ije/dyp394.
9. Isensee F., Jaeger P. F., Kohl S. A. A., Petersen J., and Maier-Hein K. H., nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation, 2020. GitHub: https://github.com/MIC-DKFZ/nnUNet

Figures

Figure 1. Quality control integration within a simplified scheme of the A-Eye project’s pipeline. Initially, a classifier filters subjects based on eye quality before segmentation. Following segmentation and biomarkers extraction, a second quality check assesses those biomarkers. For instance, abnormal axial length could signal either algorithmic or input image quality issues. This dual-stage approach ensures robust results. We focus on the pre-segmentation QC in this work.

Figure 2. Example of MREye QC report with rating widget. To assess the quality of the eyes in the MR images, we created an HTML-based report for each image: a series of axial slices centered and cropped on the right eye. The rating widget on the right is composed of several sliders for overall quality [0-4], blur, noise, motion, and background artifacts. It also includes two toggle buttons, for bias field and eyes closed/open, and a text box for further comments. Additionally, specific slices where heavy artifacts are present can be selected (red squares will appear).

Figure 3. nnUNet segmentation method performance. a) Similarity metrics (Dice, Hausdorff distance and volume difference) computed between ground truth (manually annotated subjects) and nnUNet segmentations per eye structure for N=37 subjects. b) Dice comparison per structure (and averaged across structures) between nnUNet and the atlas-based segmentation method for N=37 subjects.

Figure 4. Subjective ratings and DSC agreement on N=37 subjects. The plot with averaged DSC indicates that regardless of the eye-oriented subjectively rated image quality, the segmentation algorithm performs commendably in terms of similarity with manually annotated ground truth subjects. We also present the scatter plots per structure, highlighting lower agreement in the fats, especially the extraconal, likely due to their variability in terms of shape and size.

Figure 5. Examples of subjects excluded by the model although the quality of the eyes was rated sufficient for inclusion. The model excluded subjects based on brain motion and artifacts (ringing and others), even though eye quality was good and segmentation results were good. This suggests the tool's sensitivity to brain issues.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)


DOI: https://doi.org/10.58530/2024/2096