Fetal MRI plays a critical role in diagnosing brain abnormalities but routinely suffers from artifacts that render images nondiagnostic. We aim to automatically identify nondiagnostic images during acquisition so they can be immediately flagged for reacquisition. As a first step, we trained a neural network to classify T2-weighted single-shot fast spin-echo (HASTE) images as diagnostic or nondiagnostic. On novel data, the average area under the receiver operating characteristic curve (AUROC) was 0.84 (σ = 0.04). The neural network learned relevant criteria, identifying high-contrast boundaries between areas such as cerebrospinal fluid and the cortical plate as relevant to determining image quality.
Because of unpredictable and severe fetal motion, 2D single-shot T2-weighted (SST2W) imaging is the dominant acquisition method for mitigating motion artifacts while achieving diagnostic image contrast1. However, fetal motion often occurs during the k-space readouts, resulting in severe artifacts and consequently nondiagnostic images (Figure 1). Slice-to-slice motion also causes artifacts in the form of incomplete volume coverage or incomplete signal recovery due to spin-history effects. Current fetal MR protocols attempt to address this by repeating each orthogonal stack of 2D slices multiple times, but with no immediate guarantee that the slices are of diagnostic quality.
To address these problems, we aim to train an image quality classifier that automatically identifies nondiagnostic images. This tool can then be used during the scan to flag nondiagnostic images for reacquisition in order to deliver a diagnostic quality stack of 2D slices in a single SST2W scan.
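The intended flagging logic can be sketched minimally. The function name, score format, and 0.5 threshold below are illustrative assumptions, not details given in the text:

```python
def flag_nondiagnostic(slice_scores, threshold=0.5):
    """Return indices of slices whose predicted probability of being
    diagnostic falls below the threshold, i.e. slices to queue for
    reacquisition during the SST2W scan.

    Both the score format and the operating point are hypothetical;
    the classifier's actual output and threshold are not specified here.
    """
    return [i for i, p in enumerate(slice_scores) if p < threshold]


# Example: slices 1 and 3 would be flagged for reacquisition.
print(flag_nondiagnostic([0.9, 0.2, 0.7, 0.4]))  # [1, 3]
```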
A total of 1874 images were obtained from ten previously acquired clinical scans of mothers with singleton pregnancies, with gestational ages ranging from 19 to 37 weeks. Scans were conducted at Boston Children’s Hospital (BCH) with Institutional Review Board approval, using the SST2W (HASTE) sequence with TE/TR = 100 ms/1.4 s, FOV = 36 cm, slice thickness = 3 mm, and voxel size 1.4 × 1.4 × 3 mm³.
Under the guidance of radiologists at BCH, a research assistant identified criteria distinguishing nondiagnostic from diagnostic images (Figure 1) and labeled the dataset, yielding 142 nondiagnostic and 1732 diagnostic images.
The brain-masked version of the dataset was produced using a fetal brain segmentation neural network7, with additional post-processing to improve the segmentations. Each image was standardized prior to neural network training and evaluation.
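As a rough illustration of the preprocessing, per-image standardization with optional brain masking might look like the following. The zero-mean, unit-variance form and the masked-statistics choice are assumptions; the text only states that images were standardized:

```python
import numpy as np

def standardize(image, mask=None):
    """Zero-mean, unit-variance standardization of a single image.

    If a brain mask is supplied (as in the brain-masked dataset), the
    statistics are computed inside the mask and the background is zeroed.
    This exact recipe is an assumption, not the paper's documented method.
    """
    img = image.astype(np.float32)
    if mask is None:
        return (img - img.mean()) / (img.std() + 1e-8)
    vals = img[mask > 0]
    return np.where(mask > 0, (img - vals.mean()) / (vals.std() + 1e-8), 0.0)
```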
The neural network trained on the original dataset scored an average AUROC of 0.79 and the neural network trained on the brain-masked dataset scored an average AUROC of 0.84 on the validation set, each with a standard deviation of 0.04 (Figure 2). The results suggest a modest performance benefit from training on brain-masked images.
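For reference, AUROC can be computed directly from its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal NumPy version (equivalent to the Mann-Whitney U statistic; not necessarily the evaluation code used in this work):

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via pairwise rank comparisons, with ties counted as half.

    labels: array of 0/1 class labels; scores: predicted scores.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```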
Integrated Gradients applied to representative results from the neural network trained on the original dataset (Figures 3, 4) demonstrated that it generally learned relevant criteria for image quality assessment, as a radiologist would. The neural network identified sharp, high-contrast boundaries, such as those separating skull and cerebrospinal fluid (CSF), CSF and cortical plate, and cortical plate and white matter, as characteristic of diagnostic-quality images. Noise in the white matter and blurring of the boundaries between cortical plate, CSF, and white matter seemed to characterize nondiagnostic-quality images. Irrelevant features outside the fetal brain, such as subcutaneous fat and fetal and maternal body parts, also affected the classifier’s decision.
Integrated Gradients on results from the brain-masked trained neural network demonstrated that the neural network also identified high contrast boundaries as characteristic of diagnostic images (Figure 5).
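Integrated Gradients attributes a prediction to input features by averaging the model's input gradients along a straight path from a baseline (e.g. an all-zero image) to the input. A minimal sketch, assuming a hypothetical `grad_fn` standing in for the network's gradient computation:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann-sum approximation of Integrated Gradients:
    attribution_i ≈ (x_i - b_i) * mean over alpha of dF/dx_i(b + alpha*(x - b)).

    `grad_fn` is a placeholder for the trained network's input gradient;
    the real implementation would use an autodiff framework.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


# Sanity check with F(x) = sum(x**2), whose gradient is 2x: by the
# completeness axiom, attributions should sum to F(x) - F(baseline).
x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(lambda v: 2.0 * v, x, np.zeros_like(x))
print(attr.sum(), (x ** 2).sum())  # both approximately 14.0
```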
The classifier took 20 ms to run on each image on a single NVIDIA Tesla K80 GPU with 4 vCPUs and 65 GB of RAM.
Our findings suggest that a deep learning approach could be used for image quality assessment of fetal brain MRI. Future work will involve labeling a larger dataset, experimenting with the number of quality categories, and investigating techniques for helping the classifier focus on the brain region of interest.
After achieving desirable performance, the classifier will be integrated into the SST2W acquisition. This integration may need to be semi-supervised with an MR technician reviewing the images flagged for reacquisition. A semi-supervised pipeline could then be used for online learning to further improve the neural network’s performance, with the ultimate goal of having a fully automated classifier.