Varying acquisition and reconstruction conditions as well as long examination times make MRI susceptible to various kinds of artifacts. If suitable correction techniques are not available or applicable, if no human expert is present to judge the achieved image quality, or in epidemiological cohort studies in which a manual quality analysis of the large database is impracticable, an automated detection and identification of these artifacts is of interest. We propose convolutional neural networks with residual and inception layers to localize and identify occurring artifacts. Motion and field inhomogeneity artifacts can be identified with an accuracy of 92% in a whole-body setting with varying contrasts.
Depending on the chosen sequence type and contrast weighting, MRI is more or less susceptible to several types of image artifacts. To guarantee high data quality, arising artifacts need to be detected as early as possible so that appropriate countermeasures can be taken. Because of the manifold of possible artifacts, not all precautions can be considered, leaving a chance that artifacts are present in the final image. It is the task of a human MR specialist to assess the achieved image quality with respect to the underlying application. This analysis can be a time-demanding and cost-intensive process. Insufficient image quality may demand an additional examination, decreasing patient comfort and throughput. Thus, a prospective quality assurance is highly desired.
In the context of large epidemiological cohort studies such as the UK Biobank1 or the German National Cohort2, reliable image quality has to be guaranteed. However, the amount and complexity of the data exceed what is practicable for a manual analysis. Thus, a retrospective quality assessment/control is desired.
An automated and reference-free quality analysis is therefore preferred. Previously proposed approaches for automated image quality analysis required the existence of a reference image or focused only on specific scenarios3-5. Reference-free approaches6-10 are mainly metric-driven and evaluate quality only on a coarse level.
In previous work, we showed the potential of a deep learning network for automatic reference-free motion artifact detection11,12. In this work, we extend this concept to a multi-class scenario for the identification of motion and magnetic field inhomogeneity artifacts in a whole-body setting and in images exhibiting different contrast weightings. We propose two convolutional neural network architectures for this task and investigate their performance.
MR images were acquired on a 3T PET/MR (Biograph mMR, Siemens) from 18 healthy volunteers (3 female, 25±8 years) with T1w and T2w FSE sequences. The acquisition parameters for the respective body regions (head, abdomen, pelvis) are depicted in Tab.1. In each body region and for each contrast, two acquisitions were performed: a reference scan and a motion-corrupted scan (head and hip movement, breathing). For the T2w sequence, magnetic field inhomogeneity artifacts were additionally acquired with manually disturbed B0 shimming. All images are normalized to an intensity range of 0 to 1 and partitioned into 50% overlapping patches of size 80x80 (APxLR).
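The preprocessing described above can be sketched as follows; this is an illustrative NumPy implementation (function names and the stride derivation from the 50% overlap are our assumptions, not code from the study):

```python
import numpy as np

def normalize(img):
    """Scale image intensities into the range [0, 1]."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def extract_patches(img, size=80, overlap=0.5):
    """Partition a 2D image into overlapping square patches.

    A 50% overlap corresponds to a stride of size * (1 - overlap),
    i.e. 40 pixels for 80x80 patches.
    """
    stride = int(size * (1 - overlap))
    patches = []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)
```

For a 160x160 slice this yields a 3x3 grid of nine 80x80 patches, each sharing half its extent with its neighbors.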
The proposed convolutional neural network (CNN) architectures are depicted in Fig.2. The DenseResNet combines ideas from DenseNet13 and ResNet14. It consists of five stages: a first convolutional stage, followed by three residual stages and a final fully connected output stage. The dense connections act as a bypass of feature maps from earlier layers to deeper layers, enabling a joint estimation on coarse- and fine-grained feature maps in deeper layers. Residual shortcuts feed feature maps forward to provide a residual mapping and to enable ensemble-like learning. The residual blocks are built from 1x1 convolutional layers wrapped around a 3x3 convolutional layer, acting as a bottleneck structure that merges feature maps across channels.
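A minimal PyTorch sketch of such a bottleneck residual block is given below; channel widths and layer counts are illustrative assumptions, not the exact configuration of the proposed DenseResNet:

```python
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck with an additive residual shortcut.

    The 1x1 convolutions merge feature maps across channels and reduce
    the channel count before the spatial 3x3 convolution.
    """

    def __init__(self, channels, bottleneck):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1), nn.ReLU(),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(bottleneck, channels, kernel_size=1),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual shortcut: the block learns only the residual mapping.
        return self.act(x + self.body(x))
```

The shortcut leaves input and output shapes identical, so blocks can be stacked freely within a residual stage.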
The InceptionResNet is inspired by GoogLeNet15, which uses inception layers. The idea is to cover larger spatial areas with multi-scale convolutions while preserving smaller image portions for deeper levels. The architecture consists of three convolutional stages, with inception modules in the second and third stages. In the first and second stages, an additional residual path forwards the feature maps to a deeper level.
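A basic inception module of this kind can be sketched as parallel multi-scale branches whose outputs are concatenated; branch structure and channel counts below are illustrative assumptions rather than the paper's exact module:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch.

    Each branch covers a different spatial scale; concatenating them
    lets deeper layers combine multi-scale features.
    """

    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
        )

    def forward(self, x):
        # All branches preserve spatial size, so they concatenate cleanly.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```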
Both architectures are trained with a leave-one-subject-out cross-validation to output probability values $$$p_i$$$ for each of the twelve classes (see Fig.3). The categorical cross-entropy is minimized for a given learning rate, $$$\ell_2$$$ regularization and dropout. Parameter ranges were estimated by the Baum-Haussler rule16 and refined with a grid-search optimization. Testing was performed on the left-out subject to obtain accuracy and confusion matrices.
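The leave-one-subject-out splitting can be sketched as a simple generator over subject labels; this is an assumed illustration of the evaluation protocol, not the study's code:

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) index pairs, one fold per subject.

    Every fold holds out all patches of exactly one subject for testing,
    so no subject contributes to both training and testing in a fold.
    """
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.where(subject_ids == s)[0]
        train = np.where(subject_ids != s)[0]
        yield train, test
```

With 18 volunteers this produces 18 folds; accuracy and confusion matrices are then aggregated over the held-out predictions of all folds.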