Manual MRI quality assessment is time-consuming, subjective, and error-prone. We show that image quality of contrast-varying pediatric MR images can be automatically assessed using deep learning with near-human accuracy.
A. Data Preparation: T1- and T2-weighted MR volumes of pediatric subjects from birth to six years of age were manually annotated by an experienced neuroradiologist using three labels: pass, questionable, and fail (see Table 1). 17,600 sagittal slices and 8,800 axial slices were extracted from 176 T1-weighted volumes; 25,400 sagittal slices and 12,700 axial slices were extracted from 254 T2-weighted volumes. Each slice was initially assigned the label of the volume it belongs to. The T1 and T2 slice sets were divided into training, validation, and testing subsets in an 8:1:1 ratio. Each slice was uniformly padded to 256×256 and its intensity was min-max normalized.
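To make this preprocessing concrete, the sketch below pads a 2D slice to 256×256 and min-max normalizes its intensity. It is a minimal illustration, assuming slices are available as NumPy arrays no larger than the target size; the function name is ours, not part of the original pipeline.

```python
import numpy as np

def preprocess_slice(slice_2d: np.ndarray, target_size: int = 256) -> np.ndarray:
    """Pad a 2D slice symmetrically to target_size x target_size and min-max normalize it."""
    slice_2d = slice_2d.astype(np.float32)
    h, w = slice_2d.shape
    pad_h, pad_w = target_size - h, target_size - w  # assumes h, w <= target_size
    padded = np.pad(
        slice_2d,
        ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
        mode="constant",
        constant_values=0,
    )
    # Min-max normalization to [0, 1]; guard against constant slices.
    lo, hi = padded.min(), padded.max()
    return (padded - lo) / (hi - lo) if hi > lo else np.zeros_like(padded)
```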
B. Network Architecture: Our nonlocal residual neural network, shown in Figure 1, incorporates (1) depthwise separable convolutions, which are computationally efficient while retaining good feature extraction capability [2], and (2) nonlocal blocks, which capture long-range dependencies between features extracted at any two positions, regardless of their spatial distance [3]. The network consists of two convolution (Conv) blocks, two depthwise separable residual (DSRes) blocks, one nonlocal residual (NRes) block, and one classifier block. The Conv and DSRes blocks extract low- and high-level features, respectively. The NRes block computes the response at each position as a weighted sum of the features at all positions in the feature maps. The classifier block (a convolutional layer followed by global average pooling and a softmax activation) outputs three probabilities indicating whether a slice is “pass”, “questionable”, or “fail”; each slice is assigned the label with the highest probability.
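To illustrate this block structure, here is a minimal PyTorch sketch of such a network. The channel widths, strides, pooling, and normalization layers are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DSResBlock(nn.Module):
    """Residual block built from depthwise separable convolutions."""
    def __init__(self, ch):
        super().__init__()
        self.conv1, self.conv2 = DepthwiseSeparableConv(ch, ch), DepthwiseSeparableConv(ch, ch)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # residual connection

class NonLocalResBlock(nn.Module):
    """Embedded-Gaussian non-local block [3] with a residual connection:
    the response at each position is a weighted sum of features at all positions."""
    def __init__(self, ch):
        super().__init__()
        self.inter_ch = max(ch // 2, 1)
        self.theta = nn.Conv2d(ch, self.inter_ch, 1)
        self.phi = nn.Conv2d(ch, self.inter_ch, 1)
        self.g = nn.Conv2d(ch, self.inter_ch, 1)
        self.out = nn.Conv2d(self.inter_ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).view(b, self.inter_ch, -1).permute(0, 2, 1)  # (b, hw, c')
        k = self.phi(x).view(b, self.inter_ch, -1)                     # (b, c', hw)
        v = self.g(x).view(b, self.inter_ch, -1).permute(0, 2, 1)      # (b, hw, c')
        attn = F.softmax(torch.bmm(q, k), dim=-1)                      # pairwise affinities
        y = torch.bmm(attn, v).permute(0, 2, 1).reshape(b, self.inter_ch, h, w)
        return x + self.out(y)                                         # residual connection

class IQANet(nn.Module):
    """Conv blocks -> DSRes blocks -> nonlocal residual block -> conv classifier head."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.stem = nn.Sequential(  # two Conv blocks; downsampling factors are illustrative
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.dsres = nn.Sequential(DSResBlock(64), DSResBlock(64))
        self.nres = NonLocalResBlock(64)
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        x = self.nres(self.dsres(self.stem(x)))
        logits = self.classifier(x).mean(dim=(2, 3))  # global average pooling
        return F.softmax(logits, dim=1)               # "pass"/"questionable"/"fail" probabilities
```

For a batch of padded slices x of shape (N, 1, 256, 256), IQANet()(x) returns an (N, 3) tensor of class probabilities.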
C. Training and Testing: In the training stage, we initially assumed that each slice could be labeled according to its corresponding volume. This assumption is not always correct, since artifacts may affect only a few slices in a volume, leaving the remaining slices incorrectly labeled. To deal with such noisy labels, we iteratively train the network with a relabeling and pruning strategy. Specifically, we obtain an initial prediction of the labels of all training slices and retrain the network only on slices satisfying two conditions: (1) the predicted label is identical to the initial label; and (2) the prediction is made with high certainty (maximum class probability above a threshold of 0.7). Slices that do not meet both criteria are pruned from the training set. We employ a multi-class balanced focal loss [4] to alleviate the class imbalance introduced by the relabeling process. In the testing stage, the trained model predicts the quality of each slice in the testing dataset, and the quality of each volume is then determined by the following rules: “pass” if more than 80 percent of its slices are labeled “pass”; “fail” if more slices are labeled “fail” than either “pass” or “questionable”; “questionable” otherwise.
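The slice-pruning criterion and the volume-level decision rules can be summarized by the sketch below. Array shapes, label encodings, and function names are illustrative assumptions.

```python
import numpy as np

def prune_training_slices(probs: np.ndarray, initial_labels: np.ndarray,
                          threshold: float = 0.7) -> np.ndarray:
    """Return a boolean mask keeping slices whose predicted label matches the initial
    (volume-derived) label and whose maximum class probability exceeds the threshold;
    all other slices are pruned before retraining."""
    predicted = probs.argmax(axis=1)            # probs: (num_slices, 3) softmax outputs
    confident = probs.max(axis=1) >= threshold  # initial_labels: (num_slices,) integer labels
    return (predicted == initial_labels) & confident

def volume_label(slice_labels: list) -> str:
    """Aggregate slice-level decisions ('pass'/'questionable'/'fail') into a volume label."""
    n = len(slice_labels)
    n_pass = slice_labels.count("pass")
    n_fail = slice_labels.count("fail")
    n_quest = slice_labels.count("questionable")
    if n_pass > 0.8 * n:                        # more than 80% of slices pass
        return "pass"
    if n_fail > n_pass and n_fail > n_quest:    # "fail" outnumbers both other classes
        return "fail"
    return "questionable"
```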
The confusion matrix and the sensitivity and specificity of the quality assessment results for the testing T1- and T2-weighted images are presented in Table 2. The specificity for the “pass” class is 1, indicating that no “questionable” or “fail” images are mistakenly labeled as “pass”. Figure 2 shows examples of T1- and T2-weighted images for each category, illustrating that “pass”, “questionable”, and “fail” correspond respectively to no/minor, moderate, and heavy image degradation. Figure 3 shows detailed slice-level and volume-level IQA results on the testing set. The maximum class probabilities of individual slices are consistently high, indicating that our method assesses each slice with high confidence, and the volume-level IQA results match the ground-truth labels.
1. Zhuo J, Gullapalli RP. MR artifacts, safety, and quality control. RadioGraphics. 2006;26(1):275-297.
2. Chollet F. Xception: Deep learning with depthwise separable convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA; 2017.
3. Wang X, Girshick R, Gupta A, He K. Non-local neural networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA; 2018.
4. Lin TY, Goyal P, Girshick R, et al. Focal loss for dense object detection. IEEE International Conference on Computer Vision (ICCV). Venice, Italy; 2017.