2417

Implementation of a convolutional neural network for brainstem landmark detection and co-registration

Owen Bleddyn Woodward¹, Ian Driver¹, Michael Germuska¹, and Richard Wise²
¹CUBRIC, Cardiff University, Cardiff, United Kingdom, ²Department of Neuroscience, Imaging and Clinical Science, 'G. D'Annunzio University' of Chieti-Pescara, Chieti, Italy

Synopsis

Keywords: Data Processing, Machine Learning/Artificial Intelligence, Co-registration

Accurate brainstem co-registration is important when analysing brainstem functional MRI data. We trained a convolutional neural network (CNN) to detect a set of brainstem landmarks and to define a brainstem region-of-interest and used these to co-register the brainstem between functional and anatomical space using previously developed landmark-based and automated brainstem co-registration (LBC and ABC) methods. The use of CNNs to supply these features to LBC and ABC produced results that compared well to conventional methods. Similar CNNs could be applied to other brain regions and such methods may be useful to automate the analysis of large functional datasets.

Introduction

Brainstem nuclei are of the order of millimetres in size. Therefore, inaccurate co-registration between functional and structural MRI data will reduce sensitivity to detect significant changes in brain activity in group-level analyses of fMRI data. Furthermore, the brainstem is surrounded by CSF and vascular structures whose signals exhibit high temporal instability such that misregistration of those structures may contaminate fMRI signals of interest from brain parenchyma.

Conventional co-registration methods are often applied to the whole brain by optimising an intensity and/or boundary-based cost function¹⁻³. Co-registration can be rendered more spatially specific by weighting the cost function towards the brainstem using a brainstem mask⁴. An alternative method is to label the brainstem with a set of anatomical landmarks, and then to translate the input dataset to the reference dataset with the aim of minimising the mean squared error between the two sets of landmarks⁴. These two processes have been shown to improve brainstem co-registration and have been termed ‘automated brainstem co-registration’ (ABC) and ‘landmark-based co-registration’ (LBC) respectively⁴.

Manually defining a brainstem mask for ABC or a set of landmarks for LBC is time-consuming. We developed and trained a convolutional neural network (CNN) to automatically generate a brainstem mask and to automatically detect brainstem landmarks, thus further automating the ABC and LBC methods.

Methods

Firstly, a 2D-CNN (figure 1) was trained to automatically predict the location of four brainstem landmarks on a mid-sagittal slice of high-resolution structural MRI and lower-resolution functional MRI datasets. These landmarks were then used to co-register the functional data to structural space using 3dTagalign, which is part of the AFNI software package⁵. 3dTagalign performs co-registration by rotating and translating the functional dataset and using a least-squares algorithm to minimise the distance between the specified anatomical and functional landmarks.
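The least-squares landmark fit described above can be illustrated with a minimal sketch. It uses the standard SVD-based (Kabsch) solution for a rigid transform and is not the 3dTagalign implementation; the function name `rigid_landmark_fit` and the example coordinates are hypothetical.

```python
import numpy as np

def rigid_landmark_fit(moving, fixed):
    """Return rotation R and translation t minimising sum ||R @ moving_i + t - fixed_i||^2."""
    moving = np.asarray(moving, dtype=float)
    fixed = np.asarray(fixed, dtype=float)
    mov_c = moving.mean(axis=0)
    fix_c = fixed.mean(axis=0)
    # Cross-covariance of the centred landmark sets
    H = (moving - mov_c).T @ (fixed - fix_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0] * (H.shape[0] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = fix_c - R @ mov_c
    return R, t

# Hypothetical example: four landmarks rotated by 10 degrees and shifted
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([2.0, -1.0, 3.0])
functional_pts = np.array([[0.0, 0.0, 0.0],
                           [10.0, 0.0, 0.0],
                           [0.0, 12.0, 0.0],
                           [3.0, 4.0, 8.0]])
anatomical_pts = functional_pts @ R_true.T + t_true

R_est, t_est = rigid_landmark_fit(functional_pts, anatomical_pts)
aligned_pts = functional_pts @ R_est.T + t_est
```

With noise-free landmarks the recovered transform maps the functional landmarks exactly onto the anatomical ones; with real, imperfectly placed landmarks it returns the rigid transform with the smallest summed squared landmark distance.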

Secondly, this 2D-CNN was expanded into a 3D-CNN and reconfigured to predict the boundaries of a cuboidal mask encompassing the brainstem and this mask was used to weight an affine co-registration using FSL FLIRT. These methods are henceforth referred to as 2D-CNN LBC and 3D-CNN ABC respectively.
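As an illustration of how predicted cuboid boundaries could be turned into a weighting volume, the sketch below builds a binary mask from voxel-index bounds. The function name, volume shape and bounds are hypothetical, and the abstract does not state how the mask was passed to FLIRT (FLIRT provides weighting options such as `-refweight` and `-inweight`, but their use here is an assumption).

```python
import numpy as np

def cuboid_weight_volume(shape, lower, upper):
    """Binary volume: 1 inside the voxel-index cuboid [lower, upper), 0 elsewhere."""
    weights = np.zeros(shape, dtype=np.float32)
    # Clip the bounds to the volume so an over-sized prediction cannot index out of range
    region = tuple(slice(max(lo, 0), min(hi, n))
                   for lo, hi, n in zip(lower, upper, shape))
    weights[region] = 1.0
    return weights

# Hypothetical volume size and predicted boundaries (voxel indices)
weights = cuboid_weight_volume((64, 64, 40), lower=(24, 20, 5), upper=(40, 44, 25))
```

The resulting array would still need to be saved as an image in the same space as the corresponding dataset before it could be used for cost-function weighting.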

113 anatomical (MPRAGE) and 39 resting-state functional (rsfMRI) EPI datasets were manually labelled with four brainstem landmarks and used to train the landmark-predicting 2D-CNN. The 3D-CNN was trained using 39 anatomical datasets each manually labelled with the boundaries of a cuboidal region encompassing the brainstem. Training data were augmented with random shear, random brightness and random crop techniques. CNN performance was evaluated using 10 EPI and 10 MPRAGE test datasets.
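Augmenting landmark-labelled images requires the landmark coordinates to follow any spatial change applied to the image. The sketch below shows this for a random crop combined with a random brightness offset; it omits the shear augmentation, and the function name, parameter ranges and example data are assumptions rather than the settings used in this work.

```python
import numpy as np

def random_crop_and_brightness(image, landmarks, crop_shape, rng, max_shift=0.2):
    """Randomly crop a 2D image, add a brightness offset and shift landmarks to match.

    image:     2D array
    landmarks: (N, 2) array of (row, col) coordinates in the original image
    """
    rows, cols = image.shape
    crop_rows, crop_cols = crop_shape
    # Top-left corner of the crop window
    r0 = rng.integers(0, rows - crop_rows + 1)
    c0 = rng.integers(0, cols - crop_cols + 1)
    cropped = image[r0:r0 + crop_rows, c0:c0 + crop_cols].astype(float)
    # Additive brightness offset, scaled by the intensity range of the image
    offset = rng.uniform(-max_shift, max_shift) * (image.max() - image.min())
    # Landmarks move with the crop window; this sketch does not check that
    # every landmark remains inside the cropped field of view
    moved = np.asarray(landmarks, dtype=float) - np.array([r0, c0])
    return cropped + offset, moved

rng = np.random.default_rng(0)
image = np.arange(100.0).reshape(10, 10)   # hypothetical 10 x 10 slice
landmarks = np.array([[4, 5], [5, 4]])     # hypothetical (row, col) landmarks
augmented, moved = random_crop_and_brightness(image, landmarks, (8, 8), rng)
```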

3D-CNN ABC and 2D-CNN LBC were compared to global co-registration using FSL FLIRT and to LBC performed using manually labelled landmarks (ground-truth LBC). The performance of the methods incorporating the CNNs was assessed by calculating the root mean square error (RMSE) between the predicted and manually-defined ground truth landmarks and boundaries. Co-registration accuracy was assessed by evaluating the RMSE between the four ground truth brainstem landmarks on the co-registered functional data and the ground truth anatomical landmarks.
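For clarity, one common definition of the RMSE between two landmark sets is sketched below: the square root of the mean squared Euclidean distance between corresponding landmarks. The abstract does not give the exact formula, so this definition, the function name and the example coordinates are assumptions.

```python
import math

def landmark_rmse(predicted, reference):
    """Root mean square of the Euclidean distances between paired landmarks."""
    if len(predicted) != len(reference):
        raise ValueError("landmark sets must contain the same number of points")
    squared_distances = [
        sum((p - r) ** 2 for p, r in zip(point, ref_point))
        for point, ref_point in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))

# Hypothetical landmarks in mm: every prediction is displaced by (3, 4) mm
reference = [(0.0, 0.0), (10.0, 0.0), (0.0, 12.0), (7.0, 9.0)]
predicted = [(x + 3.0, y + 4.0) for x, y in reference]
rmse_mm = landmark_rmse(predicted, reference)
```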

Results

The 2D-CNN was able to accurately predict the four brainstem landmarks on anatomical (RMSE = 2.5 ± 0.8 mm) and functional (RMSE = 4.4 ± 1.4 mm) images (figure 2). The low standard deviation of the RMSE suggests that the CNNs are robust to variations in brain shape, size, and orientation between individuals. The 3D-CNN is slightly less accurate but can predict the boundaries of the brainstem ROI with RMSE = 6.3 ± 3.0 mm (figure 3).

Visual inspection reveals that 2D-CNN LBC and 3D-CNN ABC subjectively outperform affine intensity-based co-registration using FSL FLIRT (figure 4), although the threshold for statistical significance was not reached in this small sample (figure 5). 3D-CNN ABC was very inaccurate in one of the test datasets (outlier on figure 5). Ground truth LBC is the most accurate (mean RMSE across ten datasets after co-registration = 2.2 ± 0.8 mm), significantly outperforming both affine intensity-based co-registration using FSL FLIRT (mean RMSE_FLIRT = 4.7 ± 1.3 mm, t = 5.39, df = 14.75, p = 8.03 × 10⁻⁵) and 2D-CNN LBC (mean RMSE_{2D-CNN LBC} = 3.4 ± 1.0 mm, t = 2.97, df = 16.76, p = 0.009).
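The non-integer degrees of freedom reported above are consistent with an unequal-variance (Welch) t-test, although the abstract does not name the test. Under that assumption, the statistic and the Welch-Satterthwaite degrees of freedom can be computed from summary values as in the following sketch; the function name is hypothetical.

```python
import math

def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom from summary values."""
    # Squared standard error of each group mean
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, dof

# Rounded summary values from the text: FLIRT versus ground-truth LBC, ten datasets each
t_stat, dof = welch_from_summary(4.7, 1.3, 10, 2.2, 0.8, 10)
```

Because the inputs are the rounded means and standard deviations quoted in the text, the output is close to, but not identical to, the reported t and df.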

Discussion

We have demonstrated that CNNs have the potential to improve brainstem co-registration when compared to conventional affine co-registration. 2D-CNN LBC performed well after training with relatively few datasets, which is promising for the development of further CNNs tailored toward other regions of the brain. 3D-CNN ABC performed well in nine out of ten test datasets, but failed on one occasion. 3D-CNN ABC is dependent on the ability of FLIRT to accurately co-register using intensity-based cost-function minimisation. If this process is constrained to a relatively small ROI around the brainstem, there may not always be sufficient intensity-based differences between the datasets to perform accurate co-registration. LBC is potentially more robust because it is constrained by the requirement to minimise the MSE between only four brainstem landmarks. LBC using manually defined landmarks (ground-truth LBC) is most accurate for brainstem co-registration, but the methods developed here may be useful in the analysis of large datasets where manually labelling the brainstem is impractical. Experimentation with more complex network architectures, further hyperparameter optimisation and a larger training dataset might further improve CNN performance.

Acknowledgements

Cardiff University Brain Research Imaging Centre

GW4 MRC DTP

References

1. Greve, D., 2009. Accurate and robust brain image alignment using boundary-based registration. Neuroimage, pp. 63-72.
2. Jenkinson, M., 2001. A Global Optimisation Method for Robust Affine Registration of Brain Images. Med Image Anal, pp. 143-156.
3. Jenkinson, M., 2002. Improved Optimisation for the Robust and Accurate Linear Registration and Motion Correction of Brain Images. Neuroimage, pp. 825-841.
4. Napadow, V., 2006. Automated Brainstem Co-registration (ABC) for MRI. Neuroimage, pp. 1113-1119.
5. Cox, R., 1996. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res, pp. 162-173.
6. Chen, J., 2021. Face Landmark Detection Pytorch. [Online]
Available at: https://github.com/jerrychen44/face_landmark_detection_pytorch

Figures

Figure 1: Outline of the CNN architecture. The number of output channels in each layer is equal to the number of filters applied to the preceding layer. The CNN was based on the LeNet architecture⁶, with four convolution layers (each with a ReLU activation function) feeding into a fully connected layer (with a linear activation function), a subsampling max-pooling layer, and an output layer which contained the coordinates of the four brainstem landmarks. The boundary-predicting 3D-CNN employed the same structure as the landmark-predicting 2D-CNN, apart from the conversion from 2D to 3D.

Figure 2: Representative examples of functional (left) and anatomical (right) mid-sagittal slice with manually labelled ground truth landmarks (white) and landmarks predicted by 2D-CNN (red). Top row - before training, bottom row - after training. The brainstem landmarks correspond to the mid-sagittal ventral pontomesencephalic notch, mid-sagittal ventral pontomedullary notch, apex of the fourth ventricle and the mamillary body.

Figure 3: Mid-sagittal and coronal slices through a representative example of anatomical data with boundaries of cuboidal brainstem region of interest. Manually labelled ground truth (blue) and predicted by 3D-CNN (yellow). Top row - before training, bottom row - after training.

Figure 4: Representative example of the result of 2D-CNN LBC (top row), affine registration using FSL FLIRT (second row), ground-truth LBC (third row) and 3D-CNN ABC (bottom row). The outline of the anatomical scan is overlaid in red. Alignment with the posterior edge of the brainstem is clearly inaccurate using FLIRT (white arrow). Brainstem co-registration is subjectively improved using all three of the other methods.

Figure 5: Comparison of RMSE of the four co-registration methods. Left: Co-registration of one of the test datasets in the 3D-CNN ABC group was very inaccurate (see outlying datapoint). Right: Comparison of the methods after exclusion of this outlier. *ground truth LBC significantly outperformed 2D-CNN LBC and affine-FLIRT.

Proc. Intl. Soc. Mag. Reson. Med. 31 (2023)

DOI: https://doi.org/10.58530/2023/2417