3503

Deep Convolutional Neural Network model for sub-Anatomy specific Landmark detection on Brain MRI
Sumit Sharma1 and Srinivasa Rao Kundeti1
1Philips Healthcare, Bangalore, India

Synopsis

We present a Deep Convolutional Neural Network (D-CNN) model for brain landmark detection, an important step in planning MRI scans, for two anatomies, the corpus callosum and the cerebellum, on 3D T1w images. Unlike traditional approaches that first segment the anatomy and then apply image processing to identify the landmark points (AL-Net), we use a D-CNN to predict the landmarks directly. We demonstrate that the D-CNN achieves an average root mean square error (RMSE) <= 6 mm (N=100) for all landmarks, comparable to or better than AL-Net (RMSE <= 8 mm). The results indicate excellent feasibility of the method for clinical use.

Introduction

Landmark detection is used in several tasks, for example facial landmark detection for facial analysis tasks 1 such as emotion recognition and head pose estimation. MRI landmark detection plays an important role in various clinical workflows and MR image analysis, including scan planning 2 and image registration 3. There are several ways to define landmarks 3: (i) key-point based (corner, edge, etc.) and (ii) atlas based/anatomy specific. In this work, we focus on anatomy-specific landmark definitions. While many public datasets are available for medical image segmentation 4,5, landmark datasets in medical imaging are scarce. The proposed D-CNN approach also requires less annotation effort than AL-Net, since it avoids anatomy (segmentation) annotations, and can easily be extended to other anatomies (e.g., knee, liver) and applications (e.g., registration) in the future.

Materials and Methods

Data: For corpus callosum and cerebellum segmentation, 2-D slices were extracted from the ABIDE 4 and OASIS 5 datasets. For the landmark detection dataset, a landmark definition was first created in-house (number of landmarks: 10 in the axial view, 7 in the sagittal view, 5 in the coronal view), followed by in-house annotation by a radiologist. Corpus callosum landmarks are annotated in the sagittal view and cerebellum landmarks in the coronal view; 500 slices containing the relevant landmarks are annotated in each of the sagittal, coronal, and axial views. As shown in Fig. 1, there are two approaches to predict the final output: AL-Net and D-CNN. AL-Net uses the anatomical model output (segmentation) followed by conventional image processing for landmark extraction, whereas the D-CNN model predicts the landmarks directly. The components of Fig. 1 are described below.

Preprocessing: Intensity normalization, data augmentation, and synthetic data generation are used to make the network robust to MR scanner and patient variations. Training and experimental setup: The dataset is split into three parts: 60% for training, 20% for validation, and 20% for test. The test set is used to compare both AL-Net and D-CNN.
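The preprocessing and data split described above can be sketched as follows. This is a minimal illustration, not the exact pipeline used in the study: the abstract does not specify the normalization scheme, so z-score normalization is assumed here, and all function names are hypothetical.

```python
import numpy as np

def normalize_intensity(slice_2d):
    """Z-score intensity normalization of one 2-D slice (assumed scheme;
    the abstract does not state the exact normalization used)."""
    return (slice_2d - slice_2d.mean()) / (slice_2d.std() + 1e-8)

def split_dataset(n_samples, seed=0):
    """Shuffle case indices and split 60/20/20 into train/validation/test,
    matching the split ratios stated in the abstract."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Splitting by shuffled case index (rather than by slice) avoids leaking slices of the same subject across the train and test sets.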

AL-Net:
1.1. Sub-anatomy predictor (A-Net): A deep CNN that segments the sub-anatomies in the brain region.
1.2. Landmark extractor: Takes the output of the sub-anatomy predictor as input and extracts landmarks using conventional image processing (detecting the extreme right, extreme left, highest point, etc.). It is implemented for corpus callosum and cerebellum points only.
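The conventional extreme-point extraction in the landmark extractor can be sketched as below. This is a minimal sketch assuming a binary NumPy segmentation mask; the actual AL-Net step involves many more image processing operations, and the function name is hypothetical.

```python
import numpy as np

def extreme_points(mask):
    """Extract simple landmarks (leftmost, rightmost, topmost, bottommost
    foreground pixels) from a binary segmentation mask, in the spirit of
    the conventional image-processing step described for AL-Net.
    Returns (row, col) coordinates, or None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return {
        "left":   (ys[np.argmin(xs)], xs.min()),   # extreme left
        "right":  (ys[np.argmax(xs)], xs.max()),   # extreme right
        "top":    (ys.min(), xs[np.argmin(ys)]),   # highest point
        "bottom": (ys.max(), xs[np.argmax(ys)]),   # lowest point
    }
```

Each additional landmark typically needs its own hand-crafted rule of this kind, which is one reason the abstract notes that AL-Net scales poorly with the number of landmarks.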
D-CNN:
2.1. Landmark predictor: The D-CNN model uses a deep CNN for landmark detection. Training: A custom deep CNN, initialized with pre-trained weights, is trained separately for the three views (sagittal, coronal, and axial) with an input image size of 256x256 and a model output of size kx64x64, where k is the number of landmarks.
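Given the stated shapes (256x256 input, kx64x64 output), the network presumably emits one heatmap per landmark. A common way to decode such heatmaps into coordinates is a per-channel argmax rescaled to the input frame; the abstract does not describe its decoding step, so the following is an assumed sketch with hypothetical names.

```python
import numpy as np

def heatmaps_to_landmarks(heatmaps, input_size=256):
    """Decode a (k, 64, 64) heatmap stack into k (x, y) coordinates in the
    256x256 input frame via per-channel argmax (assumed decoding scheme)."""
    k, h, w = heatmaps.shape
    scale = input_size / h  # 256 / 64 = 4, heatmaps are downsampled 4x
    coords = np.zeros((k, 2))
    for i in range(k):
        flat = np.argmax(heatmaps[i])   # index of the hottest pixel
        y, x = divmod(flat, w)          # unravel to row, column
        coords[i] = (x * scale, y * scale)
    return coords
```

The 4x downsampling of the heatmaps is what the Conclusion suggests removing in future work, trading more output resolution and parameters for potentially higher accuracy.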

Results

A-Net in AL-Net achieves high segmentation accuracy, with a Dice coefficient > 0.92 (slice-wise) for both corpus callosum and cerebellum. The results in Table 1 indicate that the D-CNN achieves an average root mean square error (RMSE) <= 6 mm (N=100) for all landmarks, comparable to or better than AL-Net (RMSE <= 8 mm). In addition, the PCKh (head-normalized probability of correct keypoint) score 7, a standard metric, is calculated for the D-CNN and shown in Table 2 (PCKh@0.4 > 99% for all three views). Figure 2 shows the predictions of the D-CNN and AL-Net compared to the ground truth, and Figure 3 shows the RMSE boxplot for all landmarks for both methods. The RMSE for almost all landmarks in the D-CNN is below 10 pixels, confirming the robustness of the model. Note also that accurate landmark estimation from the A-Net segmentation output is a substantial task in itself, requiring many image processing steps, which makes AL-Net difficult to scale as the number of landmarks increases.
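The three evaluation metrics used above can be computed as follows. This is a minimal sketch: the PCKh definition follows the standard formulation of ref. 7, and the head-size normalization used in the study is assumed, not stated.

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-8):
    """Slice-wise Dice coefficient between binary segmentation masks,
    the metric reported for A-Net."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

def rmse_per_landmark(pred, gt):
    """Per-landmark RMSE over N cases; pred and gt are (N, k, 2) arrays
    of (x, y) coordinates."""
    return np.sqrt(np.mean(np.sum((pred - gt) ** 2, axis=-1), axis=0))

def pckh(pred, gt, head_size, alpha=0.4):
    """PCKh@alpha: fraction of predictions whose Euclidean error is within
    alpha times the per-case head segment length (see ref. 7).
    head_size is a (N,) array of per-case normalization lengths."""
    dist = np.linalg.norm(pred - gt, axis=-1)            # (N, k) errors
    return np.mean(dist <= alpha * head_size[:, None])   # hit rate
```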

Conclusion

The AL-Net result depends strongly on the accuracy of the segmentation model and on the algorithm extracting landmarks from the corresponding segmentation. Variation in ground-truth creation for both the segmentation and landmark annotations is also an important factor affecting accuracy, calling for a more robust landmark definition. The D-CNN is more generic, reduces pipeline changes, and can be extended to other anatomies and regions faster than segmentation-based approaches. In future work, the D-CNN output heatmaps could be produced at full resolution rather than downsampled, at the cost of a larger output and more network parameters, which may further improve accuracy.

Acknowledgements

We would like to thank Vineeth VS, clinical specialist, PIC Bangalore for providing landmark definition for brain data annotation.

References

1. Wu, Yue, and Qiang Ji. "Facial landmark detection: A literature survey." International Journal of Computer Vision 127.2 (2019): 115-142.

2. A generalized deep learning framework for multi-landmark intelligent slice placement using standard tri-planar 2D localizers. Proc. ISMRM 2019.

3. Zhang, Jun, Mingxia Liu, and Dinggang Shen. "Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks." IEEE Transactions on Image Processing 26.10 (2017): 4753-4764.

4. Kucharsky Hiess, R., Alter, R.A., Sojoudi, S., Ardekani, B., Kuzniecky, R., and Pardoe, H.R. (2015) Corpus callosum area and brain volume in autism spectrum disorder: quantitative analysis of structural MRI from the ABIDE database, Journal of Autism and Developmental Disorders, 45(10): 3107-3114, doi: 10.1007/s10803-015-2468-8

5. Open Access Series of Imaging Studies (OASIS), online: http://www.oasis-brains.org

6. Sun, K., Xiao, B., Liu, D. and Wang, J., 2019. Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212

7. M. Andriluka, L. Pishchulin, P. V. Gehler, and B. Schiele. 2d human pose estimation: New benchmark and state of the art analysis. In CVPR, pages 3686–3693, 2014

Figures

Table 1: RMSE comparison for D-CNN and AL-Net. (a) RMSE for D-CNN; (b) comparison of RMSE between D-CNN and AL-Net.

Table 2: PCKh scores for the D-CNN model.

Figure 1. Overall flow used for landmark detection. Blue text denotes the output of the AI models.

Figure 2. Row 1 gives the landmark definitions in the axial, coronal, and sagittal views, respectively. Row 2 compares the output (red dots) of (A2) with the ground truth (green dots).

Figure 3. Box plots of root mean square error in pixel units for AL-Net and D-CNN.

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)