1439

Real-time large-scale anatomical landmark detection with limited medical images

Jun Zhang¹, Mingxia Liu¹, and Dinggang Shen¹

¹Radiology and BRIC, UNC at Chapel Hill, Chapel Hill, NC, United States

Synopsis

Landmark detection based on deep neural networks has achieved state-of-the-art performance in natural image analysis. However, it is challenging to detect anatomical landmarks from medical images, due to limited data. Here, we propose a real-time large-scale landmark detection method with limited training data. We train our model with image patches and test it with the entire image, inspired by fully convolutional networks. Also, we develop a weighted loss function in our model to increase the correlations between image patches and their nearby landmarks. The experimental results of detecting 1741 landmarks from brain MR images demonstrate the effectiveness of our method.

Purpose

Our goal is a real-time detector for large-scale anatomical landmark detection when only a limited number of medical images is available. With so little training data, it is very challenging to detect anatomical landmarks using the prevalent deep convolutional neural networks (CNNs)¹. In addition, a CNN operating on 3D volumetric images has far more weights to learn than one operating on 2D images. It is therefore difficult to train an accurate landmark detection model for medical data end to end, with an entire 3D image as input. To work around the limited training data, conventional methods usually use local image patches as samples. However, the conventional patch-based approach has two major problems. 1) Although neural networks are efficient at test time, predicting targets for tens of thousands of image patches is still time-consuming. 2) Large-scale landmark detection multiplies the computational cost if each landmark is detected separately. Moreover, even if a large number of landmarks are detected jointly, a local patch captures only limited local structural information and is poorly suited to estimating all landmarks, especially those far from the patch.

Methods

To address these problems, we propose a real-time landmark detector for large-scale landmark detection with limited 3D medical images. In a conventional regression-based landmark detector, the nonlinear relationship between the appearance of a local patch and its 3D displacements to multiple target landmarks (see Fig. 1(a)) is described by a regression model (e.g., random forest regression, SVM regression, or CNN regression)². In the training stage, we follow this patch-based approach and train a regression model based on a deep convolutional neural network (CNN), since a very large number of patches is available for training. In this way, each local patch can estimate the positions of multiple landmarks jointly. We also use a weighted mean square error as the loss function, which assigns lower weights to the displacements of faraway landmarks. Patches are thus expected to contribute more to their nearby landmarks, which reduces the instability of predictions between patches and their faraway landmarks. In general, the final landmark positions are obtained by aggregating the predictions of the many image patches via a weighted majority voting strategy².
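The weighted mean square error can be sketched as follows. This is a minimal illustration only: the abstract does not give the form of the weights, so the exponential decay with distance and the scale parameter `alpha` are assumptions, as are the function name and the array layout.

```python
import numpy as np

def weighted_mse(pred, target, alpha=20.0):
    """Distance-weighted mean square error over displacement vectors.

    pred, target: arrays of shape (n_patches, n_landmarks, 3) holding the
    predicted and true 3D displacements from each patch to each landmark.
    alpha: hypothetical decay scale (same unit as the displacements).
    The weight exp(-distance / alpha) is an assumed form, chosen only so
    that faraway landmarks receive lower weights.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    dist = np.linalg.norm(target, axis=-1)           # patch-to-landmark distance
    weights = np.exp(-dist / alpha)                  # lower weight when far away
    sq_err = np.sum((pred - target) ** 2, axis=-1)   # squared error per pair
    return float(np.sum(weights * sq_err) / np.sum(weights))

# One patch, two landmarks: one nearby, one far away.
target = np.array([[[1.0, 0.0, 0.0], [100.0, 0.0, 0.0]]])
near_off = target.copy()
near_off[0, 0, 0] += 2.0   # same-size error, but on the nearby landmark
far_off = target.copy()
far_off[0, 1, 0] += 2.0    # same-size error, but on the faraway landmark

loss_near = weighted_mse(near_off, target)
loss_far = weighted_mse(far_off, target)
print(loss_near, loss_far)
```

With this weighting, an error of the same size costs much less on the faraway landmark than on the nearby one, which is the behaviour the loss is meant to encourage.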

Unlike conventional methods, we convert the trained patch-based regression model into a fully convolutional network (FCN)³ for testing. As shown in Fig. 1(b), we first train a CNN regression model on image patches paired with their 3D displacements to the landmarks, so that the network weights are learned automatically. Then, we construct an FCN architecture that corresponds to the trained patch-based model and differs only in that the fully connected layers are replaced by convolutional layers with 1×1×1 filter kernels. The entire image can therefore be used as input to the FCN. In the application stage (see Fig. 1(c)), given an entire test image as input, the displacements of a large number of patches are estimated jointly by the learned model. Finally, we compute the locations of multiple landmarks jointly by aggregating the displacements of all those image patches with a majority voting strategy.
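The two test-time ideas above can be illustrated with a small NumPy sketch. It assumes that the features entering the fully connected layer have spatial size 1×1×1 per patch, so the same weight matrix can be applied at every voxel of a feature map; it also uses plain (unweighted) voting with an accumulator volume. All function names, shapes, and the toy data are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def fc_forward(x, W, b):
    """Fully connected layer on one patch feature vector x of shape (C_in,)."""
    return W @ x + b

def conv1x1x1_forward(feat, W, b):
    """The same weights used as a 1x1x1 convolution on feat of shape (C_in, D, H, Wd)."""
    return np.einsum("oc,cdhw->odhw", W, feat) + b[:, None, None, None]

def vote_landmark(centers, displacements, volume_shape):
    """Majority voting: each patch votes for (patch center + predicted displacement)."""
    votes = np.rint(centers + displacements).astype(int)
    inside = np.all((votes >= 0) & (votes < np.asarray(volume_shape)), axis=1)
    acc = np.zeros(volume_shape, dtype=np.int64)
    np.add.at(acc, tuple(votes[inside].T), 1)   # accumulate votes per voxel
    return tuple(int(i) for i in np.unravel_index(np.argmax(acc), volume_shape))

rng = np.random.default_rng(0)

# 1) FC weights reused as a 1x1x1 convolution give identical outputs per voxel.
C_in, n_out = 8, 3   # 3 outputs: the (x, y, z) displacement to one landmark
W = rng.normal(size=(n_out, C_in))
b = rng.normal(size=n_out)
feat = rng.normal(size=(C_in, 4, 5, 6))
dense = conv1x1x1_forward(feat, W, b)            # shape (3, 4, 5, 6)
single = fc_forward(feat[:, 1, 2, 3], W, b)      # one location, FC style
same = bool(np.allclose(dense[:, 1, 2, 3], single))

# 2) Voting recovers the landmark even when some patch predictions are noisy.
true_landmark = np.array([5, 6, 7])
centers = rng.integers(0, 20, size=(200, 3))
displacements = (true_landmark - centers).astype(float)
displacements[:50] += rng.normal(scale=5.0, size=(50, 3))   # corrupt 50 patches
estimate = vote_landmark(centers, displacements, (20, 20, 20))
print(same, estimate)
```

Because the dense output is produced in a single forward pass, every voxel position acts as a patch center without the patches being extracted and processed one by one.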

Results

In this study, we use 428 T1-weighted MR images from the ADNI database, each with 1741 anatomical landmarks. We perform two-fold cross-validation and compare our method with a group-wise two-layer regression forest based landmark detection method⁴. The results are shown in Fig. 2. The proposed method achieves better detection accuracy, and most landmarks are detected within a relatively low error range.
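A cumulative error distribution of the kind plotted in Fig. 2 can be computed as sketched below. The function names, the voxel `spacing` argument, and the toy coordinates are assumptions made for illustration; the abstract does not describe how its curve was produced, and no values from the study are reproduced here.

```python
import numpy as np

def detection_errors(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance between predicted and true landmark positions.

    pred, truth: arrays of shape (..., 3) in voxel coordinates.
    spacing: assumed voxel size per axis, used to convert to physical units.
    """
    diff = (np.asarray(pred, float) - np.asarray(truth, float)) * np.asarray(spacing)
    return np.linalg.norm(diff, axis=-1)

def cumulative_fraction(errors, thresholds):
    """Fraction of landmarks whose error is at most each threshold."""
    errors = np.sort(np.ravel(errors))
    return np.searchsorted(errors, thresholds, side="right") / errors.size

# Toy data: four landmarks with errors of 1, 2, 5 and 10 units.
truth = np.zeros((4, 3))
pred = np.array([[1, 0, 0], [0, 2, 0], [3, 4, 0], [0, 0, 10]], dtype=float)
errors = detection_errors(pred, truth)
fractions = cumulative_fraction(errors, [1.0, 2.5, 5.0, 20.0])
print(fractions)   # prints [0.25 0.5  0.75 1.  ]
```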

Discussion

Besides stable landmark detection, the proposed method has another major advantage: real-time landmark detection. The regression model can be trained well because millions of image patches can be sampled from each MR image. Moreover, the network weights of the patch-based CNN architecture can be applied directly to the corresponding FCN architecture. On the other hand, local patches are generally unreliable for estimating faraway landmarks, which is one of the main challenges in landmark detection. Previous studies adopted a coarse-to-fine multi-resolution strategy⁵ or group-wise prediction² to mitigate this problem. In our method, we address it simply by using a weighted loss function.
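To make the patch sampling concrete, the sketch below draws random patches from a 3D volume and builds their displacement targets to every landmark, following the definition in Fig. 1(a). The patch size, the uniform sampling scheme, and all names are assumptions; the abstract does not specify them.

```python
import numpy as np

def sample_patches(image, landmarks, patch_size=15, n_patches=4, seed=0):
    """Sample cubic patches and their 3D displacements to all landmarks.

    image: 3D array; landmarks: array of shape (n_landmarks, 3) in voxels.
    patch_size: hypothetical odd edge length of each patch, in voxels.
    Returns patches (n_patches, p, p, p), centers (n_patches, 3) and
    displacements (n_patches, n_landmarks, 3) = landmark - patch center.
    """
    rng = np.random.default_rng(seed)
    half = patch_size // 2
    shape = np.array(image.shape)
    # Keep centers far enough from the border that the patch fits entirely.
    centers = rng.integers(half, shape - half, size=(n_patches, 3))
    patches = np.stack([
        image[x - half:x + half + 1, y - half:y + half + 1, z - half:z + half + 1]
        for x, y, z in centers
    ])
    displacements = landmarks[None, :, :] - centers[:, None, :]
    return patches, centers, displacements

image = np.zeros((32, 32, 32))
landmarks = np.array([[10, 10, 10], [20, 5, 30]])
patches, centers, displacements = sample_patches(image, landmarks)
print(patches.shape, displacements.shape)   # prints (4, 15, 15, 15) (4, 2, 3)
```

Each sampled patch thus comes with one regression target per landmark, which is why a single image can supply a very large number of training samples.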

Conclusion

The advantages of our proposed framework are three-fold: 1) Patch-based training avoids the problem of limited training samples. 2) The FCN improves testing efficiency by taking the entire image as input for landmark detection. 3) The weighted loss function reduces the inaccuracy of predicting faraway landmarks. Our efficient and accurate landmark detector can be applied to various tasks, such as image registration, image segmentation, and neurodegenerative disease diagnosis.

Acknowledgements

No acknowledgement found.

References

1. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 2012. pp. 1097-1105.

2. Zhang J, Gao Y, Wang L, et al. Automatic craniomaxillofacial landmark digitization via segmentation-guided partially-joint regression forest model. International Conference on Medical Image Computing and Computer-Assisted Intervention. Oct 5 2015. pp. 661-668.

3. Sermanet P, Eigen D, Zhang X, et al. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. 2013 Dec 21.

4. Zhang J, Gao Y, Gao Y, et al. Detecting Anatomical Landmarks for Fast Alzheimer's Disease Diagnosis. IEEE Transactions on Medical Imaging. 2016. DOI: 10.1109/TMI.2016.2582386.

5. Gao Y, Shen D. Context-aware anatomical landmark detection: application to deformable model initialization in prostate CT images. International Workshop on Machine Learning in Medical Imaging. Sep 14 2014. pp. 165-173.

Figures

Fig. 1. Pipeline of the proposed method. (a) Definition of the 3D displacements (red arrows) from an image patch to multiple landmarks. (b) Patch-based training stage. The weights learned by the patch-based CNN regression model are applied to the corresponding FCN architecture. (c) Prediction stage based on the entire image. The entire image is used to predict multiple landmarks jointly.

Fig. 2. Cumulative distribution of landmark detection with different detection error intervals.

Proc. Intl. Soc. Mag. Reson. Med. 25 (2017)
