Landmark detection based on deep neural networks has achieved state-of-the-art performance in natural image analysis. However, detecting anatomical landmarks in medical images remains challenging because training data are limited. Here, we propose a real-time method for large-scale landmark detection with limited training data. Inspired by fully convolutional networks, we train our model on image patches and test it on the entire image. We also develop a weighted loss function to strengthen the correlations between image patches and their nearby landmarks. Experimental results on detecting 1741 landmarks in brain MR images demonstrate the effectiveness of our method.
To address these problems, we propose a real-time detector for large-scale landmark detection with limited 3D medical images. Specifically, in a convolutional regression-based landmark detector, the nonlinear relationship between the appearance of a local patch and its 3D displacements to multiple target landmarks (see Fig. 1(a)) can be described by a regression model (e.g., random forest regression, SVM regression, or CNN regression)². In the training stage, we follow the patch-based approach and train a regression model based on deep convolutional neural networks (CNNs), since a very large number of patches is available for training. In this way, each local patch can estimate the positions of multiple landmarks jointly. Moreover, we propose a weighted mean squared error as the loss function, which assigns lower weights to the displacements of faraway landmarks. Patches are therefore expected to contribute more to their nearby landmarks, which helps reduce the instability of the predictions between patches and their faraway landmarks. In general, the optimized landmark positions can be obtained by assembling the predictions of a large number of image patches via a weighted majority voting strategy².
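The weighted loss described above can be sketched as follows. This is only an illustrative sketch: the text does not give the form of the weights, so the exponential decay `exp(-distance / alpha)`, the scale `alpha`, and the function name `weighted_mse` are assumptions made here, not the authors' definition.

```python
import numpy as np

def weighted_mse(pred, target, alpha=50.0):
    """Weighted mean squared error over patch-to-landmark displacements.

    pred, target: arrays of shape (n_patches, n_landmarks, 3).
    alpha: hypothetical decay scale (same unit as the displacements).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Distance from each patch to each landmark, taken from the ground truth.
    dist = np.linalg.norm(target, axis=-1)            # (n_patches, n_landmarks)
    # Assumed weighting: faraway landmarks receive lower weights.
    weights = np.exp(-dist / alpha)
    # Squared displacement error per patch and landmark.
    sq_err = np.sum((pred - target) ** 2, axis=-1)    # (n_patches, n_landmarks)
    return float(np.mean(weights * sq_err))

# One patch, two landmarks: one nearby (1 voxel away), one far (100 voxels away).
target = np.array([[[1.0, 0.0, 0.0], [100.0, 0.0, 0.0]]])
err_near = target.copy()
err_near[0, 0, 0] += 1.0   # unit error on the nearby landmark
err_far = target.copy()
err_far[0, 1, 0] += 1.0    # same unit error on the faraway landmark
loss_near = weighted_mse(err_near, target)
loss_far = weighted_mse(err_far, target)
```

With this weighting, the same displacement error costs more on the nearby landmark than on the faraway one, which is the behavior the loss is meant to encourage.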
Different from conventional methods, in this study we convert the trained patch-based regression model into a fully convolutional network (FCN)³ for testing. As shown in Fig. 1(b), we first train a CNN regression model using image patches and their 3D displacements to the landmarks as training data, through which the network weights are learned automatically. Then, we design an FCN architecture that corresponds to the trained patch-based model and differs from it only in that the fully connected layers are replaced by convolutional layers with 1×1×1 filter kernels. The entire image can therefore be used as input to the FCN. In the application stage (see Fig. 1(c)), given an entire testing image as input, the displacements of a large number of patches are estimated jointly by the learned model. Finally, we compute the locations of multiple landmarks jointly by adopting a majority voting strategy that assembles the displacements of all those image patches.
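The two testing steps can be illustrated with a small NumPy sketch, under stated assumptions. The first function shows why a fully connected layer acting on a per-location feature vector can be applied as a 1×1×1 convolution over a whole feature map. The second shows one possible voting scheme in which every voxel casts a vote at its own position plus its predicted displacement. The function names, array layouts, and the choice of the arg-max of the vote map as the final location are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def fc_as_1x1x1_conv(feat_map, W, b):
    """Apply a fully connected layer (W, b) at every voxel of a feature map.

    feat_map: (X, Y, Z, C) features; W: (K, C); b: (K,).
    This equals a 1x1x1 convolution with K output channels.
    """
    return feat_map @ W.T + b

def vote_landmark(disp, weights=None):
    """Locate one landmark from a dense displacement field by voting.

    disp: (X, Y, Z, 3) predicted displacement from each voxel to the landmark.
    weights: optional (X, Y, Z) vote weights; uniform if omitted.
    """
    shape = disp.shape[:3]
    axes = [np.arange(s) for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    # Each voxel votes for the position it points to, rounded to the grid.
    votes = np.rint(grid + disp).astype(int).reshape(-1, 3)
    if weights is None:
        w = np.ones(len(votes))
    else:
        w = np.asarray(weights, dtype=float).reshape(-1)
    # Discard votes that fall outside the image.
    inside = np.all((votes >= 0) & (votes < np.array(shape)), axis=1)
    flat = np.ravel_multi_index(tuple(votes[inside].T), shape)
    vote_map = np.bincount(flat, weights=w[inside], minlength=int(np.prod(shape)))
    best = np.unravel_index(int(np.argmax(vote_map)), shape)
    return tuple(int(i) for i in best)

# Toy check: a perfect displacement field pointing at voxel (3, 4, 2).
shape = (8, 8, 8)
grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"), axis=-1)
true_pos = np.array([3, 4, 2])
location = vote_landmark((true_pos - grid).astype(float))
```

For many landmarks, the same voting can be repeated per landmark on the corresponding three output channels of the displacement field.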
1. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012. pp. 1097-1105.
2. Zhang J, Gao Y, Wang L, et al. Automatic craniomaxillofacial landmark digitization via segmentation-guided partially-joint regression forest model. International Conference on Medical Image Computing and Computer-Assisted Intervention. Oct 5 2015. pp. 661-668.
3. Sermanet P, Eigen D, Zhang X, et al. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. 2013 Dec 21.
4. Zhang J, Gao Y, Gao Y, et al. Detecting Anatomical Landmarks for Fast Alzheimer's Disease Diagnosis. IEEE Transactions on Medical Imaging. 2016. DOI: 10.1109/TMI.2016.2582386.
5. Gao Y, Shen D. Context-aware anatomical landmark detection: application to deformable model initialization in prostate CT images. International Workshop on Machine Learning in Medical Imaging. Sep 14 2014. pp. 165-173.