0840

Automated Deep Learning based 3D Hip Segmentation in PD-weighted MR images of a large-scale cohort study

Marc Fischer¹,², Sven Walter¹, Christian Klinger¹, Thomas Küstner¹,²,³, Bin Yang², Mike Notohamiprodjo¹, and Fritz Schick¹

¹Department of Radiology, University Hospital Tübingen, Tübingen, Germany, ²Institute of Signal Processing and System Theory, University of Stuttgart, Stuttgart, Germany, ³School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom

Synopsis

Analysis of the geometrical and structural properties of the hip is of great importance for a meaningful comparison of findings. In large cohort studies in particular, manual processing of large 3D volumes becomes infeasible, so automated processing is required. In this work, a Deep Learning driven algorithm is proposed that performs automated hip segmentation of 3D MRI datasets. It requires little training data and performs accurate semantic bone segmentation despite complex anatomical structures sharing similar tissue characteristics.

Introduction

Automated hip bone segmentation of MRI data is a prerequisite for subsequent analyses of geometrical and structural properties. Degenerative processes such as femoroacetabular impingement (FAI)¹ and hip dysplasia can be examined with an accurate geometrical model. High-resolution 3D volumes are best suited for a meaningful investigation, but are cumbersome to process manually (Fig. 1). For analyzing large cohort data such as the German National Cohort (NAKO)², an effective scheme for automated quantification has to be established. Owing to the challenging conditions, supervision of the employed algorithm is unavoidable, but the effort required to annotate the data should be kept low.

Several Machine Learning (ML) driven approaches have been established in the past, with Deep Learning (DL)³ algorithms being the most successful to date. We propose a solution specifically suited to medical datasets, named MedPatchNet, which builds on recent developments in the DL field. The proposed method takes in all information necessary to perform an automated segmentation, even if tissues share common characteristics.

Methods

Isotropic PD-weighted fast spin echo images of 200 subjects from the NAKO MR study were analyzed. Data were acquired on a 3 T MRI scanner with 1.0 mm isotropic resolution, matrix size 384×264×160, TE = 33 ms, TR = 1200 ms and bandwidth = 500 Hz/px. For a feasibility study, a subset of 11 subjects was annotated voxel-wise in 3D by experienced radiologists.

The proposed patch-based architecture for semantic segmentation of volumetric medical images (MedPatchNet, Fig. 2) combines recent architectural building blocks: encoder-decoder structures as popularized by UNet⁴ and VNet⁵, the Fully Convolutional Network (FCN)⁶ for dense predictions, DeepLabv3+⁷ and the Efficient Spatial Pyramid Network (ESPNet)⁸ for context aggregation, and Dynamic Filter Networks (DFN)⁹ for creating filters that are adjusted dynamically to the respective input. It further incorporates a-priori knowledge of the relative position of the underlying anatomy, as has been done for detection¹⁰ and segmentation¹¹,¹². Combining these ideas makes a reliable differentiation between annotations of similar tissues possible. To this end, the center position of an input patch with respect to the volume of interest is passed to a small DFN consisting of three dense layers. In contrast to previous approaches concerned with hip bones¹³, less training data is required and better generalization is achieved, despite operating on large volumes with high resolution.
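The position-conditioned dynamic filter idea can be illustrated with a minimal NumPy sketch. It is not the MedPatchNet implementation: the hidden width, the Leaky ReLU between dense layers, the random initialization, the restriction to a 1×1×1 kernel and all function names (`init_dfn`, `dynamic_kernel`, `dynamic_conv`) are assumptions made for illustration. Only the use of the patch center as input and the three dense layers follow the description above.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def init_dfn(channels_in, channels_out, hidden=16, seed=0):
    """Three dense layers: 3 position inputs -> hidden -> hidden -> kernel weights.
    Hidden width and initialization are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    sizes = [3, hidden, hidden, channels_out * channels_in]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def dynamic_kernel(params, center, volume_shape, channels_in, channels_out):
    """Map a patch center (in voxels) to a 1x1x1 convolution kernel."""
    # Normalize the center to [0, 1] relative to the volume extent.
    pos = np.asarray(center, dtype=float) / (np.asarray(volume_shape, dtype=float) - 1.0)
    h = pos
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:  # no activation on the output layer
            h = leaky_relu(h)
    return h.reshape(channels_out, channels_in)

def dynamic_conv(features, kernel):
    """Apply the generated 1x1x1 kernel to features of shape (C_in, D, H, W)."""
    return np.einsum("oc,cdhw->odhw", kernel, features)

params = init_dfn(channels_in=4, channels_out=4)
features = np.random.default_rng(1).normal(size=(4, 8, 8, 8))
k_a = dynamic_kernel(params, (32, 32, 32), (384, 264, 160), 4, 4)
k_b = dynamic_kernel(params, (350, 200, 100), (384, 264, 160), 4, 4)
out = dynamic_conv(features, k_a)
```

Because the kernel depends on the patch position, two patches with similar appearance but different locations (`k_a` versus `k_b`) are filtered differently, which is the mechanism that allows similar tissues to be told apart by where they lie.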

Based on an FCN architecture with a UNet encoder-decoder structure, 64×64×64 input patches are passed to the network and a prediction is made for the central 16×16×16 region. Combining a large receptive field with a small output region yields accurate predictions without boundary effects at the patch edges. The network is optimized by minimizing a soft Jaccard index loss over 2000 epochs, using class-balanced patch sampling with a batch size of 18. Data augmentation, such as mirroring, rotating, scaling and the addition of noise to the voxel intensities and patch positions, is applied.
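The patch geometry and the loss can be sketched as follows. This is a hedged illustration under stated assumptions: zero padding at the volume border, non-overlapping output tiles at inference, the per-class averaging and the `eps` smoothing term are choices made here, and the function names are hypothetical.

```python
import numpy as np

IN_SIZE, OUT_SIZE = 64, 16
MARGIN = (IN_SIZE - OUT_SIZE) // 2  # 24 voxels of context on each side

def output_origins(volume_shape, out=OUT_SIZE):
    """Origins of non-overlapping output tiles that cover the whole volume."""
    grids = [range(0, s, out) for s in volume_shape]
    return [(z, y, x) for z in grids[0] for y in grids[1] for x in grids[2]]

def pad_volume(volume, margin=MARGIN, out=OUT_SIZE):
    """Zero-pad so every input patch lies inside the array (assumed border handling)."""
    return np.pad(volume, [(margin, margin + out)] * volume.ndim, mode="constant")

def input_patch(padded, origin, in_size=IN_SIZE):
    """Input patch whose central 16^3 region starts at `origin` in the unpadded volume."""
    z, y, x = origin
    return padded[z:z + in_size, y:y + in_size, x:x + in_size]

def soft_jaccard_loss(probs, onehot, eps=1e-6):
    """1 - mean soft Jaccard index over classes; inputs have shape (classes, D, H, W)."""
    axes = tuple(range(1, probs.ndim))
    inter = (probs * onehot).sum(axis=axes)
    union = probs.sum(axis=axes) + onehot.sum(axis=axes) - inter
    return 1.0 - np.mean((inter + eps) / (union + eps))

# Number of 16^3 output tiles needed for the stated matrix size.
n_tiles = len(output_origins((384, 264, 160)))
```

For the 384×264×160 matrix this gives 24 × 17 × 10 = 4080 output tiles per volume, since 264 is not a multiple of 16 and the last tile along that axis is only partly filled.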

To allow for a meaningful analysis, the experiment is performed as a leave-one-out cross-validation, with each split comprising 10 training subjects and 1 test subject. The Dice Similarity Coefficient (DSC) is used to quantify the overlap between ground truth and prediction, and the Average Symmetric Surface Distance (ASSD) to quantify the distance between their boundaries.
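Both metrics can be written down compactly for binary masks. The sketch below is an illustration rather than the evaluation code used in this work: it assumes 6-connectivity for the surface definition, uses a brute-force pairwise distance matrix that is only practical for small masks, and expects both masks to be non-empty for the ASSD.

```python
import numpy as np

def dice(pred, gt):
    """Dice Similarity Coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def surface_voxels(mask):
    """Coordinates of mask voxels with at least one 6-neighbor outside the mask."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, mode="constant")
    core = (slice(1, -1),) * mask.ndim
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(mask & ~interior)

def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average Symmetric Surface Distance in the units of `spacing` (mm here)."""
    a = surface_voxels(pred) * np.asarray(spacing)
    b = surface_voxels(gt) * np.asarray(spacing)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a) + len(b))

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel.
gt = np.zeros((10, 10, 10), dtype=bool); gt[2:6, 2:6, 2:6] = True
pred = np.zeros_like(gt); pred[3:7, 2:6, 2:6] = True
dsc_value, assd_value = dice(pred, gt), assd(pred, gt)
```

With 1.0 mm isotropic voxels, the default `spacing` returns the ASSD directly in millimetres. For full-resolution volumes a distance transform would replace the pairwise matrix.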

Results

DSC and ASSD values are reported in Fig. 3. All four classes achieve high DSC values, with the lowest mean DSC being 0.895. Median ASSD values below 2.86 mm indicate that the prediction deviates from the ground truth by only a few voxels. Exemplary segmentation results are shown in Fig. 4. The femora and pelvis are correctly delineated, and even the intricate acetabulum region is segmented reliably.

Discussion

The study has so far been performed on a limited amount of data only. However, despite the intricate imaging modality and complex anatomy, the automated segmentation shows promising agreement with the manual segmentation. The algorithm generalizes well from the few training examples seen and provides a high-resolution differentiation of bone tissue even in the presence of severe inhomogeneities. Even for moderate predictions, a promising conformity is achieved. The remaining misclassifications are mostly limited to isolated clusters that could be removed in a post-processing step.
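One possible post-processing step of this kind is to keep only the largest connected component of each predicted class. The text does not specify the method, so the following is a sketch under that assumption, using 6-connectivity and a plain breadth-first search. The function name is hypothetical, and a library labeling routine would be preferable for full-size volumes.

```python
from collections import deque
import numpy as np

def largest_component(mask):
    """Keep only the largest 6-connected component of a binary 3D mask."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    offsets = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]
    best_label, best_size, current = 0, 0, 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labeled component
        current += 1
        labels[seed] = current
        queue, size = deque([seed]), 0
        while queue:
            z, y, x = queue.popleft()
            size += 1
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
        if size > best_size:
            best_label, best_size = current, size
    if best_label == 0:
        return np.zeros_like(mask)  # empty input mask
    return labels == best_label

# Toy example: one 3x3x3 blob plus a single isolated misclassified voxel.
pred = np.zeros((8, 8, 8), dtype=bool)
pred[1:4, 1:4, 1:4] = True
pred[6, 6, 6] = True
cleaned = largest_component(pred)
```

Applied per class, this removes isolated clusters such as the single voxel in the toy example while leaving the main bone structure untouched. It would not be appropriate for a class that legitimately consists of several disconnected parts.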

Conclusion

Accurate segmentation of hip bones at full resolution is achievable with the proposed DL architecture despite the scarce amount of data, by relying on an efficient architecture that focuses on small 3D output patches and dynamic filters. Further automated analysis thus becomes feasible. Future studies will focus on generalizability to larger annotated datasets.

Acknowledgements

No acknowledgement found.

References

1. Parvizi J, Leunig M, Ganz R. Femoroacetabular impingement. Journal of the American Academy of Orthopaedic Surgeons 2007;15(9):561-570.
2. Bamberg F, Kauczor HU, Weckbach S, et al. Whole-Body MR Imaging in the German National Cohort: Rationale, Design, and Technical Background. Radiology 2015;277(1):206-220.
3. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Medical Image Analysis 2017;42:60-88.
4. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015:234-241.
5. Milletari F, Navab N, Ahmadi SA. V-net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016:565-571.
6. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:3431-3440.
7. Chen LC, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611 2018.
8. Mehta S, Rastegari M, Caspi A, et al. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation. arXiv preprint arXiv:1803.06815 2018.
9. Jia X, De Brabandere B, Tuytelaars T, et al. Dynamic filter networks. Advances in Neural Information Processing Systems. 2016:667-675.
10. Ghafoorian M, Karssemeijer N, Heskes T, et al. Deep multi-scale location-aware 3D convolutional neural networks for automated detection of lacunes of presumed vascular origin. NeuroImage: Clinical 2017;14:391-399.
11. Küstner T, Müller S, Fischer M, et al. Semantic Organ Segmentation in 3D Whole-Body MR Images. 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018:3498-3502.
12. Küstner T, Fischer M, Müller S, et al. Automated segmentation of abdominal organs in T1-weighted MR images using a deep learning approach: application on a large epidemiological MR study. Proceedings of the International Society for Magnetic Resonance in Medicine (ISMRM) 2018.
13. Deniz CM, Xiang S, Hallyburton S, et al. Segmentation of the Proximal Femur from MR Images using Deep Convolutional Neural Networks. arXiv preprint arXiv:1704.06176 2017.

Figures

Figure 1: Overview of the PD-weighted scans. The dataset comprises 11 subjects. Exemplary coronal and axial slices through the femur heads are illustrated. The corresponding 3D ground truth, incorporating the right and left femur as well as the pelvis regions, is shown on the right. The sequence is well suited for further textural and structural analyses, but incorporates various inhomogeneities, impeding an automated segmentation.

Figure 2: Proposed MedPatchNet encoder-decoder structure with shortcuts, containing convolutions with different kernel sizes and strides (down-c, up-c), ESP blocks, and dynamic convolutions (dyn-c) whose kernels are created by a DFN based on the patch position within the image. An ESP block employs a split-transform-merge strategy with parallel dilated convolutions with dilation factors 1, 2 (and 4), followed by hierarchical feature fusion (HFF) to avoid gridding artifacts. From highest to lowest resolution, 32, 64, 128 and 256 channels are used. Between convolutions, Batch Normalization (BN) and Leaky ReLU are employed.

Figure 3: Quantitative analysis of the approach on the given dataset. Leave-one-out cross-validation was performed to allow a meaningful analysis of the 11 subjects, with each split containing 10 training subjects and one test subject. DSC, measuring segmentation overlap, and ASSD, indicating the distance between the boundaries of the ground truth and prediction labels, are reported. An overall mean DSC of 0.909 was achieved. The ASSD has a mean value of 2.87 mm, with the femora being slightly worse than the pelvis, mostly due to misclassifications as indicated in Fig. 4.

Figure 4: Illustration of the segmentation quality of the MedPatchNet prediction compared to the manually generated ground truth for coronal and axial slices of two test subjects. The employed architecture is able to reliably differentiate between bone tissues, despite variations present in the dataset. In a) and b) two processed subjects are illustrated with a mean DSC of 0.943 and 0.903 respectively. Both femora and the pelvis structure are segmented accurately with only small misclassifications. In b) larger isolated misclassifications are present, which results in a high mean ASSD value of 5.05 mm for this subject.

Proc. Intl. Soc. Mag. Reson. Med. 27 (2019)
