Analysis of the geometrical and structural properties of the hip is essential for a meaningful comparison of findings. In large cohort studies in particular, manual processing of large 3D volumes becomes infeasible, so automated processing is required. In this work, we propose a Deep Learning driven algorithm that performs automated hip bone segmentation of 3D MRI datasets. It requires little training data and achieves accurate semantic bone segmentation despite complex anatomical structures that share similar tissue characteristics.
Automated hip bone segmentation of MRI data is a prerequisite for subsequent analyses of geometrical and structural properties. Degenerative conditions such as femoroacetabular impingement (FAI)1 and hip dysplasia can be examined using an accurate geometrical model. High-resolution 3D volumes are best suited for a meaningful investigation but are cumbersome to process manually (Fig. 1). For analyzing large cohort data such as the German National Cohort (NAKO)2, an effective scheme for automated quantification has to be established. Given the challenging imaging conditions, a supervised algorithm is unavoidable, but the effort required to annotate training data should be kept low.
Several Machine Learning (ML) driven approaches have been established in the past, with Deep Learning (DL)3 algorithms being the most successful to date. We propose a solution specifically suited to medical datasets, named MedPatchNet, which builds on recent developments in the DL field. The proposed method incorporates image context together with the position of each patch within the volume, so that an automated segmentation remains possible even when tissues share similar characteristics.
Isotropic PD-weighted fast spin echo images of 200 subjects from the NAKO MR study were analyzed. Data were acquired on a 3T MRI scanner with 1.0 mm isotropic resolution, matrix size 384x264x160, TE = 33 ms, TR = 1200 ms and bandwidth = 500 Hz/px. For a feasibility study, a subset of 11 subjects was annotated voxel-wise in 3D by experienced radiologists.
The proposed patch-based architecture for semantic segmentation of volumetric medical images (MedPatchNet, Fig. 2) combines recent architectural building blocks: encoder-decoder structures as popularized by UNet4 and VNet5, the Fully Convolutional Network (FCN)6 for dense predictions, DeepLabv3+7 and the Efficient Spatial Pyramid Network (ESPNet)8 for context aggregation, and Dynamic Filter Networks (DFN)9 for creating filters that adapt dynamically to their inputs. In addition, it incorporates a-priori knowledge of the relative position of the underlying anatomy, as previously done for detection10 and segmentation11,12. Combining these ideas makes it possible to reliably distinguish structures with similar tissue characteristics. Specifically, the center position of an input patch relative to the volume of interest is passed to a small DFN consisting of three dense layers. In contrast to previous approaches for hip bone segmentation13, less training data is required and better generalization is achieved, despite operating on large high-resolution volumes.
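The position-conditioned DFN described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the MedPatchNet implementation: we assume the patch center is normalized by the volume shape, that the three dense layers output the weights and bias of a 1x1x1 convolution mixing feature channels, and that the hidden width is 32. The names `PositionalDynamicFilter` and `normalized_center` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def normalized_center(center_voxel, volume_shape):
    """Hypothetical helper: patch center in voxels -> coordinates in [0, 1]^3."""
    return np.asarray(center_voxel, dtype=float) / np.asarray(volume_shape, dtype=float)


class PositionalDynamicFilter:
    """Assumed design: three dense layers map a normalized patch position to the
    weights and bias of a 1x1x1 convolution applied to a feature map."""

    def __init__(self, c_in, c_out, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.c_in, self.c_out = c_in, c_out
        sizes = [3, hidden, hidden, c_in * c_out + c_out]  # three dense layers
        # Random (untrained) weights with 1/sqrt(fan_in) scaling, zero biases.
        self.layers = [
            (rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def generate(self, position):
        """Return a (c_in, c_out) kernel and a (c_out,) bias for this position."""
        h = np.asarray(position, dtype=float)
        for i, (w, b) in enumerate(self.layers):
            h = h @ w + b
            if i < len(self.layers) - 1:  # ReLU on hidden layers only
                h = np.maximum(h, 0.0)
        split = self.c_in * self.c_out
        return h[:split].reshape(self.c_in, self.c_out), h[split:]

    def __call__(self, features, position):
        """Apply the position-dependent 1x1x1 convolution to (c_in, D, H, W) features."""
        kernel, bias = self.generate(position)
        out = np.einsum("cdhw,co->odhw", features, kernel)
        return out + bias[:, None, None, None]


# Usage: two patches with identical content but different locations in a
# 384x264x160 volume are processed with different filters.
dfn = PositionalDynamicFilter(c_in=8, c_out=4)
features = np.ones((8, 16, 16, 16))
left = dfn(features, normalized_center((96, 132, 80), (384, 264, 160)))
right = dfn(features, normalized_center((288, 132, 80), (384, 264, 160)))
print(left.shape, np.allclose(left, right))
```

The point of the sketch is that identical image content can produce different outputs depending on where the patch lies, which is one way positional prior knowledge can help separate structures with similar intensities.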
Based on an FCN architecture with a UNet encoder-decoder structure, the network takes 64x64x64 input patches and predicts the central 16x16x16 region. Combining a large receptive field with a small output region yields accurate predictions without boundary effects at the patch edges, since every predicted voxel has at least (64 - 16) / 2 = 24 voxels of context on each side. The network is optimized by minimizing a soft Jaccard index loss over 2000 epochs, using class-balanced patch sampling and a batch size of 18. Data augmentation is applied, including mirroring, rotation, scaling, and the addition of noise to voxel intensities and patch positions.
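The patch geometry and loss described above can be illustrated with the following NumPy sketch. It covers a volume with 64x64x64 input patches whose central 16x16x16 regions tile the volume without gaps, and it computes a soft Jaccard loss on class probabilities. The zero padding at the volume border, the averaging of the loss over classes, the smoothing constant `eps`, and all function names are our assumptions; `predict_patch` stands in for the trained network.

```python
import numpy as np


def predict_volume(volume, predict_patch, in_size=64, out_size=16):
    """Tile `volume` with out_size^3 output blocks, each predicted from the
    in_size^3 patch centered on it. Returns the prediction and the patch count."""
    margin = (in_size - out_size) // 2          # context on each side (24 voxels)
    shape = np.array(volume.shape)
    n_blocks = -(-shape // out_size)            # ceil division per axis
    padded = n_blocks * out_size
    # Zero-pad so every output block has full context and the grid fits exactly.
    pad = [(margin, margin + int(p - s)) for s, p in zip(shape, padded)]
    vol = np.pad(volume, pad, mode="constant")
    out = np.zeros(tuple(padded), dtype=float)
    for idx in np.ndindex(*n_blocks):
        z, y, x = (int(i) * out_size for i in idx)
        patch = vol[z:z + in_size, y:y + in_size, x:x + in_size]
        out[z:z + out_size, y:y + out_size, x:x + out_size] = predict_patch(patch)
    return out[:shape[0], :shape[1], :shape[2]], int(np.prod(n_blocks))


def count_patches(shape, out_size=16):
    """Number of patches needed to cover a volume of the given shape."""
    return int(np.prod(-(-np.asarray(shape) // out_size)))


def soft_jaccard_loss(probs, onehot, eps=1e-6):
    """1 - mean over classes of the soft IoU; inputs have shape (C, D, H, W)."""
    axes = tuple(range(1, probs.ndim))
    inter = (probs * onehot).sum(axis=axes)
    union = (probs + onehot - probs * onehot).sum(axis=axes)
    return float(1.0 - np.mean((inter + eps) / (union + eps)))


# Usage: a dummy "network" that returns the central crop reproduces the input.
vol = np.random.default_rng(1).random((40, 33, 20))
pred, n = predict_volume(vol, lambda p: p[24:40, 24:40, 24:40])
print(np.allclose(pred, vol), n, count_patches((384, 264, 160)))
```

For the 384x264x160 volumes described above, this amounts to 24 x 17 x 10 = 4080 patches per volume (264 is not a multiple of 16, hence the padding). The soft Jaccard loss operates on probabilities rather than hard labels so that it stays differentiable; a perfect prediction gives a loss of 0.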
To allow for a meaningful analysis, the experiment is performed as a leave-one-out cross-validation, with each of the 11 folds using 10 subjects for training and 1 for testing. The Dice Similarity Coefficient (DSC) quantifies the overlap between ground truth and prediction, and the Average Symmetric Surface Distance (ASSD) quantifies the mean distance between their boundaries.
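The evaluation protocol can be sketched with NumPy and SciPy as below. This is one common way to compute DSC and ASSD on binary masks, not necessarily the implementation used in this work: we assume surface voxels are mask voxels removed by a single binary erosion (face connectivity), that distances are Euclidean in millimetres given the 1.0 mm voxel spacing, and that ASSD averages the distances from each surface to the other. The subject identifiers are placeholders.

```python
import numpy as np
from scipy import ndimage


def dice(pred, gt):
    """Dice Similarity Coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface(mask):
    """Boundary voxels: part of the mask but removed by one erosion step."""
    return mask & ~ndimage.binary_erosion(mask)


def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average Symmetric Surface Distance (in units of `spacing`)."""
    sp, sg = surface(pred.astype(bool)), surface(gt.astype(bool))
    if not sp.any() or not sg.any():
        raise ValueError("ASSD is undefined for an empty mask")
    # EDT of the complement gives, per voxel, the distance to the nearest surface voxel.
    to_gt = ndimage.distance_transform_edt(~sg, sampling=spacing)
    to_pred = ndimage.distance_transform_edt(~sp, sampling=spacing)
    distances = np.concatenate([to_gt[sp], to_pred[sg]])
    return float(distances.mean())


def leave_one_out(subjects):
    """Yield (training subjects, test subject) for each fold."""
    for i, test in enumerate(subjects):
        yield subjects[:i] + subjects[i + 1:], test


subjects = [f"subject_{k:02d}" for k in range(11)]   # placeholder IDs
folds = list(leave_one_out(subjects))
print(len(folds), len(folds[0][0]))
```

Reporting both metrics is useful because DSC is dominated by the large bone volume, whereas ASSD is sensitive to boundary errors that matter for subsequent geometric measurements.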