Automated segmentation of organs and anatomical structures is a prerequisite for efficient analysis of MR data in large cohort studies with thousands of participants. Deep learning approaches have been shown to provide good solutions for this task. Since all these methods are based on supervised learning, labeled ground truth data is required, which can be time- and cost-intensive to generate. This work examines the feasibility of transfer learning between similar epidemiological cohort studies to explore the potential for reusing labeled training data.
Automated analysis of imaging data plays an increasingly important role as the number of large cohort studies and the information content per study grow rapidly. Especially in large epidemiological imaging studies with thousands of scanned individuals, visual and manual image processing is no longer feasible. For any automated analysis, one crucial step in the evaluation of medical image content is the recognition and segmentation of organs. Several methods for automated organ segmentation of MR images have been proposed so far, including methods relying on explicit models1, general correspondence2, machine learning such as Random Forests3 as well as, more recently, deep learning4 (DL) approaches including convolutional neural networks (CNNs)5. Among these methods, DL6,7 demonstrates excellent performance. All schemes share the goal of inferring class labels by generalizing well from labeled training data. However, the creation of labeled training data can be time-consuming and cost-intensive, because it needs to be performed by trained experts. Reusing already labeled training data, wherever suitable, is therefore of high interest.
In this study, we examine the feasibility of transfer learning between two similar epidemiological MR imaging databases: the Cooperative Health Research in the Region of Augsburg (KORA)8 and the German National Cohort (NAKO)9. We utilize our previously proposed DCNet10,11 to automatically segment liver and spleen in both databases and investigate the capabilities of transfer learning between them.
In the context of the KORA MR study, coronal whole-body T1-weighted dual-echo gradient-echo images were acquired for 397 subjects on a 3T MRI system. The imaging parameters were: 1.7mm isotropic resolution with matrix size=288x288x160, TEs=1.26/2.52ms, TR=4.06ms, flip angle=9°, bandwidth=755Hz/px. For the NAKO MR study, transversal whole-body T1-weighted dual-echo gradient-echo images were acquired for 200 subjects on a 3T MRI system with imaging parameters: 1.4x1.4x3mm resolution, TEs=1.23/2.46ms, TR=4.36ms, flip angle=9°, bandwidth=680Hz/px. Subsets of 173 (KORA) and 97 (NAKO) subjects were randomly selected, and liver and spleen were manually labeled by experienced radiologists.
The proposed DCNet10,11 (Fig.1) combines the concepts of UNet12 for pixel-wise localization, VNet7 for volumetric medical image segmentation, ResNet13 to counteract vanishing gradients and degradation problems, and DenseNet14 to enable deep supervision. For the task of semantic segmentation, additional positional information is fed to the network, serving as a-priori knowledge of relative organ position. This is achieved by concatenating the global scanner coordinates of the respective VOI to the feature maps after the encoding branch.
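To illustrate the positional encoding step, the following is a minimal sketch (PyTorch) of how normalized global scanner coordinates of a patch could be appended as extra channels to the bottleneck feature maps. The class and argument names are hypothetical; the abstract only states that the VOI coordinates are concatenated to the feature maps after the encoder, not how this is implemented in DCNet.

```python
# Sketch only: hypothetical names, not the actual DCNet implementation.
import torch
import torch.nn as nn

class PositionalConcat(nn.Module):
    """Appends normalized global patch coordinates as extra feature channels."""
    def forward(self, feats: torch.Tensor, patch_origin: torch.Tensor) -> torch.Tensor:
        # feats:        (B, C, D, H, W) bottleneck feature maps from the encoder
        # patch_origin: (B, 3) normalized scanner coordinates of each patch/VOI
        b, _, d, h, w = feats.shape
        pos = patch_origin.view(b, 3, 1, 1, 1).expand(b, 3, d, h, w)
        return torch.cat([feats, pos], dim=1)  # (B, C+3, D, H, W)

# Usage: feats = encoder(x); feats = PositionalConcat()(feats, patch_origin)
```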
Input images are cropped to 3D patches of size 32x32x32 with a 2-channel input (fat and water image), enabling better delineation of organs. The network is trained with RMSprop using a Jaccard distance loss and batch size=48 for 50 epochs; training is repeated over 4 runs to investigate stability. Different training cases were evaluated to assess the influence of transfer learning on the segmentation result (Tab.1).
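As an illustration of the loss named above, a soft Jaccard distance loss can be sketched as below (PyTorch). The exact formulation used for DCNet is not given in this abstract; the smoothing term and tensor layout are assumptions.

```python
# Sketch of a soft Jaccard distance loss; formulation details are assumed.
import torch

def jaccard_distance_loss(pred: torch.Tensor, target: torch.Tensor,
                          smooth: float = 1e-6) -> torch.Tensor:
    # pred:   (B, C, D, H, W) softmax probabilities
    # target: (B, C, D, H, W) one-hot ground-truth labels
    dims = (0, 2, 3, 4)                                   # sum over batch and volume
    intersection = (pred * target).sum(dims)
    union = pred.sum(dims) + target.sum(dims) - intersection
    jaccard = (intersection + smooth) / (union + smooth)  # per-class soft IoU
    return 1.0 - jaccard.mean()                           # distance, averaged over classes

# Training setup per the abstract: RMSprop optimizer, 32^3 patches with
# 2 channels (fat/water), batch size 48, 50 epochs per run, 4 runs.
```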
Overall, a very good agreement between the automated and manual segmentations of liver and spleen was observed, as exemplified in Fig.2. The smaller spleen is well delineated simultaneously with the larger liver, despite their similar head-feet position. Best results were obtained when the networks were trained and evaluated on the same database (cases 1 and 4). Despite the anisotropic resolution in axial direction and the stitched multi-bed FOVs of the NAKO dataset, the network's prediction (KORA->NAKO) is still of high accuracy, indicating generalizability of the proposed method. However, the backward prediction (NAKO->KORA), i.e., from anisotropic NAKO to isotropic KORA data, loses specificity. The positional input encoding helps to reduce false outliers, especially in the transfer learning context (Fig.3). High quantitative metrics (Fig.4) are obtained for all classes in cases 1 and 4, whereas these values are slightly reduced in the transfer learning cases 2 and 3. Background misclassification, i.e., outliers, increases in the transfer learning cases, as indicated by reduced precision and Jaccard index.
Our study has limitations. The obtained results are specific to the imaging setup and MR sequence design of the study at hand. The segmented organs in our study (liver and spleen) are rather large structures in the upper abdomen and may thus be easier to segment than more complicated anatomical structures. Generalizability to other imaging sequences and organs will therefore be investigated in future studies.