2593

Deep Learning for Liver Segmentation and Quantification of Obese Patients

Philipp Madörin¹, Xeni Deligianni¹,², Francesco Santini¹,², Simon Andermatt², Philippe Claude Cattin², Anne Christin Meyer-Gerspach³,⁴, Bettina Karin Wölnerhanssen³,⁴, Oliver Bieri¹,², and Orso Pusterla¹,²,⁵
¹Department of Radiology, Division of Radiological Physics, University Hospital Basel, Basel, Switzerland, ²Department of Biomedical Engineering, University of Basel, Basel, Switzerland, ³St. Clara Research Ltd, St. Clara Hospital, Basel, Switzerland, ⁴University of Basel, Basel, Switzerland, ⁵Institute for Biomedical Engineering, University and ETH Zurich, Zurich, Switzerland

Synopsis

Obesity is one of the greatest health risks and is strongly related to fatty liver disease. Magnetic resonance imaging enables non-invasive measurement of the fat-water distribution in tissue. To provide an automated evaluation of liver volume and fat percentage, we trained a Multi-Dimensional Gated Recurrent Units network to segment multi-contrast data. The network was trained on small cohorts of 52, 20, and 10 datasets and was evaluated for liver volume and fat percentage quantification.

Introduction

Obesity has become one of the greatest public health challenges and is spreading at an epidemic rate. It is estimated that half the world’s adult population will be overweight or obese by 2030.¹
Generally, obesity is strongly related to fatty liver disease, which can be assessed non-invasively with magnetic resonance imaging (MRI) by measuring the fat-water distribution in tissue and other biomarkers.² To evaluate the data quantitatively, e.g. for liver volume and fat percentage, the liver must be segmented. Currently, this time-consuming task is usually performed manually, which hampers translation to clinical routine and big data analysis for biomarker characterization. Hence, a reliable automatic method for liver segmentation and evaluation, as provided by current deep learning algorithms, is required.
The purpose of this study is to investigate the prospect of deep learning with Multi-Dimensional Gated Recurrent Units (MD-GRU)³,⁴ for liver segmentation and to determine the reliability of the automated method for liver volume and fat percentage. The MD-GRU network, which differs strongly from classical convolutional neural network (CNN) approaches⁵, was trained on small cohorts of datasets (52, 20, and 10) to determine the minimal amount of data the network needs for a reliable segmentation and outcome.

Methods

MR Data
A cohort of 36 obese patients (body mass index > 25) was included in this study, which was approved by our ethics committee. Subjects underwent MRI at two different timepoints (within a time span of 1-4 weeks), resulting in a total of 72 datasets. The MRI protocol for the quantitative (fat fraction and T2*) liver measurement included a transversal T2*-IDEAL 3D multi-echo Flash sequence⁶ acquired in a single inspiratory breath-hold (TA=17 s, TR=9.05 ms, 6 equally spaced echoes with TE=0.74-7.74 ms and echo spacing 1.4 ms, FA=6°, FoV=450×337.5 mm, slice thickness 4 mm, 224×179×56 matrix [voxel size 2.0×2.0×4.0 mm]).
The liver was manually segmented with ITK-SNAP⁷ for quantifying liver fat percentage and total liver volume.

Neural Network
A neural network consisting of MD-GRU layers (freely available online: https://github.com/zubata88/mdgru) with an on-the-fly data augmentation technique was evaluated for liver segmentation.³,⁴ Training was performed on 52 (DL₅₂), 20 (DL₂₀), and 10 (DL₁₀) datasets (i.e., 26, 10, and 5 subjects examined twice), and testing on 20 datasets (i.e., 10 subjects examined twice, not included in the training and representing a new cohort). Inputs to the MD-GRU were the first-in-phase echo, fat fraction, and T2* (3 contrasts, shown in Figure 1) together with the liver segmentation, all downsampled by a factor of 2 in-plane to reduce computational complexity.
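The input preparation described above can be illustrated with a minimal NumPy sketch. It is not taken from the MD-GRU code base: the function name `prepare_input`, the (x, y, z) axis order, and the use of plain decimation (every second voxel, without interpolation or filtering) are assumptions made for illustration only.

```python
import numpy as np

def prepare_input(in_phase, fat_fraction, t2star, factor=2):
    """Stack three contrast volumes and reduce the in-plane resolution.

    Assumption: volumes are indexed (x, y, z); only x and y are reduced.
    """
    volumes = [np.asarray(v, dtype=np.float32)
               for v in (in_phase, fat_fraction, t2star)]
    if len({v.shape for v in volumes}) != 1:
        raise ValueError("all contrasts must have the same shape")
    stacked = np.stack(volumes, axis=-1)        # shape (x, y, z, 3)
    # Plain decimation: keep every `factor`-th voxel in x and y only.
    return stacked[::factor, ::factor, :, :]

# Synthetic volumes with the acquisition matrix reported in the Methods.
shape = (224, 179, 56)
rng = np.random.default_rng(0)
contrasts = [rng.random(shape, dtype=np.float32) for _ in range(3)]
x = prepare_input(*contrasts)
print(x.shape)  # (112, 90, 56, 3)
```

The manual segmentation mask would have to be reduced with the same slicing so that image and label grids stay aligned.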

Data Analysis
An observer visually scored the number of automatically segmented datasets that were not consistent and would require further refinement for a quantitative analysis of the liver. We calculated the Dice coefficient, true-positive volume fraction (TPVF), false-negative volume fraction (FNVF), and false-positive volume fraction (FPVF) between the manual segmentation and the automated ones. As the main study outcomes, we quantified the total liver volume directly from the output segmentation masks and the average liver fat percentage by applying the produced masks to the IDEAL fat fraction maps.
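A minimal sketch of how these quantities could be computed from binary masks is given here. The abstract does not state the normalisation of the volume fractions; this sketch assumes all three are divided by the manual (reference) volume, whereas some definitions divide FPVF by the volume outside the reference. Function names and the synthetic masks are hypothetical.

```python
import numpy as np

def overlap_metrics(auto, manual):
    """Dice, TPVF, FNVF and FPVF between an automated and a manual mask.

    Assumption: all volume fractions are normalised by the manual volume.
    """
    auto = np.asarray(auto, dtype=bool)
    manual = np.asarray(manual, dtype=bool)
    n_manual = manual.sum()
    if n_manual == 0:
        raise ValueError("manual mask is empty")
    tp = np.logical_and(auto, manual).sum()     # voxels found by both
    fn = np.logical_and(~auto, manual).sum()    # liver voxels missed
    fp = np.logical_and(auto, ~manual).sum()    # voxels wrongly included
    return {
        "dice": 2.0 * tp / (auto.sum() + n_manual),
        "tpvf": tp / n_manual,
        "fnvf": fn / n_manual,
        "fpvf": fp / n_manual,
    }

def liver_volume_ml(mask, voxel_size_mm=(2.0, 2.0, 4.0)):
    """Volume in mL; voxel_size_mm must match the grid of the mask."""
    voxel_ml = float(np.prod(voxel_size_mm)) / 1000.0
    return float(np.asarray(mask, dtype=bool).sum()) * voxel_ml

def mean_fat_percentage(mask, fat_fraction_map):
    """Average of the fat fraction map inside the mask."""
    return float(np.asarray(fat_fraction_map)[np.asarray(mask, dtype=bool)].mean())

# Synthetic example: two 4x4x4 cubes shifted by one voxel along x.
manual = np.zeros((10, 10, 10), dtype=bool)
manual[2:6, 2:6, 2:6] = True
auto = np.zeros_like(manual)
auto[3:7, 2:6, 2:6] = True
fat_map = np.full(manual.shape, 12.0)           # constant 12 % fat fraction

metrics = overlap_metrics(auto, manual)
print(metrics)
print(liver_volume_ml(auto), mean_fat_percentage(auto, fat_map))
```

If the masks are produced on the downsampled grid, the in-plane voxel size passed to `liver_volume_ml` has to be scaled accordingly.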
As every patient was scanned at two consecutive timepoints, between which no major change is expected, the mean absolute deviation between these two timepoints was calculated as a reliability test.
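The reliability test can be sketched as follows. The abstract does not give the exact formula; this sketch assumes that the absolute difference between the two timepoints is expressed relative to their mean for each subject and then averaged over subjects. The function name and the example values are hypothetical.

```python
def repeatability_percent(timepoint_1, timepoint_2):
    """Mean absolute relative deviation (%) between paired measurements.

    Assumption: each deviation is normalised by the mean of the pair.
    """
    if len(timepoint_1) != len(timepoint_2) or len(timepoint_1) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    deviations = [
        abs(a - b) / ((a + b) / 2.0) * 100.0
        for a, b in zip(timepoint_1, timepoint_2)
    ]
    return sum(deviations) / len(deviations)

# Hypothetical liver volumes (mL) of two subjects at both timepoints.
volumes_t1 = [1500.0, 2000.0]
volumes_t2 = [1530.0, 1980.0]
print(round(repeatability_percent(volumes_t1, volumes_t2), 2))
```

The same function applies to fat percentage values and to any of the segmentation sources (manual, DL₅₂, DL₂₀), which allows their repeatability to be compared side by side.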

Results

Representative manual and automated segmentations are shown in Figure 2. Qualitatively, DL₅₂ segmentations were good in 70% of cases (no refinement needed), and 30% would need very minor refinements (not performed for this study). For DL₂₀, segmentations were good in 50% of cases, 40% would need minor refinements, and 10% would need large corrections. For DL₁₀, segmentations were good in 30% of cases, 60% would need minor refinements, and 10% would need large corrections.
Dice coefficient, TPVF, FNVF, and FPVF for the three trained models are presented in Table 1. The Dice coefficient was 0.95±0.02 (mean±SD) for DL₅₂, 0.92±0.06 for DL₂₀, and 0.90±0.10 for DL₁₀. DL₅₂ and DL₂₀ show promising results, while DL₁₀ is not accurate enough.
Compared to human segmentation, the average difference in liver volume was 2%±7% (mean absolute error (MAE) 5%) when evaluated with DL₅₂, and -4%±13% (MAE 8%) with DL₂₀. Fat percentage deviated by 3%±5% (MAE 4%) with DL₅₂, and by 2%±5% (MAE 4%) with DL₂₀ (analysis for DL₁₀ is not presented). The agreement between human and DL on these quantities is shown in Figure 3.
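The agreement statistics reported here (mean±SD of the difference, MAE, and the limits of agreement of a Bland-Altman analysis) can be computed as in the sketch below. It assumes that differences are expressed relative to the manual value and that the limits of agreement are the mean ± 1.96 SD; neither choice is stated in the abstract, and the example values are hypothetical.

```python
from statistics import mean, stdev

def agreement_summary(auto_values, manual_values):
    """Relative differences (%) between automated and manual measurements.

    Assumptions: differences are normalised by the manual value, and the
    limits of agreement are bias +/- 1.96 * sample standard deviation.
    """
    if len(auto_values) != len(manual_values) or len(auto_values) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rel = [100.0 * (a - m) / m for a, m in zip(auto_values, manual_values)]
    bias = mean(rel)
    sd = stdev(rel)
    return {
        "bias": bias,                              # mean relative difference
        "sd": sd,
        "mae": mean([abs(r) for r in rel]),        # mean absolute error
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }

# Hypothetical liver volumes in mL (manual vs. automated segmentation).
manual_volumes = [1000.0, 2000.0, 1500.0, 1800.0]
auto_volumes = [1050.0, 1900.0, 1530.0, 1800.0]
summary = agreement_summary(auto_volumes, manual_volumes)
print(summary)
```

A Bland-Altman plot would display each relative difference against the mean of the two measurements, with horizontal lines at the bias and at both limits of agreement.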
The repeatability calculated across the timepoints was good (<2%, relative deviation) and statistically similar for human segmentation, DL₅₂, and DL₂₀.

Discussion and Conclusion

In this work, MD-GRU shows accuracy competitive with other liver segmentation approaches, even when trained on a reduced number of datasets. Interestingly, with as few as 20 datasets the network can already perform a preliminary liver segmentation and might be used for novel studies, e.g. ones planned with a different imaging setup. Further application of the MD-GRU in the presented cohort of patients will include segmentation of visceral and subcutaneous fat.
In conclusion, the main outcomes, such as liver fat percentage and volume, can be evaluated with MD-GRU, reducing the required manual work and increasing the feasibility of routine quantitative liver assessment with MRI.

Acknowledgements

No acknowledgement found.

References

1. Kelly T, Yang W, Chen CS, et al. Global burden of obesity in 2005 and projections to 2030. Int J Obes. 2008; 32(9):1431–7.

2. Meyer-Gerspach AC, Peterli R, Moor M, et al. Obes Surg. 2019; 29:2795. https://doi.org/10.1007/s11695-019-03897-2.

3. Andermatt S, Pezold S, Cattin PC (2016) Multi-dimensional Gated Recurrent Units for the Segmentation of Biomedical 3D-Data. In: Carneiro G. et al. (eds) Deep Learning and Data Labeling for Medical Applications. DLMIA 2016, LABELS 2016. Lecture Notes in Computer Science, vol 10008. Springer, Cham.

4. Andermatt S, Pezold S, Cattin PC (2018) Automated Segmentation of Multiple Sclerosis Lesions Using Multi-dimensional Gated Recurrent Units. In: Crimi A., Bakas S., Kuijf H., Menze B., Reyes M. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2017. Lecture Notes in Computer Science, vol 10670. Springer, Cham.

5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–444.

6. Bydder M, Yokoo T, Hamilton G, et al. Relaxation effects in the quantification of fat using gradient echo imaging. Magn Reson Imaging 2008; 26:347–359.

7. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006; 31(3):1116–28.

Figures

Figure 1. Example of 3 input contrasts (downsampled) used for MD-GRU training: first-in-phase echo (a), fat fraction (b), T2* (c).

Figure 2. T2*-IDEAL 3D multi-echo Flash coronal images (downsampled), with manual (a, d, g), DL₅₂ (b, e, h), and DL₂₀ (c, f, i) segmentations. The first row shows good results (b, c), the second row weak results (e, f: flawed boundaries), and the last row poor results (h, i: missing liver parts).

Figure 3. Bland-Altman plots of the relative difference between human and MD-GRU. The mean difference in fat fraction was 0.99 for DL₅₂ with limits of agreement (-3.5, 5.5), and 0.77 for DL₂₀ with limits of agreement (-4.9, 6.9). For liver volume, the variability is larger for small volumes than for large ones.

Table 1. Dice coefficient, true positive volume fraction (TPVF), false negative volume fraction (FNVF), and false positive volume fraction (FPVF) for the three trained models (std: standard deviation, DL₅₂: network trained with 52 datasets, DL₂₀: network trained with 20 datasets, DL₁₀: network trained with 10 datasets).

Proc. Intl. Soc. Mag. Reson. Med. 28 (2020)
