1871

A Deep Learning Approach for Robust Segmentation of Livers with High Iron Content from MR Images of Pediatric Patients

Zhoubing Xu¹, Guillaume Chabin², Robert Grimm³, Stephan Kannengiesser³, Li Pan⁴, Vibhas Deshpande⁵, Gregor Thoermer³, Sasa Grbic¹, and Cara Morin⁶
¹Siemens Healthineers, Princeton, NJ, United States, ²Siemens Healthineers, Paris, France, ³Siemens Healthineers, Erlangen, Germany, ⁴Siemens Healthineers, Baltimore, MD, United States, ⁵Siemens Healthineers, Austin, TX, United States, ⁶St. Jude Children's Research Hospital, Memphis, TN, United States

Synopsis

Automated MRI liver segmentation enables inline evaluation of parametric maps for iron quantification with better accuracy, efficiency, and repeatability than manual segmentation. Existing methods optimized for adults and normal livers do not perform well on challenging cases in children and patients with iron overload. We developed a deep learning-based solution trained on 861 T1-weighted MRI volumes that provided significantly improved liver segmentation compared to a commercially available solution and demonstrated its robustness on a challenging cohort of pediatric patients, including cases with high iron content.

Introduction

Iron overload arises either from genetic diseases such as hemochromatosis or secondarily from red blood cell transfusions, chronic liver disease, or other causes. Excess iron accumulates in the liver, which can cause inflammation and, if untreated, eventually hepatic dysfunction, cirrhosis, and an increased risk of hepatocellular carcinoma. MRI is considered the gold standard for liver iron quantification, providing accurate, non-invasive quantification for monitoring and treatment planning¹. Most current methods for MR liver iron quantification require manual segmentation of the liver on one or multiple slices. Typically, a region of interest (ROI) is drawn around the boundaries of the liver in a single mid-hepatic slice. Liver iron can be heterogeneous in distribution, and thus assessment of the entire liver increases accuracy². Automated, inline evaluation of parametric maps for iron quantification and for characterization of other diffuse liver diseases can decrease manual effort and improve accuracy and repeatability³.
However, livers with iron overload demonstrate varying degrees of diffuse hypointensity on T1-weighted sequences compared to livers with normal iron content. Such differences are challenging for automatic liver segmentation algorithms⁴. Suboptimal liver segmentation hinders the subsequent inline estimation of whole-liver iron content or fat fraction. Furthermore, models that are optimized for adult patients do not perform well in pediatric patients. In this study, we pursue a competitive liver segmentation approach with improved performance in children and patients with iron overload by leveraging a large training set and augmentation with challenging cases.

Methods

A 3D deep image-to-image network⁵ (DI2IN) was used as the backbone of our deep learning algorithm. An adversarial network regularized the training of DI2IN by discriminating its output from the ground truth. To accommodate various image resolutions, all volumes were resampled to 2 mm × 2 mm × 2 mm before processing through DI2IN. During training, patches of 128 × 128 × 128 voxels were randomly sampled around the liver for data augmentation. During testing, deep reinforcement learning-based landmark detection⁶ was used to identify the liver center (with the shape constrained by the spleen and kidney centers for robust detection), which allowed the liver ROI to be extracted and irrelevant background to be ignored. The segmentation inferred in the ROI was then restored to the original space and resolution.
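The resampling and patch sampling steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_jitter` parameter, and the clamping behavior are assumptions made for illustration, since the abstract states only the target spacing and the patch size.

```python
import random

TARGET_SPACING_MM = (2.0, 2.0, 2.0)  # isotropic spacing stated in the text
PATCH_SIZE = (128, 128, 128)         # patch size in voxels stated in the text


def resampled_shape(shape, spacing_mm, target_mm=TARGET_SPACING_MM):
    """Grid size after resampling to the target spacing (same physical extent)."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing_mm, target_mm)
    )


def random_patch_origin(liver_center, volume_shape, max_jitter=16, rng=random):
    """Return the corner of a patch whose center is jittered around the liver.

    max_jitter (in voxels) is a hypothetical parameter; the abstract does not
    say how far from the liver the patches were sampled.
    """
    origin = []
    for c, n, p in zip(liver_center, volume_shape, PATCH_SIZE):
        center = c + rng.randint(-max_jitter, max_jitter)
        start = center - p // 2
        # Clamp so the patch stays inside the volume where possible. An axis
        # shorter than the patch starts at 0 and would need padding.
        start = max(0, min(start, n - p))
        origin.append(start)
    return tuple(origin)
```

For example, a hypothetical 320 × 260 × 72 volume with 1.2 mm × 1.2 mm × 3.0 mm voxels maps to a 192 × 156 × 108 grid at 2 mm isotropic spacing, so the third axis is shorter than the patch and would require padding in this sketch.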
The deep learning segmentation model was initially trained on T1-weighted 3D gradient-echo imaging with fat suppression (spectral fat suppression or Dixon water images) from 1037 patients (most with normal livers, some with cirrhosis and/or hepatocellular carcinoma, a few with high iron content). Of these, 195 patients were randomly selected as the validation set, and the remaining 842 were used for training. A second data cohort of 34 pediatric and young adult patients clinically suspected to have iron overload (age 7-28; median: 16 years) was acquired to refine the algorithm by enriching the training data with challenging cases. From this cohort, 15 patients were randomly selected and reserved as the testing set, and the remaining 19 were used to fine-tune the algorithm to be more robust on pediatric cases and liver iron overload. During the fine-tuning phase, the 19 cases from the second cohort were sampled 4 times more frequently than the other 842 training cases as a naïve rare case augmentation. Manual liver annotations of the 1071 cases were performed by seven experienced annotators and validated by two radiologists.
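The naïve rare case augmentation amounts to weighted sampling of training cases. The sketch below assumes that "4 times more frequently" means a per-case sampling weight of 4 against 1 for the other cases; the function names and the use of `random.choices` are illustrative choices, not details taken from the abstract.

```python
import random

N_BASE = 842     # training cases from the first cohort
N_RARE = 19      # fine-tuning cases from the pediatric / iron overload cohort
RARE_WEIGHT = 4  # assumed per-case weight for the rare cases


def sampling_weights(n_base=N_BASE, n_rare=N_RARE, rare_weight=RARE_WEIGHT):
    """Per-case weights: base cases first, rare cases last."""
    return [1.0] * n_base + [float(rare_weight)] * n_rare


def rare_fraction(weights, n_rare=N_RARE):
    """Expected share of draws that land on the rare cases."""
    return sum(weights[-n_rare:]) / sum(weights)


def draw_batch(case_ids, weights, batch_size, rng):
    """Draw case identifiers with replacement according to the weights."""
    return rng.choices(case_ids, weights=weights, k=batch_size)
```

Under this assumption the 19 rare cases account for 76/918, or about 8.3%, of the draws, compared with 19/861, or about 2.2%, under uniform sampling.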
For benchmarking, we took a commercially available solution (LiverLab; Siemens Healthcare, Erlangen, Germany) as the baseline and compared it with the new deep learning approach, with and without the challenging case augmentation. Segmentation performance was evaluated against manual annotation on the 15 testing volumes using the Dice similarity coefficient (DSC), average symmetric surface distance (ASSD), and 95th percentile Hausdorff distance (95th HD). One-tailed Wilcoxon signed-rank tests were used to evaluate statistical significance.
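The three metrics can be sketched on binary masks represented as sets of voxel coordinates. This is a brute-force illustration under stated assumptions, not the evaluation code used in the study: the surface is taken as voxels with a 6-connected neighbor outside the mask, the 95th percentile is computed by nearest rank over the pooled bidirectional distances (other definitions exist), and the default `spacing` is a placeholder.

```python
import math

NEIGHBOR_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice similarity coefficient of two voxel sets."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels with at least one 6-connected neighbor outside the mask."""
    return {
        v for v in mask
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
               for dx, dy, dz in NEIGHBOR_STEPS)
    }


def directed_distances(src, dst, spacing):
    """Distance from each voxel in src to its nearest voxel in dst."""
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))
    return [min(dist(p, q) for q in dst) for p in src]


def surface_metrics(a, b, spacing=(2.0, 2.0, 2.0)):
    """Return (ASSD, 95th HD) for two non-empty masks, in units of spacing."""
    sa, sb = surface(a), surface(b)
    pooled = sorted(directed_distances(sa, sb, spacing)
                    + directed_distances(sb, sa, spacing))
    assd = sum(pooled) / len(pooled)
    rank = math.ceil(0.95 * len(pooled))  # nearest-rank percentile
    return assd, pooled[rank - 1]
```

The pairwise search is quadratic in the number of surface voxels, so a real evaluation on full liver volumes would more likely rely on distance transforms.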

Results

While major improvements were achieved with the deep learning approach compared to the baseline (p < 0.005 for all three metrics), further significant improvements were observed when a subset of pediatric cases, some with high iron content, was included for algorithm augmentation (Figure 3). With vs. without augmentation, the results were DSC: 0.95 ± 0.02 vs. 0.93 ± 0.06 (p < 0.05), ASSD: 0.93 ± 0.49 vs. 2.99 ± 6.98 (p < 0.05), and 95th HD: 3.79 ± 2.22 vs. 20.29 ± 57.28 (p < 0.01). The augmented deep learning approach yielded no major failure across all 15 test cases, while the other two methods suffered from large errors on patients with iron overload (Figure 4).
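The paired comparison behind these p-values can be illustrated with an exact one-sided Wilcoxon signed-rank test. The sketch below is a simplified stand-in, not the statistical code used in the study: it assumes no zero differences and no tied absolute differences, and the function name is hypothetical.

```python
def wilcoxon_one_sided(x, y):
    """Exact p-value for the alternative that x tends to exceed y (paired).

    Raises ValueError on zero or tied absolute differences, which this
    simplified sketch does not handle.
    """
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not supported")
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranked = [abs(diffs[i]) for i in order]
    if any(u == v for u, v in zip(ranked, ranked[1:])):
        raise ValueError("tied absolute differences are not supported")

    n = len(diffs)
    # Sum of the ranks that belong to positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of sign assignments whose positive ranks sum to s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    # Probability, under the null, of a rank sum at least as large as observed.
    return sum(counts[w_plus:]) / 2 ** n
```

For a metric where higher is better, such as DSC, the per-case scores of the method expected to perform better would be passed as `x`; for ASSD and 95th HD, where lower is better, the arguments would be swapped.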

Discussion

Although far superior to traditional machine learning methods, deep learning methods trained for general-purpose MRI liver segmentation are limited in handling special cases such as pediatric patients and livers with extreme iron overload. Including such rare cases (even fewer than 20) with a naïve sampling augmentation during fine-tuning can effectively improve segmentation robustness on completely unseen cases.

Conclusion

We have developed a deep learning-based solution that provides significantly improved liver segmentation on T1-weighted MRI. The solution also demonstrated robustness on a challenging cohort of pediatric patients, including cases with high iron content, enabling more reliable and efficient liver-volume-based iron content estimation.

Disclaimer

The concepts and information presented in this abstract are based on research results that are not commercially available. Future availability cannot be guaranteed.

Acknowledgements

None

References

1. Labranche, Roxanne, Guillaume Gilbert, Milena Cerny, Kim-Nhien Vu, Denis Soulières, Damien Olivié, Jean-Sébastien Billiard, Takeshi Yokoo, and An Tang. "Liver iron quantification with MR imaging: a primer for radiologists." Radiographics 38, no. 2 (2018): 392-412.
2. İdilman, İlkay S., Deniz Akata, Mustafa Nasuh Özmen, and Muşturay Karçaaltıncaba. "Different forms of iron accumulation in the liver on MRI." Diagnostic and Interventional Radiology 22, no. 1 (2016): 22.
3. McCarville, M. Beth, Claudia M. Hillenbrand, Ralf B. Loeffler, Matthew P. Smeltzer, Ruitan Song, Chin-Shang Li, and Jane S. Hankins. "Comparison of whole liver and small region-of-interest measurements of MRI liver R2* in children with iron overload." Pediatric Radiology 40, no. 8 (2010): 1360-1367.
4. Tu, Zhuowen. "Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering." In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, vol. 2, pp. 1589-1596. IEEE, 2005.
5. Yang, Dong, Daguang Xu, S. Kevin Zhou, Bogdan Georgescu, Mingqing Chen, Sasa Grbic, Dimitris Metaxas, and Dorin Comaniciu. "Automatic liver segmentation using an adversarial image-to-image network." In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 507-515. Springer, Cham, 2017.
6. Ghesu, Florin-Cristian, Bogdan Georgescu, Yefeng Zheng, Sasa Grbic, Andreas Maier, Joachim Hornegger, and Dorin Comaniciu. "Multi-scale deep reinforcement learning for real-time 3D-landmark detection in CT scans." IEEE Transactions on Pattern Analysis and Machine Intelligence 41, no. 1 (2019): 176-189.

Figures

Figure 1. Examples of T1-w Dixon water volumes of four patients with normal iron content (top row, two patients) and high iron content (bottom row, two patients).

Figure 2. Algorithm workflow at training and testing phase.

Figure 3. Quantitative comparison across tested approaches on DSC, ASSD, and 95th HD.

Figure 4. Qualitative comparison across tested approaches on three representative cases. (a) T1-weighted water image, (b) baseline approach, (c) deep learning approach, (d) deep learning with pediatric and high iron case augmentation, (e) manual annotation. DSC against manual annotation is provided at the bottom left corner for each case of (b, c, d).

Proc. Intl. Soc. Mag. Reson. Med. 29 (2021)
