Zhoubing Xu1, Guillaume Chabin2, Robert Grimm3, Stephan Kannengiesser3, Li Pan4, Vibhas Deshpande5, Gregor Thoermer3, Sasa Grbic1, and Cara Morin6
1Siemens Healthineers, Princeton, NJ, United States, 2Siemens Healthineers, Paris, France, 3Siemens Healthineers, Erlangen, Germany, 4Siemens Healthineers, Baltimore, MD, United States, 5Siemens Healthineers, Austin, TX, United States, 6St. Jude Children's Research Hospital, Memphis, TN, United States
Synopsis
Automated MRI liver segmentation enables the inline
evaluation of parametric maps for iron quantification with improved accuracy,
efficiency, and repeatability compared to manual efforts. Existing methods
optimized for adults and normal livers do not perform well on challenging cases
in children and patients with iron overload.
We developed a deep learning-based solution trained on 861 T1-weighted
MRI volumes that provided significantly improved liver segmentation compared to a
commercially available solution and demonstrated its robustness on a
challenging cohort of pediatric patients including cases with high iron content.
Introduction
Iron overload can result from
genetic diseases such as hemochromatosis or arise secondarily from red blood
cell transfusions, chronic liver disease, or other causes. Excess iron
accumulates in the liver, which can cause inflammation and eventually hepatic
dysfunction, cirrhosis, and increased risk of hepatocellular carcinoma if not
treated. MRI is considered the gold standard for liver iron quantification,
providing accurate, non-invasive quantification for monitoring and treatment
planning1. Most current methods for MR liver iron
quantification require manual segmentation of the liver on one or multiple
slices. Typically, a region of interest (ROI) is drawn around the boundaries of
the liver in a single mid-hepatic slice. Liver iron can be heterogeneous in
distribution, and thus assessment of the entire liver increases accuracy2. Automated,
inline evaluation of parametric maps for iron quantification and for the
characterization of other diffuse liver diseases can decrease manual effort and
improve accuracy and repeatability3.
However, livers with iron overload demonstrate varying
degrees of diffuse hypointensity on T1-weighted sequences compared to livers
with normal iron content. Such differences are challenging for automatic liver
segmentation algorithms4. Suboptimal liver segmentation hinders the
subsequent inline estimation of whole-liver iron content or fat
fraction. Furthermore, models that are optimized for adult patients do not
perform well in pediatric patients. In this study, we pursue a competitive
liver segmentation approach with improved performance in children and patients
with iron overload by leveraging big data and challenging case augmentation.
Methods
A 3D deep image-to-image network5 (DI2IN) was used as the
backbone of our deep learning algorithm. An adversarial network was used to regularize the training of DI2IN by
discriminating its output from the ground truth. To accommodate various image
resolutions, all volumes were resampled to 2 mm x 2 mm x 2 mm before being processed by
DI2IN. During training, patches of 128 x 128 x 128 voxels were
randomly sampled around the liver for data augmentation. During testing,
deep reinforcement learning-based landmark detection6 was used to
identify the liver center (shape-constrained by the spleen and kidney centers
for robust detection) and thus to extract the liver ROI and ignore
irrelevant background. The inferred segmentation in the ROI was restored to
the original space and resolution.
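The resampling and patch sampling steps described above can be illustrated with a minimal sketch. It only computes grid sizes and patch corners; it is not the authors' implementation. The function names and the `max_jitter` parameter are hypothetical, since the abstract does not state how far patches may deviate from the liver.

```python
import random

TARGET_SPACING_MM = (2.0, 2.0, 2.0)  # isotropic resolution stated in the text
PATCH_SIZE = (128, 128, 128)         # training patch size stated in the text


def resampled_shape(shape, spacing_mm, target_mm=TARGET_SPACING_MM):
    """Grid size after resampling to the target spacing (same physical extent)."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing_mm, target_mm)
    )


def sample_patch_origin(volume_shape, liver_center, max_jitter=16, rng=random):
    """Pick the corner of a patch whose center is jittered around the liver.

    max_jitter (in voxels) is an assumed, illustrative parameter.
    """
    origin = []
    for size, center, patch in zip(volume_shape, liver_center, PATCH_SIZE):
        jittered = center + rng.randint(-max_jitter, max_jitter)
        start = jittered - patch // 2
        # Clamp so the patch stays inside the volume. If the volume is smaller
        # than the patch along this axis, start at 0 and let the caller pad.
        start = max(0, min(start, max(0, size - patch)))
        origin.append(start)
    return tuple(origin)


# Example: a hypothetical 320 x 260 x 72 volume with 1.2 x 1.2 x 3.0 mm voxels.
shape_2mm = resampled_shape((320, 260, 72), (1.2, 1.2, 3.0))
corner = sample_patch_origin(shape_2mm, liver_center=(100, 80, 50))
```

Axes shorter than 128 voxels after resampling would need padding before the patch is passed to a network; the sketch leaves that to the caller.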
The deep learning segmentation model was initially trained on
T1-weighted 3D gradient-echo imaging with fat suppression (spectral fat
suppression or Dixon water images) from 1037 patients (most with normal livers,
some with cirrhosis and/or hepatocellular carcinoma, few with high iron content).
195 patients were randomly selected as the validation set, and the remaining
842 were used for training. A second data cohort of 34 pediatric and
young adult patients clinically suspected to have iron overload (age 7-28 years;
median: 16 years) was acquired to refine the algorithm by enriching the training data with
challenging cases. 15 patients were randomly selected and reserved as the
testing set, and the remaining 19 were used to fine-tune the algorithm to be
more robust on pediatric cases and liver iron overload. During the fine-tuning
phase, the 19 cases from the second cohort were sampled 4 times more frequently
than the other 842 training cases as a naïve rare case augmentation. Manual
liver annotations of the 1071 cases were performed by 7 experienced annotators
and validated by two radiologists.
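The rare case augmentation can be sketched as weighted sampling over the 861 fine-tuning cases. This is an illustrative sketch, not the authors' training code: it assumes that "4 times more frequently" means a per-case sampling weight of 4 against 1, and all function names are hypothetical.

```python
import random


def build_sampling_weights(n_common=842, n_rare=19, rare_factor=4):
    """Per-case weights: first the common cases, then the rare ones.

    rare_factor=4 is the assumed reading of "4 times more frequently".
    """
    return [1.0] * n_common + [float(rare_factor)] * n_rare


def expected_rare_fraction(weights, n_common):
    """Expected share of draws that come from the rare cohort."""
    return sum(weights[n_common:]) / sum(weights)


def draw_case_indices(weights, k, seed=0):
    """Draw k case indices with replacement, proportional to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=k)


weights = build_sampling_weights()
share = expected_rare_fraction(weights, n_common=842)  # 76 / 918
batch = draw_case_indices(weights, k=8)
```

Under this assumption, roughly 8% of sampled cases come from the second cohort, although those cases make up only about 2% of the training set.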
For benchmarking, we considered a commercially available
solution (LiverLab; Siemens Healthcare, Erlangen, Germany) as the baseline and
compared it with the new deep learning approach with and without the challenging
case augmentation. Segmentation performance was evaluated
against manual annotation on the 15 testing volumes using the Dice similarity
coefficient (DSC), average symmetric surface distance (ASSD), and 95th
percentile Hausdorff distance (95th HD). One-tailed Wilcoxon
signed-rank tests were used to evaluate statistical significance.
Results
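For reference, the Dice similarity coefficient used in the evaluation is twice the overlap between the predicted and manual masks divided by the sum of their sizes. The sketch below represents each mask as a collection of voxel coordinates. This representation and the function name are assumptions made for illustration, not the evaluation code used in the study.

```python
def dice_coefficient(pred_voxels, truth_voxels):
    """DSC = 2 * |P intersect T| / (|P| + |T|) for two sets of voxel coordinates."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Toy example: two 3-voxel masks that share 2 voxels.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
manual = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
dsc = dice_coefficient(predicted, manual)  # 2 * 2 / (3 + 3)
```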
While major improvements were achieved with the deep
learning approach compared to the baseline (p < 0.005 for all three metrics),
further significant improvements were observed when a subset of pediatric
cases, some with high iron content, was included for algorithm augmentation (Figure 3),
i.e., DSC: 0.95 ± 0.02 vs. 0.93 ± 0.06 (p < 0.05), ASSD: 0.93 ± 0.49
vs. 2.99 ± 6.98 (p < 0.05), 95th HD: 3.79 ± 2.22 vs. 20.29
± 57.28 (p < 0.01). The augmented
deep learning approach yielded no major failure across all 15 test cases, while
the other two methods suffered from large errors on patients with iron overload
(Figure 4).
Discussion
Although far superior to traditional machine learning
methods, deep learning methods trained for general-purpose MRI liver
segmentation are limited in handling special cases such as pediatric patients
and livers with extreme iron overload. Including such rare cases (even fewer
than 20) with a naïve sampling augmentation during fine-tuning can
effectively improve segmentation robustness on completely unseen cases.
Conclusion
We have developed a deep learning-based
solution that provides significantly improved liver segmentation on T1-weighted
MRI. Our solution has also demonstrated robustness on a
challenging cohort of pediatric patients, including cases with high iron content,
enabling more reliable and efficient liver-volume-based iron content estimation.
Disclaimer
The concepts and
information presented in this abstract are based on research results that
are not commercially available. Future availability cannot be guaranteed.
Acknowledgements
None
References
- Labranche,
Roxanne, Guillaume Gilbert, Milena Cerny, Kim-Nhien Vu, Denis Soulières, Damien
Olivié, Jean-Sébastien Billiard, Takeshi Yokoo, and An Tang. "Liver iron quantification
with MR imaging: a primer for radiologists." Radiographics 38,
no. 2 (2018): 392-412.
- İdilman, İlkay
S., Deniz Akata, Mustafa Nasuh Özmen, and Muşturay Karçaaltıncaba.
"Different forms of iron accumulation in the liver on MRI." Diagnostic
and Interventional Radiology 22, no. 1 (2016): 22.
- McCarville, M. Beth,
Claudia M. Hillenbrand, Ralf B. Loeffler, Matthew P. Smeltzer, Ruitan Song,
Chin-Shang Li, and Jane S. Hankins. "Comparison of whole liver and small
region-of-interest measurements of MRI liver R2* in children with iron
overload." Pediatric radiology 40, no. 8 (2010):
1360-1367.
- Tu, Zhuowen. "Probabilistic boosting-tree: Learning
discriminative models for classification, recognition, and clustering."
In Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume
1, vol. 2, pp. 1589-1596. IEEE, 2005.
- Yang, Dong, Daguang Xu, S. Kevin Zhou, Bogdan Georgescu,
Mingqing Chen, Sasa Grbic, Dimitris Metaxas, and Dorin Comaniciu.
"Automatic liver segmentation using an adversarial image-to-image
network." In International Conference on Medical Image Computing and
Computer-Assisted Intervention, pp. 507-515. Springer, Cham, 2017.
- Ghesu, Florin-Cristian, Bogdan Georgescu, Yefeng Zheng,
Sasa Grbic, Andreas Maier, Joachim Hornegger, and Dorin Comaniciu.
"Multi-scale deep reinforcement learning for real-time 3D-landmark
detection in CT scans." IEEE transactions on pattern analysis and machine
intelligence 41, no. 1 (2019): 176-189.