Brett Marinelli1, Trevor Ellison1, Kaustubh Kulkarni1, Dudley Charles1, Bachir Taouli1, Anthony Costa1, and Edward Kim1
1Radiology, Mount Sinai, New York, NY, United States
Synopsis
Deep learning is a powerful tool that can drive new innovations in medicine, including MRI tumor segmentation for HCC. Effective deep learning requires large annotated datasets, yet current techniques for annotating images at scale are tedious and inefficient. We propose a streamlined infrastructure to optimize and standardize the process of anonymizing patient information, structuring the data, and annotating images efficiently. We show that our streamlined infrastructure increases the speed at which ground-truth annotations can be generated.
Introduction
Deep learning methods that facilitate and utilize image segmentation hold promise for vast applications in radiology and medicine as a whole. Semantic segmentation with 3D U-Nets, as well as emerging sparsely supervised convolutional neural networks (CNNs), faces a bottleneck: large quantities of high-quality training data are essential for model convergence and generalizability. Most current annotation software is tedious and inefficient at the scale needed for deep learning. Anonymizing and structuring data to ensure privacy while maintaining workflow efficiency presents an additional challenge requiring significant human attention. We describe automated, programmatic approaches to large-scale MRI data annotation, including study conversion and de-identification with a uniform file hierarchy, integration with customized open-source annotation software, and fast 3D tumor labeling in patients undergoing radiation segmentectomy (RS) for hepatocellular carcinoma (HCC).
Methods
Our proposed annotation infrastructure consists of three steps: 1) batch conversion, de-identification, and structuring of imaging studies; 2) automated data loading with a preset hanging protocol into the annotation viewer Slicer (www.slicer.org); and 3) 3D tumor segmentation with the Slicer Segment Editor GrowCut tool. These steps leverage a compilation of open-source tools wrapped in Python, including Nipype (nipype.readthedocs.io), SimpleITK (www.simpleitk.org), and the Slicer APIs. As a comparator for step 1, de-identification and conversion to NIfTI were performed by batch exporting from OsiriX (Pixmeo, Bernex, Switzerland) and importing/exporting in Slicer. The comparator for step 2 was manually loading data and adjusting viewer panes in Slicer. The step 3 annotation task entailed two radiologists independently labeling the same HCC lesions before and after RS in 23 patients. Each step was timed when run on a MacBook Pro with a 2.5 GHz Intel i7 processor and compared to the conventional approach using Student's t-test. Inter-segmenter variance was assessed with intraclass correlation (ICC) statistics.
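A minimal sketch of step 1 (batch conversion of each DICOM study to de-identified NIfTI volumes in a uniform file hierarchy) is shown below, assuming SimpleITK for series reading and export; the directory layout, anonymized ID scheme, and convert_study helper are illustrative rather than our exact pipeline.

import os
import SimpleITK as sitk

def convert_study(dicom_dir, out_root, anon_id):
    """Convert every series in one DICOM study to NIfTI under out_root/anon_id/."""
    reader = sitk.ImageSeriesReader()
    out_dir = os.path.join(out_root, anon_id)
    os.makedirs(out_dir, exist_ok=True)
    for i, series_id in enumerate(reader.GetGDCMSeriesIDs(dicom_dir)):
        reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir, series_id))
        image = reader.Execute()
        # Writing to NIfTI drops the DICOM header tags that carry patient
        # identifiers while preserving image geometry and voxel data.
        sitk.WriteImage(image, os.path.join(out_dir, "series_%02d.nii.gz" % i))

if __name__ == "__main__":
    # Map each raw study folder to a sequential anonymized case ID (hypothetical layout).
    raw_root, out_root = "raw_studies", "anonymized"
    for i, study in enumerate(sorted(os.listdir(raw_root))):
        convert_study(os.path.join(raw_root, study), out_root, "case_%03d" % i)

In practice, a key mapping original studies to anonymized IDs would be maintained separately under institutional safeguards so annotations can be traced back when needed.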
Results
Using the conventional method, anonymization, file conversion, and data structuring for 23 patients took 51.3 minutes, vs. 11.3 minutes with our automated technique. Loading data into the viewer hanging protocol for annotation took on average 71.5 +/- 6.2 seconds per study with the manual technique vs. 5.2 +/- 2.2 seconds with our automated approach (p<0.0001). 3D tumor segmentation using Slicer's GrowCut algorithm took on average 239 +/- 80 seconds per tumor and demonstrated an excellent ICC of 0.953 (p<0.001).
Discussion
We describe a streamlined infrastructure capable of batch annotation using free, open-source software. The conventional de-identification, conversion, and structuring step took nearly five times longer than our approach, which was also fully automated, reducing sources of human error. Our approach entailed running a single script, as opposed to requiring familiarity with the user interfaces of two different software programs. Similarly, the automated data loading step saved over a minute per study, leaving more time available for annotation. As deep learning typically requires hundreds of training sets, this saving may translate into hours to days of professional radiologists' time. While the GrowCut approach to segmentation is well described, we show here, in a body MRI use case, a speed comparable to alternate techniques with low inter-segmenter variance. Limitations of this study include a relatively small number of cases and a single application, liver tumor labeling. Future studies should focus on validating this infrastructure on larger datasets and in various deep learning applications.
Conclusion
In conclusion, we propose novel, modular, and generalizable open-source software designed to streamline data anonymization, data structuring, and annotation, increasing efficiency while maintaining high consistency. On larger datasets, where small gains in efficiency compound into larger gains in overall throughput, our streamlined infrastructure can support the creation of the large libraries of annotated data needed to drive deep learning.
Acknowledgements
No acknowledgement found.
References
Young S, et al. Software to facilitate and streamline camera trap data management: a review. Ecology and Evolution. 6 Sept. 2018. www.ncbi.nlm.nih.gov/pubmed/30386588.
Gorgolewski K, Burns CD, Madison C, Clark D, Halchenko YO, Waskom ML, Ghosh SS. Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in Python. Front Neuroinform. 2011;5:13.
Fedorov A, et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magnetic Resonance Imaging. 2012;30(9):1323-1341. doi:10.1016/j.mri.2012.05.001.
Vezhnevets V, Konouchine V. GrowCut: interactive multi-label N-D image segmentation by cellular automata. In: Proc. of Graphicon. 2005;1:150-156.