Annemarie Knill1,2, Antonio Candito1, Jessica Winfield1,2, James Larkin1,2, Samra Turajlic2,3, Dow Mu Koh1,2, Christina Messiou1,2, and Matthew Blackledge1
1The Institute of Cancer Research, London, United Kingdom, 2The Royal Marsden NHS Foundation Trust, London, United Kingdom, 3The Francis Crick Institute, London, United Kingdom
Synopsis
We present a proof-of-concept study assessing whether deformable registration followed by machine learning (ML) tissue classification is an effective method for delineating liver metastases in whole-body diffusion-weighted imaging (WB-DWI). Deformable atlas-based registration achieved good-quality delineation of the liver (mean Dice coefficient 0.73), and of the three ML models tested, random forest achieved the best F-1 measure for segmenting disease within the liver.
Introduction
The liver is a common site of metastasis from many cancers1, and the use of whole-body diffusion-weighted MRI (WB-DWI) in oncologic imaging has been shown to be beneficial in multiple cancer types including metastatic melanoma, prostate and breast cancer, and myeloma2–4. WB-DWI can provide measurements of the apparent diffusion coefficient (ADC), for which a measured increase following treatment can indicate a positive response. However, manual delineation of liver metastases is too time-consuming for clinical practice, hence the development of automated segmentation with minimal user interaction is important to enable routine use of ADC measurement. The aim of this study was to develop an effective method for the delineation of liver metastases in WB-DWI using deformable registration followed by tissue classification using machine learning (ML).
Methods
Our liver lesion segmentation method comprises two steps: (i) WB-DWI atlas-based segmentation for whole-liver delineation, and (ii) a ML model for subsequent lesion segmentation.
Data were acquired from three 1.5T MRI scanners (MAGNETOM Aera/Avanto/Sola, Siemens Healthcare, Erlangen, Germany) at a single institution between 2014 and 2021. All protocols included DWI acquired with b = 50 s/mm² and b = 900 s/mm². A total of 25 patients were included: 15 patients with confirmed diffuse multiple myeloma were used as atlases with manually contoured liver regions; 6 patients (2 with metastatic melanoma and 4 with myeloma and extra-medullary disease, not included in the atlases) were used to test the whole-liver segmentation; 8 patients were used to train ML models for lesion segmentation within the liver (the same 4 myeloma patients, 2 metastatic prostate cancer patients and 2 new metastatic melanoma patients). Apart from the atlas patients, all patients had confirmed disease within the liver. Regions of healthy liver tissue, tumour, cysts, gall bladder, kidneys and bowel were manually identified by an imaging scientist with >2 years' experience with WB-MRI5.
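The abstract does not state how the ADC maps were computed. A common choice when two b-values are available is the mono-exponential model, S(b) = S0·exp(−b·ADC), which gives ADC = ln(S_low / S_high) / (b_high − b_low). The sketch below assumes that model; the function name `adc_map` and the clipping constant `eps` are illustrative, not taken from the study.

```python
import numpy as np


def adc_map(s_low, s_high, b_low=50.0, b_high=900.0, eps=1e-6):
    """Two-point mono-exponential ADC estimate (mm^2/s if b is in s/mm^2).

    Assumption: signal follows S(b) = S0 * exp(-b * ADC).
    Signals are clipped to eps so the logarithm is always defined.
    """
    s_low = np.clip(np.asarray(s_low, dtype=float), eps, None)
    s_high = np.clip(np.asarray(s_high, dtype=float), eps, None)
    return np.log(s_low / s_high) / (b_high - b_low)


# Synthetic check: simulate a voxel with a known ADC and recover it.
true_adc = 1.0e-3  # mm^2/s, illustrative value only
s0 = 1000.0
s50 = s0 * np.exp(-50.0 * true_adc)
s900 = s0 * np.exp(-900.0 * true_adc)
estimated_adc = adc_map(s50, s900)
```

The same function works unchanged on whole image volumes, because every operation is element-wise.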
Atlas-based delineation of the whole liver was achieved using 3D affine followed by diffeomorphic demons registration of the moving patient dataset to each of the 15 patient atlases; the affine alignment served as an initialisation step prior to each deformable registration. A weighted majority voting mechanism was subsequently used to determine a probability mask for the whole liver; the optimum threshold for creating a binary mask was determined using a leave-one-out cross-validation (LOOCV) analysis based on the 15 myeloma patients. The mean Dice coefficient, precision and recall were evaluated using 6 test patients with disease within the liver6,7.
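A minimal sketch of the label-fusion and evaluation steps described above, assuming the liver masks propagated from each atlas are already available as binary arrays on the patient grid. How the voting weights were derived is not stated in the abstract, so `weights` is a hypothetical input here (equal weights by default), and the function names are illustrative.

```python
import numpy as np


def fuse_labels(atlas_masks, weights=None):
    """Weighted vote: per-voxel weighted mean of the propagated binary masks."""
    masks = np.asarray(atlas_masks, dtype=float)  # shape (n_atlases, ...)
    if weights is None:
        weights = np.ones(masks.shape[0])  # assumption: equal weighting
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights / weights.sum(), masks, axes=1)


def overlap_metrics(pred, truth):
    """Return (Dice, precision, recall) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    dice = 2.0 * tp / max(pred.sum() + truth.sum(), 1)
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return float(dice), float(precision), float(recall)


# Toy 1D example with three propagated atlas masks.
atlas_masks = [[1, 1, 1, 0, 0],
               [0, 1, 1, 1, 0],
               [0, 0, 1, 1, 0]]
probability = fuse_labels(atlas_masks)     # [1/3, 2/3, 1, 2/3, 0]
liver_mask = probability >= 0.395          # threshold reported in this study
manual_mask = [0, 1, 1, 0, 0]
dice, precision, recall = overlap_metrics(liver_mask, manual_mask)
```

In this toy case the fused mask covers both manually labelled voxels plus one extra, so recall is 1.0, precision is 2/3 and Dice is 0.8.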
Using images acquired from 8 patients, the total number of pixels sampled in each class was: healthy liver=27402; kidney=6100; bowel=4176; gall bladder=1852; tumour=1761; and cyst=528. Sampling bias was reduced by randomly selecting 2000 pixels per class, sampling with replacement when a class contained fewer than 2000 pixels. LOOCV was performed using 3 machine learning models with 2 features (b = 50 s/mm² image intensity and ADC map): Gaussian Naïve Bayes (NB), support vector classification (SVC, with C=100, RBF kernel) and random forest (RF, with maximum depth=5, number of trees=100)7. Cross-validation was performed for binary classification (healthy liver and disease only) and multi-label classification. The mean one-versus-many F-1 measure was calculated for each model and class, and the model was then applied within the liver mask on each test patient.
Results
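The class balancing and model comparison described in the Methods could be reproduced along the following lines with scikit-learn. This is a sketch under stated assumptions, not the study code: it assumes the LOOCV left out one patient at a time (`LeaveOneGroupOut`), that balancing was applied once before cross-validation, and it runs on synthetic two-feature data with three hypothetical classes and four hypothetical patients. The helper names `balance_classes` and `loocv_f1` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def balance_classes(X, y, groups, n_per_class=2000, seed=0):
    """Draw n_per_class samples per class, with replacement only if needed."""
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.append(rng.choice(idx, size=n_per_class,
                               replace=idx.size < n_per_class))
    keep = np.concatenate(keep)
    return X[keep], y[keep], groups[keep]


def loocv_f1(model, X, y, groups):
    """Mean one-versus-rest F-1 per class over leave-one-patient-out folds."""
    labels = np.unique(y)
    folds = []
    for train, test in LeaveOneGroupOut().split(X, y, groups):
        fitted = clone(model).fit(X[train], y[train])
        folds.append(f1_score(y[test], fitted.predict(X[test]),
                              labels=labels, average=None, zero_division=0))
    return dict(zip(labels.tolist(), np.mean(folds, axis=0)))


# Hyperparameters as reported in the abstract.
models = {
    "NB": GaussianNB(),
    "SVC": SVC(C=100, kernel="rbf"),
    "RF": RandomForestClassifier(max_depth=5, n_estimators=100, random_state=0),
}

# Synthetic stand-in for (b=50 intensity, ADC) features; not real data.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X = np.concatenate([rng.normal(c, 1.0, size=(200, 2)) for c in centres])
y = np.repeat(np.arange(3), 200)
groups = np.tile(np.arange(4), 150)  # hypothetical patient index per pixel

Xb, yb, gb = balance_classes(X, y, groups, n_per_class=300)
results = {name: loocv_f1(m, Xb, yb, gb) for name, m in models.items()}
```

Because duplicated pixels keep their patient index, resampling with replacement does not leak a pixel from a held-out patient into the training folds.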
The LOOCV indicated a threshold of 0.395 for generating whole-liver masks. Figure 1 compares the manual and automatic segmentations of the liver. Using the derived threshold in the test patients, the mean ± standard deviation of the Dice coefficient, precision and recall of the liver delineations were 0.73±0.07, 0.81±0.15 and 0.69±0.13, respectively.
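One way a threshold such as the one above could be selected is a simple sweep over candidate values, keeping the value with the highest mean Dice across held-out atlas patients. The abstract does not state the selection criterion, so using Dice here is an assumption, and the toy probability maps are invented for illustration. In a real LOOCV, each probability map would be built from the remaining atlases only.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0


def best_threshold(prob_maps, truths, thresholds):
    """Return (threshold, mean Dice) maximising mean Dice over subjects."""
    mean_dice = [
        np.mean([dice(np.asarray(p) >= t, g) for p, g in zip(prob_maps, truths)])
        for t in thresholds
    ]
    best = int(np.argmax(mean_dice))
    return float(thresholds[best]), float(mean_dice[best])


# Toy held-out probability maps and manual masks (1D for brevity).
prob_maps = [[0.10, 0.45, 0.85, 0.90, 0.25],
             [0.30, 0.50, 0.70, 0.10, 0.00]]
truths = [[0, 1, 1, 1, 0],
          [0, 1, 1, 0, 0]]
candidates = np.array([0.2, 0.4, 0.6, 0.8])
threshold, score = best_threshold(prob_maps, truths, candidates)
```

Here a threshold of 0.4 reproduces both manual masks exactly, so it is selected with a mean Dice of 1.0.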
Table 1 displays the mean one-versus-many F-1 measure per class for each model following LOOCV. The RF model was then used to segment the tissue inside the liver mask. Figure 2 shows the classification results for a single slice: segmentation A is the binary mask (disease vs non-disease), and segmentation B is the multi-label classification.
Discussion
Liver delineation using deformable registration on WB-DWI generates a good quality segmentation and can provide a suitable starting point to perform lesion segmentation within the liver. In future, the threshold applied to the liver probability map could be tuned to optimise for lesion segmentation in addition to whole liver delineation. This may help decrease the proportion of disease excluded from the whole liver in cases of under-segmentation.
The performance of the three ML models for classification was largely comparable; however, the RF model was marginally better at classifying disease. Therefore, the RF model was chosen for testing lesion segmentation in the whole liver. The binary model correctly identified most sites of disease; however, it had a high rate of false positives due to other structures (kidney, cysts, gall bladder) displaying similar properties to disease. A multi-label model was introduced to reduce the misclassified regions. The improved segmentation could then be adjusted by a radiologist to remove remaining false positives.
Conclusion
Liver delineation using deformable registration, followed by lesion segmentation using ML methods on WB-DWI, shows promise as a tool for the delineation of liver lesions. Considering the small sample size used to train the models and minimal optimisation of hyperparameters, the segmentation is encouraging and merits further investigation using a larger labelled data set.
Acknowledgements
We acknowledge CRUK and EPSRC support to the Cancer Imaging Centre at ICR and RMH in association with MRC and Department of Health C1060/A10334, C1060/A16464, and NHS funding to the National Institute for Health Research (NIHR) Biomedical Research Centre and the NIHR Royal Marsden Clinical Research Facility in Imaging. This report is independent research funded by the NIHR Biomedical Research Centre, the Clinical Research Facilities at The Royal Marsden NHS Foundation Trust and the Institute of Cancer Research, London, United Kingdom. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health.
References
1. de Ridder, J. et al. Incidence and origin of histologically confirmed liver metastases: an explorative case-study of 23,154 patients. Oncotarget 7, 55368 (2016).
2. Rajkumar, S. V. et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma. Lancet Oncol. 15, e538–e548 (2014).
3. Wolchok, J. D. et al. Guidelines for the evaluation of immune therapy activity in solid tumors: Immune-related response criteria. Clin. Cancer Res. 15, 7412–7420 (2009).
4. NICE. Recommendations; Melanoma: assessment and management. https://www.nice.org.uk/guidance/ng14/chapter/1-Recommendations#staging-investigations-2.
5. Blackledge, M. D. et al. Supervised machine-learning enables segmentation and evaluation of heterogeneous post-treatment changes in multi-parametric MRI of soft-tissue sarcoma. Front. Oncol. 9, 941 (2019).
6. Taha, A. A. & Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 15, 1–28 (2015).
7. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).