3799

Advancing prediction of bone marrow biopsy results from MRI in myeloma patients: A Neural Network Approach
Jessica Kächele1,2, Markus Wennmann3, Maximilian Fischer1,2,4, Robin Peretzke1,4, Tassilo Wald1,5, Juliane K. Bernhard1,6, Fabian Bauer3,7, Sandra Sauer8, Jens Hillengass9, Elias K. Mai8, Niels Weinhold8, Hartmut Goldschmidt10,11, Marc-Steffen Raab8, Heinz-Peter Schlemmer11, Stefan Delorme3, Klaus Maier-Hein1,11,12, and Peter Neher1,12,13
1German Cancer Research Center (DKFZ), Division of Medical Image Computing, Heidelberg, Germany, 2German Cancer Consortium (DKTK), DKFZ, core center, Heidelberg, Germany, 3German Cancer Research Center (DKFZ), Division of Radiology, Heidelberg, Germany, 4Medical Faculty, Heidelberg University, Heidelberg, Germany, 5Helmholtz Imaging, German Cancer Research Center (DKFZ), Heidelberg, Germany, 6Medical Faculty, University of Regensburg, Regensburg, Germany, 7Medical Faculty, University of Heidelberg, Heidelberg, Germany, 8Heidelberg Myeloma Center, Department of Medicine V, University Hospital Heidelberg, Heidelberg, Germany, 9Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States, 10Department of Medicine V, GMMG-Studygroup, University Hospital Heidelberg, Heidelberg, Germany, 11National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany, 12Pattern Analysis and Learning Group, Department of Radiation Oncology, University Hospital Heidelberg, Heidelberg, Germany, 13German Cancer Consortium (DKTK), DKFZ, core center, Heidelberg, Germany

Synopsis

Keywords: Diagnosis/Prediction, Machine Learning/Artificial Intelligence, Regression, CNN, Radiomics

Motivation: While radiomics analysis has shown predictive power for plasma cell infiltration (PCI) from MRI in myeloma patients, convolutional neural networks (CNNs) offer an opportunity for improved performance and generalizability.

Goal(s): Our objective was to develop a predictive model for PCI using CNNs while addressing the challenges posed by limited dataset size.

Approach: CNNs were trained on MRI data of the pelvic bone marrow, and their predictive capabilities were enriched by concatenating radiomic features in the latent space.

Results: The findings revealed limitations due to the small dataset size. However, incorporating radiomic features enhanced prediction accuracy, bringing it in line with radiomics and random forest-based methods.

Impact: This study highlights the limitations of deep learning when using a small dataset. It underlines the importance of feature extraction and the need to dedicate substantial effort to creating large annotated datasets.

Introduction

Precise evaluation of plasma cell infiltration (PCI) is essential for effective staging, risk assessment, and response assessment in multiple myeloma and its precursor stages1,2,3,4. However, the current standard for PCI assessment, bone marrow biopsy, is highly invasive and has limitations in capturing the spatially heterogeneous nature of the tumor tissue5,6.
Recent research has explored the potential of magnetic resonance imaging (MRI), combined with radiomics and random forests, to predict bone marrow biopsy results as an alternative that addresses these limitations7. Despite its promise, this approach leaves room for improvement: individual predictions still show substantial errors, and generalizability to external data is limited.
Prior studies have demonstrated that convolutional neural networks (CNNs) can outperform radiomics-based techniques, suggesting that the integration of deep learning into PCI assessment could significantly advance the field and improve prediction accuracy8,9.

Methods

T1-weighted whole-body MRI scans were used, comprising 168 samples in the training set and 59 samples in the test set. Following preprocessing, as described elsewhere7, the images were cropped to the pelvis region, and the background was masked.
To predict PCI, several ResNet architectures were employed, chosen for their widespread use in medical imaging10. Architectures of varying size were explored to assess the impact of network size. Additionally, several measures against overfitting were implemented, including augmentations, dropout, and transfer learning.
As a second experimental setup, radiomics features were explicitly incorporated into the CNN, to harness their predictive potential. This was achieved by concatenating the ten most significant features, determined by a random forest as described elsewhere7, into the network's latent space after the convolutional layers, as illustrated in Figure 1. To inspect the effect of these additional latent space features, we conducted a third experiment in which the PCI was predicted from only radiomics features using a linear layer.
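The fusion step described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (linear, fuse_and_predict), the latent size of 8, and the random weights are hypothetical. Only the idea of concatenating ten radiomics features to the CNN latent vector before a final regression layer is taken from the text.

```python
import random

def linear(weights, bias, x):
    """Single-output linear layer: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def fuse_and_predict(cnn_latent, radiomics, weights, bias):
    """Concatenate CNN latent features with radiomics features, then regress PCI."""
    fused = list(cnn_latent) + list(radiomics)  # concatenation in the latent space
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fused feature length")
    return linear(weights, bias, fused)

# Hypothetical sizes: 8 pooled CNN features (assumed) and 10 selected radiomics features.
rng = random.Random(0)
cnn_latent = [rng.gauss(0, 1) for _ in range(8)]
radiomics = [rng.gauss(0, 1) for _ in range(10)]
weights = [rng.uniform(-0.1, 0.1) for _ in range(18)]  # untrained placeholder weights
pci_hat = fuse_and_predict(cnn_latent, radiomics, weights, bias=20.0)
```

Passing an empty cnn_latent list to the same function gives the radiomics-only linear baseline of the third experiment, provided the weight vector has length 10.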
The models were trained and fine-tuned via 5-fold cross-validation. Subsequently, they were evaluated on the independent hold-out test set using the Pearson correlation coefficient and the mean absolute error. To interpret the results, we relied on SHAP values, which explain predictions by quantifying the impact of each feature on the prediction outcome11.
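The evaluation protocol can be illustrated with a short standard-library sketch. The helper names (k_fold_indices, pearson_r, mean_absolute_error), the shuffling seed, and the way folds are assigned are assumptions made for illustration. The abstract does not specify how the folds were drawn.

```python
import math
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs over shuffled sample indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# 168 training samples as stated in the Methods; the split itself is illustrative.
splits = k_fold_indices(168, k=5)
```

Each validation fold holds 33 or 34 of the 168 training samples, and the 59 test samples stay untouched until the final evaluation.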

Results

Employing augmentations and reducing the network size to a ResNet6 delivered the best results when applying ResNet directly to the MRI data, although attempts to mitigate overfitting through various measures yielded limited improvements. In comparison to methods that utilize radiomics, the deep-learning-based approach showed inferior performance, as depicted in Table 1.
Incorporating radiomics features into the latent space improved PCI prediction, with results comparable to those obtained using radiomics in conjunction with a random forest. An examination of the SHAP11 values in Figure 2 revealed that features from the convolutional part of the network had little effect, whereas radiomic features contributed substantially to the predictions. Nevertheless, the latent space features appear to have a positive effect: a linear layer trained exclusively on radiomics features performed worse than the model that also integrated the latent space features.
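The attribution idea behind the SHAP values can be illustrated by computing exact Shapley values over all feature subsets. This is a toy sketch under stated assumptions: the model, the inputs, and the all-zero baseline are hypothetical, and features outside a subset are simply replaced by their baseline value. Practical SHAP implementations typically rely on approximations, because exhaustive enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a subset take their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical linear model; the third feature has no influence."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.0 * z[2] + 5.0

x = [3.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For a linear model, each attribution equals the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. A feature the model ignores receives zero attribution, which is the pattern reported for the convolutional features.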

Discussion

We initially expected that a CNN would yield results comparable or superior to those of radiomics combined with a random forest. However, the CNN approaches for this task proved not to be competitive with a radiomics model on the given dataset. The results point to overfitting as a primary challenge. One possible explanation is the very large input images, which lead to a sizable model that must predict a single value from a relatively small number of training samples, making it difficult for the CNN to find predictive features or patterns in the data. Although our dataset is relatively large compared with other multiple myeloma datasets, it falls short of the demands of deep learning approaches.
By incorporating a limited selection of radiomics features, the model can produce predictions of PCI values that are significantly correlated with the actual PCI values from bone marrow biopsy. The analysis of SHAP values revealed that only these features are crucial for the prediction. The latent space features obtained through the CNN may act as noise that works against rapid overfitting. This emphasizes the importance of extracting a curated set of informative features to address the challenge of high dimensionality.

Conclusion

Despite studies demonstrating the potential of deep learning to outperform radiomics-based approaches, our findings highlight challenges in predicting PCI using CNNs due to the limited dataset size and large feature space. A central challenge for future work therefore lies in dedicating substantial effort to creating large annotated datasets.

Acknowledgements


References

1. Lakshman A, Rajkumar SV, Buadi FK, et al. Risk stratification of smoldering multiple myeloma incorporating revised IMWG diagnostic criteria. Blood Cancer J. 2018;8(6):59.

2. Mateos MV, Kumar S, Dimopoulos MA, et al. International Myeloma Working Group risk stratification model for smoldering multiple myeloma (SMM). Blood Cancer J. 2020;10(10):102.

3. Kumar S, Paiva B, Anderson KC, et al. International Myeloma Working Group consensus criteria for response and minimal residual disease assessment in multiple myeloma. Lancet Oncol. 2016;17(8):e328-e346.

4. Rajkumar SV, Dimopoulos MA, Palumbo A, et al. International Myeloma Working Group updated criteria for the diagnosis of multiple myeloma. Lancet Oncol. 2014;15(12):e538-e548.

5. Hillengass J, Ellert E, Spira D, et al. Comparison of plasma cell infiltration in random samples of the bone marrow and osteolyses acquired by CT-guided biopsy in patients with symptomatic multiple myeloma. J Clin Oncol. 2016;34(15_suppl):8040-8040.

6. Latifoltojar A, Boyd K, Riddell A, Kaiser M, Messiou C. Characterising spatial heterogeneity of multiple myeloma in high resolution by whole body magnetic resonance imaging: Towards macro-phenotype driven patient management. Magn Reson Imaging. 2021;75:60-64.

7. Wennmann M, Ming W, Bauer F, et al. Prediction of Bone Marrow Biopsy Results From MRI in Multiple Myeloma Patients Using Deep Learning and Radiomics. Invest Radiol. 2023;58(10):754-765.

8. Sun Q, Lin X, Zhao Y, et al. Deep Learning vs. Radiomics for Predicting Axillary Lymph Node Metastasis of Breast Cancer Using Ultrasound Images: Don't Forget the Peritumoral Region. Front Oncol. 2020;10:53.

9. Truhn D, Schrading S, Haarburger C, Schneider H, Merhof D, Kuhl C. Radiomic versus Convolutional Neural Networks Analysis for Classification of Contrast-enhancing Lesions at Multiparametric Breast MRI. Radiology. 2019;290(2):290-297.

10. Mazurowski MA, Buda M, Saha A, Bashir MR. Deep learning in radiology: An overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging. 2019;49(4):939-954.

11. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Curran Associates Inc.; 2017:4768-4777.

Figures

Figure 1: Schematic representation of the combined CNN and Radiomics architecture.

Table 1: Comparative results from various models, comparing predicted and actual PCI values on the test set, with all Pearson correlation coefficients exhibiting P values <0.001.


Figure 2: SHAP11 values of the 20 most important latent space features in the combined CNN and radiomics model, obtained by considering all possible feature subsets and their corresponding predictions.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)
DOI: https://doi.org/10.58530/2024/3799