3800

Towards a Clinical Decision-Support System for Automating MRI Protocoling

Peyman Shokrollahi¹, Juan M Zambrano¹, Allison Li², Surbhi Raichandani¹, Akshay S. Chaudhari¹, and Andreas M. Loening¹
¹Stanford University, Stanford, CA, United States, ²GE Healthcare, Sunnyvale, CA, United States

Synopsis

Keywords: Other AI/ML, Machine Learning/Artificial Intelligence, Radiology Protocols, Decision Support System, Modeling, All-Body MR Protocols

Motivation: We developed a system that performs radiology protocol selection for incoming MRI orders.

Goal(s): To enhance MRI protocol selection accuracy and efficiency. We evaluated new models and expanded anatomic/subspecialty coverage compared to a prior body MRI protocol selection system.

Approach: A machine learning-driven decision-support system was developed using data from 22,524 patients, integrating kernel-based, tree-based, boosting, and deep-learning algorithms with an ensemble classifier. This system utilizes electronic medical records to predict the three most likely MRI protocols and their probabilities.

Results: A cumulative F1-score of 97.1% for the top-three predicted MRI protocols was obtained in a test set of 3,379 patients.

Impact: The proposed system has the potential to improve radiologists’ protocol selection accuracy by notifying them of protocol-case discrepancies due to the individual patient’s conditions, and to enable a decision-support system for greater efficiency in selecting commonly utilized MR protocols.

Introduction

Nearly 40 million MRI scans are conducted annually in the USA¹. Most research on machine learning (ML) applications has centered on image processing tasks, such as detection², segmentation³, and reconstruction⁴, rather than on tasks that precede image acquisition⁵. When selecting a protocol in response to a physician order, a radiologist specifies a protocol (e.g., MR pelvis prostate carcinoma without and with contrast) that most commonly encompasses a broad anatomical region (e.g., pelvis), a specific organ target (e.g., prostate), a particular purpose (e.g., cancer screening), and whether contrast will be utilized (e.g., without and with contrast)⁶. In addition to being susceptible to human error, this tedious process⁷ takes radiologist time away from clinical image interpretation. Consequently, protocol selection is sometimes delegated to technologists⁸. Importantly, use of an improper protocol may yield insufficient diagnostic data, putting patient health at risk, delaying treatment, and increasing healthcare costs⁸,⁹. ML could increase efficiency in radiology workflows by facilitating appropriate protocol selection. Unlike prior ML systems that utilize free-text inputs⁹,¹⁰, our system uses structured data from the electronic medical record (EMR). Large language models (LLMs) have recently been used in protocol selection⁵,¹¹. However, LLMs present challenges, including inaccuracy, uncertainty, and data-privacy issues¹²,¹³. Prior work has presented EMR database-trained modeling systems designed to predict rank-ordered body MR protocols while avoiding these challenges¹⁴. Herein, we extend this modeling approach to handle a greater diversity and anatomical scope of applications in a streamlined ensemble system.

Methods

All work was performed with an institutional review board-approved consent waiver, using retrospective, anonymized data. EMRs were obtained from patients undergoing cardiovascular, cardiac, body, breast, and neuroradiology MRI scans at our institution between May 2017 and December 2022. A tabular dataset was generated, including radiology-specific data (e.g., protocol forms, worklist, history, allergies) and general data (e.g., demographics, laboratory, and orders). Initial attribute selection incorporated factors such as ordered procedure, order priority, allergy information, and previously applied protocols. Data corresponding to the 10 most frequently used protocols (Table 1) were extracted from EMRs for 22,524 patients, encompassing 31,380 radiology examinations and forming a tabular input with a feature space of 156 attributes per record. We omitted records that were duplicates, had altered orders, were for other imaging modalities, were MSK examinations, or were for individuals outside the 25‒85 year age range. The dataset was split into training (75%) and testing (25%) subsets by patients. We used support vector machine (SVM), random forest (RF), light gradient boosting machine (LGBM), extreme gradient boosting (XGBoost), and neural network (NN) algorithms for performing protocol classification⁹,¹⁵⁻¹⁷. The NN model consisted of four layers with batch normalization and ReLU activation functions. An ensemble classifier integrated all models except SVM (excluded because of its high computational time) by averaging the classifiers' probabilistic predictions. Hyperparameters (e.g., learning rate, leaf count, and bandwidth) were fine-tuned with a Bayesian approach¹⁸. Five-fold cross-validation was employed to assess model efficacy, gauged by F1-score (a metric suitable for evaluating imbalanced data). Shapley additive explanations (SHAP) values were plotted to reveal the relative impact of each feature on predictions.
The tuned models were used to predict protocols and their probabilities for an unseen test dataset of 3,379 patients with 4,771 records. The pipeline lists the top three protocol suggestions and their probabilities. The accuracy of these selections was evaluated based on F1-scores.
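
The patient-wise split, probability averaging, and top-three ranking described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the placeholder protocol labels, and the example probabilities are all hypothetical, and only the Python standard library is used so the logic stays visible.

```python
import random


def split_by_patient(patient_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no patient appears in both subsets."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    n_test = max(1, round(test_fraction * len(unique_patients)))
    test_patients = set(unique_patients[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


def average_probabilities(per_model_probs):
    """Average the class-probability vectors that several models give one record."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical example: two models score one record over four placeholder protocols.
labels = ["protocol_A", "protocol_B", "protocol_C", "protocol_D"]
model_outputs = [
    [0.6, 0.2, 0.1, 0.1],  # e.g., a tree-based model (values are made up)
    [0.4, 0.4, 0.1, 0.1],  # e.g., a boosting model (values are made up)
]
ensemble = average_probabilities(model_outputs)
for label, prob in top_k(ensemble, labels, k=3):
    print(f"{label}: {prob:.2f}")
```

Splitting on patient identifiers rather than on individual records keeps repeat examinations of the same patient from appearing in both subsets, which would otherwise inflate test performance.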

Results and Discussion

We obtained an average F1-score of 89.7% in our cross-validation evaluations across all models (Fig. 1). SHAP plots revealed that Ordered Procedure and Ordered Anatomical Region were important features in most models (Fig. 2). The three most probable protocols predicted by the ensemble classifier can be flagged for radiologist review (Fig. 3). A cumulative F1-score of 97.1% was obtained for the top-three predicted protocols (Fig. 4). The highest-ranked protocols, and their associated probabilities, aid in selecting the most appropriate protocol as part of a clinical decision-support system.
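
One way a cumulative top-k F1-score could be computed is sketched below. This is an assumption, not the evaluation procedure reported here: a record is treated as correctly predicted when its ground-truth protocol appears among the first k suggestions, and the per-class F1-scores are macro-averaged (the averaging scheme is likewise assumed). The function names and example labels are hypothetical.

```python
def top_k_adjusted_predictions(true_labels, ranked_predictions, k):
    """Use the true label as the prediction when it is among the top-k
    suggestions; otherwise fall back to the top-1 suggestion."""
    adjusted = []
    for truth, ranked in zip(true_labels, ranked_predictions):
        adjusted.append(truth if truth in ranked[:k] else ranked[0])
    return adjusted


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1-scores (assumed averaging scheme)."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cumulative_f1(true_labels, ranked_predictions, k):
    """Macro F1 after crediting any record whose true label is in the top k."""
    adjusted = top_k_adjusted_predictions(true_labels, ranked_predictions, k)
    return macro_f1(true_labels, adjusted)


# Hypothetical example with three placeholder protocols and four records.
truth = ["A", "B", "C", "A"]
ranked = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A"],
]
for k in (1, 2, 3):
    print(k, round(cumulative_f1(truth, ranked, k), 3))
```

Under this definition the score cannot decrease as k grows, which matches the expectation that a top-three score is at least as high as the top-one score.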

Conclusions

The presently tested protocol selection system, which provides expanded modeling and anatomical scan target coverage compared to a prior system¹⁴, has been validated with real clinical data. Incorporating a variety of classifiers improved its predictive accuracy and stability. The proposed pipeline represents a new approach to optimizing radiology protocol selection by providing automated suggestion of common protocols. We are in the process of incorporating the resulting decision-support system into our radiologists’ workflow and hope to assess whether it can contribute to improved patient outcomes while improving radiology efficiency.

Acknowledgements

This work has been supported and funded by General Electric (GE) Healthcare.

References

1. The Organization for Economic Cooperation and Development. MRI units per million: by country, 2022. USA, https://data.oecd.org/healtheqt/magnetic-resonance-imaging-mri-units.htm, Accessed November 1, 2023.

2. Sheth, D, Giger, M. Artificial intelligence in the interpretation of breast cancer on MRI. J Magn Reson Imaging. 2020;51(5):1310-1324.

3. Goldenberg, S, Nir, G, Salcudean, S. A new era: artificial intelligence and machine learning in prostate cancer. Nat Rev Urol. 2019;16(7):391-403.

4. Wang, G, Ye, J, De Man, B. Deep learning for tomographic image reconstruction. Nat Mach Intell. 2020;2(12):737-748.

5. Gertz, R, Bunck, A, Lennartz, S, et al. GPT-4 for automated determination of radiological study and protocol based on radiology request forms: a feasibility study. Radiology. 2023;307(5):e230877.

6. Boland, G, Duszak, R. Protocol management and design: current and future best practices. J Am Coll Radiol. 2015;12(8):833-835.

7. Richardson, M, Garwood, E, Lee, Y, et al. Noninterpretive uses of artificial intelligence in radiology. Acad Radiol. 2021;28(9):1225-1235.

8. Kalra A, Chakraborty A, Fine B, et al. Machine learning for automation of radiology protocols for quality and efficiency improvement. J Am Coll Radiol. 2020;17(9):1149-1158.

9. Brown A, Marotta T. A natural language processing-based model to automate MRI brain protocol selection and prioritization. Acad Radiol. 2017;24(2):160-166.

10. Trivedi H, Mesterhazy J, Laguna B, et al. Automatic determination of the need for intravenous contrast in musculoskeletal MRI examinations using IBM Watson’s natural language processing algorithm. J Digit Imaging. 2018;31(2):245-251.

11. Mese, I, Taslicay, C, Sivrioglu, A. Improving radiology workflow using ChatGPT and artificial intelligence. Clin Imaging. 2023;109993.

12. Thirunavukarasu, A, Ting, D, Elangovan, K, et al. Large language models in medicine. Nat. Med. 2023;29(8):1930-1940.

13. Clusmann, J, Kolbinger, F, Muti, H, et al. The future landscape of large language models in medicine. Commun. Med. 2023;3(1):141.

14. Shokrollahi, P, Zambrano, J, et al. Predicting Abdominal MRI Protocols using Electronic Health Records. In Proc. Int. Soc. Magn. Reson. Med., Toronto, Canada, Jun. 2023.

15. Retson, T, Besser, A, Sall, S, et al. Machine learning and deep neural networks in thoracic and cardiovascular imaging. J. Thorac. Imaging. 2019;34(3):192.

16. Charbuty, B, Abdulazeez, A. Classification based on decision tree algorithm for machine learning. J. Appl. Sci. Technol. Trends. 2021;2(01):20-28.

17. LeCun, Y, Bengio, Y, Hinton, G. Deep learning. Nature. 2015;521(7553):436-444.

18. Snoek, J, Larochelle, H, Adams, R. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 2012;25.

Figures

Table 1 Counts of protocol use. These protocols represent approximately 55% of all protocols after preprocessing our dataset.

Fig. 1 Mean F1-scores for protocol prediction following 5-fold cross-validation. The F1-scores were within a range indicating the system’s ability to predict new data, avoid overfitting, and generalize to an independent dataset. Cross-validation was applied to the training and validation sets (75% of the data) while the test set was left intact for protocol prediction. The ensemble classifier consisted of decision-tree and boosting based models.

Fig. 2 Random Forest model SHAP plot illustrating the contribution of each feature to predictions. The plot indicates that Ordered Procedure and Ordered Anatomical Region are the most significant features. (Abd.: Abdomen, Ext.: Extracellular, Cont.: Contrast, Deg.: Degenerative, Carc.: Carcinoma, IAC: Internal Auditory Canal, Tem.: Temporal)

Fig. 3 Four examples of algorithm output. In each example, three suggested protocols are given with their model-produced probabilities. Red boxes indicate appropriately selected protocols and their corresponding ground truth labels. The bottom two rows indicate whether the selected protocol was among the top-1 or top-3 suggested protocols.

Fig. 4 F1-scores for the three top selected MRI protocols. Results demonstrate good performance, as indicated by F1-scores, of the proposed clinical decision-support system.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)


DOI: https://doi.org/10.58530/2024/3800