Language Models: Help or Hindrance?

Ito Rintaro¹
¹Innovative BioMedical Visualization, Nagoya University Graduate School of Medicine, Nagoya, Japan

Synopsis

Keywords: Transferable skills: Software engineering

Large Language Models (LLMs) are increasingly utilized in diagnostic imaging, offering capabilities in processing text and images, and handling composite modalities. They assist in creating accurate diagnostic reports, interpreting images, and integrating clinical information. However, challenges include ensuring output reliability, addressing biases, and tackling ethical and privacy concerns. Future improvements involve continuous training, ethical standards, interdisciplinary collaboration, and regulatory validation. These steps aim to enhance LLMs' efficiency and accuracy in diagnostic imaging while addressing concerns about reproducibility, prompt handling, and ethical considerations.

Introduction to LLMs

Large Language Models (LLMs) have emerged as powerful tools in various fields, including medicine and diagnostic imaging (1, 2). Their ability to process and generate text, handle image information, and work with composite modalities makes them particularly useful for creating diagnostic imaging reports and supporting decision-making. However, their application in healthcare, especially in diagnostic imaging, presents both significant potential and notable challenges.

LLMs in Diagnostic Imaging

Textual Information Handling in Diagnostic Reports
LLMs can generate, summarize, and translate textual information with near-human accuracy. This capability is beneficial in writing diagnostic imaging reports, where precise and clear communication is crucial (3). For instance, LLMs can assist in drafting initial versions of radiology reports based on the radiologist's notes or directly from image findings, potentially increasing efficiency and reducing the time radiologists spend on report writing (4, 5). LLMs can also rewrite diagnostic imaging reports intended for medical professionals in simpler language, making them easier for patients to understand (6).

Appropriate Selection of Imaging Tests
LLMs can help select appropriate imaging tests (7–9). By analyzing patient histories and symptoms, they can recommend suitable imaging modalities, which may improve diagnostic accuracy and efficiency. They can also help optimize the use of imaging resources and reduce unnecessary tests, which can lower healthcare costs. However, it is crucial to balance LLM recommendations with expert human judgment to align with current medical practices and ensure patient safety.

Image Information Processing
LLMs, especially those trained or integrated with image recognition capabilities, can assist in interpreting diagnostic images (10). They can identify patterns and suggest possible diagnoses to radiologists. This application is particularly promising in fields like radiology, where the volume of imaging studies is high, and there's a constant need for timely and accurate interpretations.

Composite Modality Handling
The integration of textual and image information represents a composite modality where LLMs can offer significant advantages. By analyzing both written clinical information and diagnostic images, LLMs can provide a more comprehensive assessment. This dual capability can enhance diagnostic precision and support more nuanced healthcare delivery (11, 12).

Challenges and Shortcomings of LLMs in Diagnostic Imaging

Evaluation of LLMs
Various methods exist for evaluating LLMs. Until recently, metrics like BLEU, ROUGE, and METEOR, focusing on lexical similarity between generated and reference texts, were common. Evaluating accuracy through benchmark tests conducted with a list of questions is also prevalent (13). Now, the focus has shifted to context-aware and semantic evaluation methods. Human evaluation plays a significant role in assessing consistency, relevance, and creativity. There are also methods employing LLMs to evaluate semantic similarity.
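The lexical-similarity idea behind metrics such as ROUGE can be illustrated with a minimal sketch. The function below is a simplified ROUGE-1-style unigram F1 written for illustration only; it is not the official ROUGE implementation, and the function name and example report sentences are assumptions, not taken from any cited study. The second comparison hints at why purely lexical metrics fall short for clinical text: dropping a negation reverses the meaning yet still scores highly.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 based on clipped unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference sentence and two hypothetical generated sentences.
reference = "no acute intracranial hemorrhage"
faithful = "no evidence of acute intracranial hemorrhage"
negation_dropped = "acute intracranial hemorrhage"

print(f"{unigram_f1(faithful, reference):.2f}")          # 0.80
print(f"{unigram_f1(negation_dropped, reference):.2f}")  # 0.86
```

In this toy case the clinically opposite sentence scores higher than the faithful paraphrase, which is consistent with the shift toward context-aware and semantic evaluation described above.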

Reproducibility and Reliability
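One of the main challenges with LLMs is ensuring the reproducibility and reliability of their outputs (14). Because LLM outputs are typically generated by stochastic sampling, the same prompt can produce different responses across runs. In addition, since LLMs are trained on vast datasets, there is a risk that biases or inaccuracies present in the training data carry over into their outputs.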
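One simple way to quantify reproducibility is to submit the same prompt several times and measure how often the responses agree. The sketch below assumes the repeated responses have already been collected as short strings (for example, a single suggested diagnosis per run); the function name and the example outputs are hypothetical, and exact string matching after normalization is a deliberately crude stand-in for a proper comparison of free-text answers.

```python
from collections import Counter


def modal_agreement(outputs: list[str]) -> float:
    """Fraction of runs that match the most frequent (modal) response."""
    if not outputs:
        raise ValueError("at least one output is required")
    # Normalize case and surrounding whitespace before comparing.
    normalized = [text.strip().lower() for text in outputs]
    _, modal_count = Counter(normalized).most_common(1)[0]
    return modal_count / len(normalized)


# Hypothetical responses from four runs of the same prompt.
runs = ["Pneumonia", "pneumonia", "Pulmonary edema", "pneumonia "]
print(modal_agreement(runs))  # 0.75
```

A value of 1.0 means every run gave the same answer; lower values indicate that the output varies between runs and should be interpreted with more caution.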

Handling of Prompts and Knowledge Limitations
The effectiveness of LLMs heavily depends on the quality of the prompts given to them. Poorly structured prompts can lead to inaccurate or irrelevant outputs. Moreover, LLMs' knowledge is limited to the data they were trained on, which may not include the latest research or rare conditions (15).

Ethical and Privacy Concerns
The use of LLMs in healthcare raises ethical and privacy concerns, particularly regarding patient data confidentiality. Ensuring that LLMs operate within strict ethical guidelines and legal frameworks is crucial to maintaining patient trust and safeguarding sensitive information (16).

Future Directions and How to Address Challenges

To maximize the benefits of LLMs in diagnostic imaging while mitigating their shortcomings, several steps can be taken.

Continuous Training and Updating
LLMs should be regularly updated with the latest medical research and clinical guidelines to ensure their recommendations remain accurate and relevant (17). However, regulatory approval typically fixes a product's performance as it stands at the time of application, and for safety reasons medical devices whose accuracy fluctuates are generally not permitted. A framework is therefore needed that lets authorities oversee a model while it is updated over time. For this reason, temporal evaluation, meaning repeated assessment of performance over time, is necessary for AI software, including LLMs.
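Temporal evaluation can be sketched as tracking accuracy per time period on a fixed test set and flagging periods that fall below the performance recorded at approval. The example below is a minimal illustration under stated assumptions: the record format, the baseline value, and the tolerance are all hypothetical, and a real monitoring scheme would follow the applicable regulatory and institutional requirements.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """Compute accuracy per period from (period, is_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # period -> [correct, total]
    for period, is_correct in records:
        totals[period][0] += int(is_correct)
        totals[period][1] += 1
    return {p: correct / n for p, (correct, n) in sorted(totals.items())}


def flag_degraded(per_period, baseline, tolerance=0.05):
    """Return periods whose accuracy is below baseline minus tolerance."""
    return [p for p, acc in per_period.items() if acc < baseline - tolerance]


# Hypothetical evaluation log: (month, whether the model output was correct).
records = [
    ("2024-01", True), ("2024-01", True), ("2024-01", True), ("2024-01", False),
    ("2024-02", True), ("2024-02", False), ("2024-02", False), ("2024-02", False),
]

per_period = accuracy_by_period(records)
print(per_period)                                # {'2024-01': 0.75, '2024-02': 0.25}
print(flag_degraded(per_period, baseline=0.75))  # ['2024-02']
```

Keeping the test set fixed while the model changes makes it possible to attribute a drop in accuracy to the update rather than to a shift in the cases being evaluated.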

Ethical and Privacy Safeguards
Implementing robust privacy-preserving techniques and ethical guidelines is essential to protect patient data and ensure the responsible use of LLMs (16, 18). There must be systems and mechanisms in place to prevent inappropriate expressions and adhere to these guidelines. For privacy protection, online and local LLMs that comply with appropriate guidelines are desirable.

Interdisciplinary Collaboration
The assessment and implementation of AI can no longer be conducted by a single clinician or a small group of people. It is necessary to collaborate across multiple professions, including AI experts, radiologists, and ethicists, to implement AI and conduct its ongoing evaluation. Such collaboration will help in designing LLM applications that are both technically sound and ethically responsible (7).

Conclusion

LLMs present a groundbreaking opportunity to advance the field of diagnostic imaging. They excel in processing and generating both textual and image data, offering significant improvements in efficiency and diagnostic accuracy. Yet, harnessing their full potential demands meticulous attention to the challenges they pose, including issues of reproducibility, prompt handling, and ethical considerations. With ongoing enhancements, collaborative interdisciplinary efforts, and rigorous adherence to ethical and privacy standards, LLMs stand poised to make a substantial impact in diagnostic imaging, reshaping the landscape of healthcare technology.

Acknowledgements

This review was written with the assistance of ChatGPT.

References

1. McIlvain G, Oechtering TH, Shammi UA, et al.: Chatbots for Literature Review and Research-Insights from a Panel Discussion at the Annual Meeting of the International Society of Magnetic Resonance in Medicine (ISMRM) 2023. J Magn Reson Imaging 2023.

2. Ufuk F: The Role and Limitations of Large Language Models Such as ChatGPT in Clinical Settings and Medical Journalism. Radiology 2023; 307:e230276.

3. Clusmann J, Kolbinger FR, Muti HS, et al.: The future landscape of large language models in medicine. Commun Med 2023; 3:141.

4. Elkassem AA, Smith AD: Potential Use Cases for ChatGPT in Radiology Reporting. AJR Am J Roentgenol 2023; 221:373–376.

5. Bhayana R: Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology 2024; 310:e232756.

6. Amin KS, Davis MA, Doshi R, Haims AH, Khosla P, Forman HP: Accuracy of ChatGPT, Google Bard, and Microsoft Bing for Simplifying Radiology Reports. Radiology 2023; 309:e232561.

7. Zaki HA, Aoun A, Munshi S, Abdel-Megid H, Nazario-Johnson L, Ho Ahn S: The Application of LLMs for Radiologic Decision-Making. J Am Coll Radiol 2024.

8. Gertz RJ, Bunck AC, Lennartz S, et al.: GPT-4 for Automated Determination of Radiological Study and Protocol based on Radiology Request Forms: A Feasibility Study. Radiology 2023; 307:e230877.

9. Rau A, Rau S, Zoeller D, et al.: A Context-based Chatbot Surpasses Trained Radiologists and Generic ChatGPT in Following the ACR Appropriateness Guidelines. Radiology 2023; 308:e230970.

10. Kottlors J, Bratke G, Rauen P, et al.: Feasibility of Differential Diagnosis Based on Imaging Patterns Using a Large Language Model. Radiology 2023; 308:e231167.

11. Yang J, Li HB, Wei D: The impact of ChatGPT and LLMs on medical imaging stakeholders: Perspectives and use cases. Meta-Radiology 2023; 1:100007.

12. Hyland SL, Bannur S, Bouzid K, et al.: MAIRA-1: A specialised large multimodal model for radiology report generation. arXiv [cs.CL] 2023.

13. Singhal K, Azizi S, Tu T, et al.: Large language models encode clinical knowledge. Nature 2023; 620:172–180.

14. López-Úbeda P, Martín-Noguerol T, Luna A: Radiology in the era of large language models: the near and the dark side of the moon. Eur Radiol 2023; 33:9455–9457.

15. Alberts IL, Mercolli L, Pyka T, et al.: Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be? Eur J Nucl Med Mol Imaging 2023; 50:1549–1552.

16. Mukherjee P, Hou B, Lanfredi RB, Summers RM: Feasibility of Using the Privacy-preserving Large Language Model Vicuna for Labeling Radiology Reports. Radiology 2023; 309:e231147.

17. Brady AP, Allen B, Chong J, et al.: Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement from the ACR, CAR, ESR, RANZCR and RSNA. Radiol Artif Intell 2024; 6:e230513.

18. Ueda D, Kakinuma T, Fujita S, et al.: Fairness of artificial intelligence in healthcare: review and recommendations. Jpn J Radiol 2024; 42:3–15.

Proc. Intl. Soc. Mag. Reson. Med. 32 (2024)