the health strategist
multidisciplinary institute
Joaquim Cardoso MSc.
Chief Research and Strategy Officer (CRSO),
Chief Editor and Senior Advisor
August 22, 2023
What is the message?
The research letter “Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis”, published by JAMA Network, highlights the application of the GPT-4 program, an artificial intelligence tool, in analyzing medical records of older patients with delayed diagnoses.
The study reports that GPT-4 suggested the correct primary diagnosis more often (4 of 6 patients) than clinicians (2 of 6) or a diagnostic decision support system (0 of 6), and it remained ahead when differential diagnoses were included.
While GPT-4 shows promise in aiding clinicians and improving diagnostic outcomes, its effectiveness depends on comprehensive patient information and careful interpretation within the clinical context.
Key takeaways:
- What is the main focus of the article?
The article discusses the use of the GPT-4 (Generative Pre-trained Transformer 4) program, a form of artificial intelligence (AI), to analyze medical records of older patients with delayed diagnosis and aims to determine whether GPT-4 can enhance diagnostic accuracy in complex cases.
- What is the hypothesis of the study?
The study hypothesizes that GPT-4 can improve the diagnostic accuracy of clinicians by providing the most probable diagnosis or suggesting differential diagnoses in complex cases, especially for patients in low-income countries where specialist care might be lacking.
- How was the study conducted?
The medical histories of six patients aged 65 years or older, who had experienced a delay of more than a month in receiving a definitive diagnosis, were entered into GPT-4 without revealing the actual diagnosis. The responses generated by GPT-4, as well as those by clinicians and a diagnostic decision support system, were collected and compared. The study analyzed the accuracy of primary diagnoses and differential diagnoses provided by GPT-4 and clinicians.
- What were the key findings of the study?
The accuracy of primary diagnoses made by GPT-4 was higher (66.7%) compared to clinicians (33.3%) and a diagnostic decision support system (0%). When including differential diagnoses, GPT-4’s accuracy was 83.3%, clinicians’ accuracy was 50.0%, and the decision support system’s accuracy was 33.3%. GPT-4 was found to suggest diagnoses not previously considered by clinicians, potentially leading to improved diagnostic outcomes.
- What are the implications and limitations of the study?
The study suggests that GPT-4 has the potential to aid clinicians in diagnosing older patients with complex cases, particularly where specialist care is limited. However, GPT-4’s effectiveness relies on comprehensive entry of patient information. The authors acknowledge limitations, such as GPT-4’s difficulty in detecting multifocal infections and the need to interpret its suggestions within the clinical context. Based on these findings, they consider the use of AI in diagnosis both promising and challenging.
DEEP DIVE
Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis [excerpt]
JAMA Network
Yat-Fung Shea, MBBS1; Cynthia Min Yao Lee, MBBS1; Whitney Chin Tung Ip, MBBS, BSc1; Dik Wai Anderson Luk, MBBS, MRes1; Stephanie Sze Wing Wong, MBChB1
August 14, 2023
Introduction
Artificial intelligence (AI), especially machine learning, has been increasingly used in diagnosing conditions such as skin or breast cancer and Alzheimer disease. However, AI relies on clinical imaging.1 In low-income countries, where specialist care may be lacking, AI may be useful for making clinical diagnoses. The GPT-4 (Generative Pre-trained Transformer 4) program allows analysis of clinical history in daily practice.2 We hypothesized that GPT-4 could improve the diagnostic accuracy of clinicians by supplying the most probable diagnosis or suggesting differential diagnoses in complex cases.
Methods
The medical histories of 6 patients from the Division of Geriatrics in the Department of Medicine at Queen Mary Hospital who were aged 65 years or older and had delay of definitive diagnosis longer than 1 month in 2022 were retrieved after resolution.3–5 The full medical histories were entered chronologically on April 16, 2023 (at admission, 1 week after admission, and before final diagnosis) into GPT-4 (powered by OpenAI via Platform for Open Exploration) without information about definitive diagnosis. The GPT-4 responses were copied out and further analyzed (eMethods in Supplement 1). One patient has been described previously.6 Responses by GPT-4 and clinicians were collected and compared. Differential diagnoses were also generated using a medical diagnostic decision support system (Isabel DDx Companion; Isabel Healthcare). The study was approved by the Institutional Review Board of the University of Hong Kong and Hospital Authority Hong Kong West Cluster. Written consent was provided for all patients. This report followed the reporting guideline for case series studies.
Results
Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion (Table). By studying the changes in GPT-4’s responses, we determined that certain key words were required to make an appropriate clinical response, including abdominal aortic aneurysm (patient 1), proximal stiffness (patient 2), acid-fast bacilli in urine (patient 3), metronidazole (patient 4), and retroperitoneal lymphadenopathy (patient 6). GPT-4 could suggest diagnoses not considered by clinicians before definitive investigations: mycotic aneurysm for patient 1 after computed tomography showing an abdominal aortic aneurysm; a drug cause of seizure in patient 5; and the presence of necrotic lymph nodes from a previous computed tomographic scan, which should have led to the diagnosis of lymphoma, in patient 6.
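The percentages quoted above follow directly from the reported counts out of 6 patients. As a minimal sketch of that arithmetic (the variable and function names are illustrative and not part of the original study), the figures can be reproduced as follows:

```python
# Reproduce the accuracy percentages from the counts reported in the Results.
# All names here are illustrative; only the counts come from the article.
N_PATIENTS = 6

# Number of patients with a correct primary diagnosis.
correct_primary = {
    "GPT-4": 4,
    "Clinicians": 2,
    "Isabel DDx Companion": 0,
}

# Number of patients with a correct diagnosis when differentials are included.
correct_with_differentials = {
    "GPT-4": 5,
    "Clinicians": 3,
    "Isabel DDx Companion": 2,
}


def accuracy_percent(correct: int, total: int = N_PATIENTS) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


for source in correct_primary:
    primary = accuracy_percent(correct_primary[source])
    combined = accuracy_percent(correct_with_differentials[source])
    print(f"{source}: primary {primary}%, with differentials {combined}%")
```

With only 6 patients, each case shifts accuracy by about 16.7 percentage points, which is worth keeping in mind when comparing the three sources.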
Discussion
Overall, GPT-4 has potential clinical use in older patients without a definitive clinical diagnosis after 1 month but requires comprehensive entry of demographic and clinical (including radiological and pharmacological) information. GPT-4 may increase confidence in diagnosis and earlier commencement of appropriate treatment, alert clinicians to missed important diagnoses, and offer suggestions similar to specialists to achieve the correct clinical diagnosis, which has potential value in low-income countries with lack of specialist care. Clinicians need to be aware that GPT-4 is limited in multifocal infection, and the suggested management plan should be correlated with clinical context, as suggestions may be redundant. Clinicians should consider a drug review and review the possible diagnosis of malignant disease if suggested.
This study has several limitations. First, GPT-4 may not detect 2 focuses of infection or pinpoint the source of recurrent infection. Second, GPT-4 did not suggest the use of gallium scan or 18-fluorodeoxyglucose positron emission tomography to look for infections or malignant neoplasms in all but 1 patient. Third, some investigations may not be appropriate (eg, temporal artery biopsy in the absence of typical symptoms of giant cell arteritis). Overall, our findings suggest that the use of AI in diagnosis is both promising and challenging.
Article Information
See the original publication (this is an excerpt version)
References
See the original publication (this is an excerpt version)
Authors and Affiliations
Yat-Fung Shea, MBBS1; Cynthia Min Yao Lee, MBBS1; Whitney Chin Tung Ip, MBBS, BSc1; Dik Wai Anderson Luk, MBBS, MRes1; Stephanie Sze Wing Wong, MBChB1
1Department of Medicine, Queen Mary Hospital, University of Hong Kong, Hong Kong
Originally published at https://jamanetwork.com