GPT-5 remains far from being ready for clinical application or standalone diagnostic use, according to new data from an international team.
In a letter published in European Radiology on 26 January, a group led by Dr. Dana Brin of the Department of Diagnostic Imaging at Chaim Sheba Medical Center, in Tel Hashomer, Ramat Gan, Israel, used data from an analysis they had published in European Radiology in August 2024 on the diagnostic accuracy of GPT-4V in imaging (CT, ultrasound, and x-ray) to assess the updated multimodal model, GPT-5.
CT image of the head with a right temporal parenchymal hemorrhage, from the previous analysis using GPT-4V. GPT-4V correctly identified the image modality, anatomical region, and pathology of "intracranial hemorrhage." Image courtesy of Brin et al; European Radiology.
Results from the previous study showed that GPT-4V performed well in identifying modalities but only moderately well in anatomical localization and pathology identification. Furthermore, GPT-4V produced frequent diagnostic hallucinations. The authors concluded that the software held promise but could not be used as a standalone diagnostic tool.
Using the same prompts as they had with the previous iteration, the authors analyzed the same dataset of 230 exams (103 CT, 74 ultrasound, 53 x-ray) using GPT-5. Again, they assessed modality identification, anatomical localization, and pathology identification. Additionally, they classified errors as either omissions or hallucinations, as in the previous assessment.
For the 230 exams, GPT-4V had demonstrated 100% accuracy in modality identification, 87.1% in anatomical localization, and 35.2% in pathology identification. The overall hallucination rate was 46.8%.
Ultrasound image of the right kidney demonstrating hydronephrosis, from the previous analysis. GPT-4V identified the modality but misidentified the anatomical region as "pelvis" and the pathology as "cholelithiasis." Image courtesy of Brin et al; European Radiology.
In comparison, GPT-5 also demonstrated 100% accuracy in identifying the modality used. Its performance was significantly better than that of GPT-4V with regard to anatomical localization: 97.8% overall; 100% for CT, 98.1% for x-ray, and 94.6% for ultrasound. Improvements in pathology identification were less pronounced, at 40% overall.
The authors added that the improvement in pathology accuracy was largely seen in ultrasound, with the rate rising from 9.1% to 33.8%. Pathology identification showed a more modest improvement for x-ray (67.9% for GPT-5 vs. GPT-4V's 66.7%). For CT, however, pathology-level accuracy was lower for GPT-5 than for GPT-4V, decreasing from 36.4% to 30.1%.
Even with the improvements in anatomical and pathology identification, the authors determined that GPT-5 exhibited the same shortcomings with errors as GPT-4V. In fact, the rate of hallucinations had worsened: with the updated software, 60% of exams had at least one hallucinated finding, up from 46.8% with the previous version. The percentages by modality were all higher -- 73.8% for CT (GPT-4V: 51.5%), 26.4% for x-ray (GPT-4V: 19.6%), and 64.9% for ultrasound (GPT-4V: 60.6%).
The omission rate for GPT-5 was 22.2% overall, up from GPT-4V's 16.2%. By modality, GPT-5's omission rates were 24.3% for CT, 28.3% for x-ray, and 14.9% for ultrasound, compared with GPT-4V's 18.9%, 24.1%, and 3.4%, respectively. A high rate of errors -- whether hallucinations or omissions -- increases the chances of missed diagnoses and misdiagnoses, underscoring the need for caution in using this software.
While acknowledging the promising results achieved with GPT-4V, the authors had previously cautioned that "[s]uch inaccuracies highlight that GPT-4V is not yet suitable for use as a standalone diagnostic tool."
Following their analysis of the update, they concluded that “GPT-5 remains far from being ready for clinical application or standalone diagnostic use and underscores the need to continue performing careful, model- and version-specific evaluations before considering any integration into routine practice.”