A postdeployment evaluation of an AI decision-support tool for emergency x-ray use showed it to be accurate but still in need of the judgment of a professional, according to a study published 16 March in the European Journal of Radiology (EJR).
While AI has demonstrated its utility in radiology and continues to improve, there is still reason for caution in relying on it for decision-making rather than decision support, and ongoing monitoring of its performance is essential, wrote a team led by Kjetil Gundro Brurberg, PhD, of Vestre Viken Hospital Trust in Norway.
“Radiographers may play a crucial role in mitigating erroneous clinical decisions due to false AI results and ensure appropriate patient management,” the group noted.
The study assessed data from emergency x-ray use of the BoneView AI tool at Bærum Hospital in Gjettum, comparing findings by AI-assisted radiographers and AI alone to assess both the accuracy of the AI and the subsequent effect its findings had on patient management. The study included data from 1,052 AI-assisted x-ray exams taken in January 2024.
The AI produced 24 false negatives (2%) and 77 false positives (6%). However, the authors noted that the tool labeled 57% of the false positives “doubtful.”
The researchers found that overall sensitivity and specificity of the AI was 95% and 90%, respectively. When they excluded the false-positive cases the AI had labeled “doubtful,” sensitivity did not change, although specificity increased to 96%.
Radiographers manually overrode the AI findings in 204 (19%) of the 1,052 compared cases. While these overrides occurred in instances in which AI was correct as well as when it was incorrect, they were much more likely to occur in cases where the AI was wrong: 28 out of 85 incorrect AI assessments (33%) were overridden compared to 176 of 965 correct AI assessments (18%).
Brurberg and colleagues also found that 18% of the AI’s false positives and 50% of its false negatives occurred on knee and lower leg x-rays, despite these exams constituting only 12% of the total. Most of the false-positive findings (73%) were due to incorrect recognition of fractures, they noted; the remaining 21 false positives comprised bone lesions (15), dislocations (5), and one effusion finding.
While false negatives were rare apart from the knee and lower leg findings, the AI missed one finger fracture requiring treatment, the authors wrote, deeming it the only AI false negative that could have resulted in delayed treatment and clinical consequences.
The study findings suggested that while the AI by itself demonstrated high sensitivity and specificity, it lagged behind radiographers using AI. Furthermore, the AI was more likely to have been incorrect in the cases where radiographers overrode its initial finding, and those overrides changed the treatment pathway in nearly 20% of cases.
The authors described the approach of using AI in decision-making as “collaborative,” concluding that “radiographers play a crucial role in guiding patients -- ensuring that more of those who do not require treatment are safely sent home, and more of those who do are appropriately retained for further evaluation.”
Read the analysis on EJR’s website.