AI imaging bias reduces diagnostic accuracy

Jan 3, 2024

Systematically biased artificial intelligence (AI) imaging models lower diagnostic accuracy by over 11 percentage points, according to research published on 19 December in JAMA.

A team led by Sarah Jabbour from the University of Michigan in Ann Arbor also found that biased AI model predictions with explanations lowered accuracy by about nine percentage points. However, accuracy improved by over four percentage points when clinicians reviewed a patient clinical vignette with standard AI model predictions and model explanations compared with baseline measures.

“Given the unprecedented pace of AI development, it is essential to carefully test AI integration into clinical workflows,” the Jabbour team wrote.

While AI continues to show its potential in helping radiologists and other clinicians diagnose patients, systematic bias remains a barrier to the technology’s widespread use because it reduces diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to help lessen errors made by models. However, the researchers noted a lack of data showing the effectiveness of this strategy.

Jabbour and co-authors studied the impact of systematically biased AI on clinician diagnostic accuracy, as well as whether image-based AI model explanations could decrease model errors. The multicenter study included hospitals in 13 U.S. states and used a survey administered between April 2022 and January 2023.

The team showed 572 participating clinicians nine clinical vignettes of hospitalized patients with acute respiratory failure. These included symptoms, physical examinations, laboratory results, and chest radiographs. From there, the clinicians determined the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause in each case.

The researchers showed the clinicians two vignettes without aid from the AI model to establish baseline accuracy. The clinicians were then randomized to view six vignettes with or without AI model assistance. The clinicians had a median age of 34 years and 57.7% were female. Additionally, 31.6% of the total participants reported having previously interacted with clinical decision support tools, while 66.7% were not aware that AI could be systematically biased based on patient demographics.

The clinicians achieved a baseline accuracy of 73% for the three diagnoses. Their accuracy improved to 75.9% when they were shown standard AI model predictions and to 77.4% when they were shown standard predictions with model explanations.

Systematically biased AI model predictions, meanwhile, decreased clinician accuracy to 61.7%, down from the 73% baseline. Providing biased AI model predictions with explanations decreased clinician accuracy to 63.9%. The researchers reported that the improvement from adding explanations to the biased model's predictions did not achieve statistical significance.
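The percentage-point changes quoted at the top of this article follow directly from these accuracy figures. The short Python check below recomputes them using only the values reported here. The variable and function names are illustrative, and the assignment of 75.9% to standard AI predictions without explanations is an assumption based on how the results are described above, not a detail confirmed in this article.

```python
# Clinician accuracy (%) by study arm, as reported in this article.
# Assumption: 75.9% is the arm with standard AI predictions alone.
baseline = 73.0
accuracy = {
    "standard AI": 75.9,
    "standard AI + explanations": 77.4,
    "biased AI": 61.7,
    "biased AI + explanations": 63.9,
}


def point_change(value, reference=baseline):
    """Return the difference from the reference in percentage points."""
    return round(value - reference, 1)


# Change from baseline for every arm, in percentage points.
changes = {arm: point_change(acc) for arm, acc in accuracy.items()}

for arm, delta in changes.items():
    print(f"{arm}: {delta:+.1f} percentage points vs. baseline")
```

The biased-model arms come out at roughly 11 and nine points below baseline, and the standard model with explanations at a little over four points above it, matching the summary figures.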

The study authors wrote that while the results suggest that clinicians may not be able to serve as a failsafe against flawed AI, they can play a key role in understanding AI’s limitations.

In an accompanying editorial, authors led by Rohan Khera, MD, from Yale University wrote that the Jabbour team’s results are concerning, adding that errors caused by automation bias are “likely to be further compounded by the usual time pressures faced by many clinicians.” They also wrote that the results highlight the clinical challenge of clinicians relying on assistive technologies.

“If a model performs well for certain patients or in certain care scenarios, such automation bias may result in patient benefit in those settings,” the authors wrote. “However, in other settings where the model is inaccurate—either systematically biased or due to imperfect performance—patients may be harmed as clinicians defer to the AI model over their own judgment.”
