AI software did not perform as well as residents in detecting intracranial hemorrhage (ICH) on unenhanced CT in an emergency room setting, according to a Swiss study. The technology's performance varied for different types of ICH.
The study, published on 21 February in European Radiology and authored by a team from Geneva University Hospitals led by Dr. Quentin Pedrini, sought to assess the performance of diagnostic AI software compared to that of on-call residents in the emergency department of the university hospital.
The researchers analyzed the radiology reports of 2,153 CT scans of adult patients with suspected ICH using a commercial radiological computer-assisted triage and notification AI tool. They then compared the AI’s performance in detecting ICH against that of on-call radiology residents.
In total, 331 (15.4%) of the cases received a diagnosis of ICH from the attending radiologist. In 130 (39.3%), multiple types of ICH were present; subarachnoid hemorrhage was present in 70 (21.1%), subdural hematoma (SDH) was present in 65 (19.6%), intraparenchymal hemorrhage in 58 (17.5%), and epidural hematoma in 8 (2.4%) cases.
The AI tool demonstrated good sensitivity for all types of ICH except SDH, ranging from 78.6% for subarachnoid hemorrhage to 87.9% for intraparenchymal hemorrhage. Its overall performance was very good, with a sensitivity of 84% and a specificity of 94.4%, a positive predictive value of 73.2%, and a negative predictive value of 97% (all reported with 95% confidence intervals).
AI achieved its best sensitivity (97.7%) in cases with the presence of multiple ICH types and where ICH was present at multiple sites. The authors cautioned that “[t]his high diagnostic performance may be attributed to the AI’s intrinsic functioning: The software might label just one of the multiple sites of hemorrhage, and the case automatically becomes positive for ICH, regardless of the ICH location or type.”
The AI’s performance for SDH was notably lower than for other forms of ICH, with a sensitivity of 50% for subacute and 25% for chronic SDH.
While the software’s performance was good for everything except SDH, the residents’ overall sensitivity was significantly better, at 96.4%. The residents outperformed the AI on the other metrics as well, with a specificity of 99.6% and positive and negative predictive values of 97.6% and 99.3%, respectively. And while the AI had done notably well with multiple ICH sites, the residents still outperformed the software there, with a sensitivity of 98.5%.
Additionally, the resident radiologists outperformed the software across the different indications for CT. The residents’ sensitivity was consistently high across all subgroups (96.4% for suspected hemorrhage, 97.1% for suspected ischemia, and 93.3% for other suspected intracranial pathologies), while the AI’s sensitivity was lower for suspected hemorrhage (80.6%) than for suspected ischemia (94.1%) or other intracranial pathologies (93.3%); specificity remained similar across subgroups.
The AI missed ICH in 53 cases; of these, 27 were SDHs, 15 were focal subarachnoid hemorrhages, seven were small parenchymal hemorrhages, three involved multiple hemorrhagic types, and one was a 4-mm epidural hematoma.
The residents missed ICH in 12 cases: Three were SDHs, seven were focal cortical subarachnoid hemorrhages, one was a postoperative residual ICH, and one was a 12-mm parenchymal hemorrhage.
Six of the 12 cases of ICH missed by the residents were identified by the AI: Three were focal subarachnoid hemorrhages, one was an acute SDH, one was a parenchymal hemorrhage, and one was a residual postoperative ICH. However, the software flagged 101 negative cases as positive for ICH.
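The headline metrics can be cross-checked from the counts reported above (2,153 scans, 331 ICH-positive cases, 53 missed by the AI, and 101 false positives). The short sketch below recomputes the AI's sensitivity, specificity, and predictive values from those figures; the results match the published numbers to within rounding.

```python
# Reconstruct the AI's confusion matrix from the counts reported in the article:
# 2,153 scans, 331 positive for ICH, 53 missed by the AI (false negatives),
# and 101 negative cases flagged as positive (false positives).
total, positives = 2153, 331
fn, fp = 53, 101

tp = positives - fn            # 278 true positives
negatives = total - positives  # 1,822 ICH-negative scans
tn = negatives - fp            # 1,721 true negatives

sensitivity = tp / positives   # ~84%, as reported
specificity = tn / negatives   # ~94.4%, as reported
ppv = tp / (tp + fp)           # ~73%, as reported
npv = tn / (tn + fn)           # ~97%, as reported

print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}, "
      f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

The same arithmetic applied to the residents' 12 misses, against their much smaller false-positive count, is what drives their higher sensitivity and predictive values.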
The authors noted a trend in the AI’s missed ICHs: It had more difficulty with smaller hemorrhages and with those located along the cerebellar tentorium or falx. CT density also appeared to be a factor, the team wrote, with more of the lower-density subacute and chronic SDHs missed than the higher-density acute SDHs.
The researchers also remarked on density as a factor in the AI’s false positives. “The common denominator among these cases was the presence of structures with high densities on noncontrast cerebral CTs,” they wrote. While these cases varied, they were mostly attributed to dural thickening, beam-hardening artifacts, and calcifications.
While AI did not on the whole perform as well as the resident radiologists, the authors concluded, it did show good performance for some types of ICH, particularly when multiple ICHs were present.