Attention turns to value of AI updates in breast screening

The usefulness of AI tools in mammography has often been studied, but changes in results from updated versions have not been examined in depth, and the real benefits of these updates in diagnosis have yet to be determined. A Norwegian-led group has sought to rectify this issue.

In an article published on 13 January in European Radiology, an international team headed by Marthe Larsen and Prof. Solveig Hofvind, PhD, of the Norwegian Institute of Public Health in Oslo compared an older and a newer version of the same AI software to examine changes in risk scores and the potential diagnostic benefits of the update.

Left craniocaudal and mediolateral oblique mammograms in a 53-year-old woman with an invasive 4-mm histologic grade 1 screen-detected cancer. The exam-level AI score was 5 for v1.7 and 10 for v2.1 (increased AI score). The red circles mark the location of the tumor. Courtesy Marthe Larsen, Prof. Solveig Hofvind et al, European Radiology

The researchers used ScreenPoint Medical’s Transpara tool, versions 1.7 and 2.1, to examine data from 117,709 screening examinations performed as part of BreastScreen Norway, the national program for screening women ages 50 to 69, between 2009 and 2018.

The team evaluated the distributions of exam risk scores (AI score 1–10) and risk categories on all exams for both versions of Transpara. Risk scores and categories correspond as follows: between 1 and 7, low risk; 8 to 9, intermediate risk; and 10, high risk of malignancy.
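
The score-to-category mapping described above can be written as a short Python sketch. The function name `risk_category` is hypothetical and chosen for illustration only; it is not part of Transpara or of the study's code, and it encodes nothing beyond the thresholds stated in the article.

```python
def risk_category(ai_score: int) -> str:
    """Map an exam-level AI score (1-10) to the risk category used in the article.

    Thresholds as stated in the text: 1-7 low, 8-9 intermediate, 10 high.
    The function itself is an illustrative assumption, not the vendor's API.
    """
    if not 1 <= ai_score <= 10:
        raise ValueError("AI score must be an integer between 1 and 10")
    if ai_score <= 7:
        return "low"
    if ai_score <= 9:
        return "intermediate"
    return "high"


# The case in the figure caption: score 5 with v1.7, score 10 with v2.1.
print(risk_category(5), "->", risk_category(10))
```

Under this mapping, the case shown in the figure moved from the low-risk category with v1.7 to the high-risk category with v2.1.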

The data included 737 screen-detected and 200 interval cancers. For the study, screen-detected cancer was defined as invasive breast cancer or ductal carcinoma in situ (DCIS) diagnosed after a recall due to suspicious findings on the original screening mammograms. Interval cancer was defined as invasive cancer or DCIS diagnosed within the 24 months following a negative screening exam, or in the six to 24 months after a false-positive screening result. Only screening exams performed prior to a diagnosis of interval cancer were included in the study, the authors added.

Additionally, the authors examined histopathological tumor characteristics and mammographic features in relation to the scores.

In the analysis, 7.9% of the screen-detected cancers (58/737) showed an increase in AI score from low or intermediate (1-9) in v1.7 to high (10) in v2.1. Scores were unchanged at a low or intermediate level for 5% (37/737) and stable at the high-risk level (10) for 85.6% (631/737), while 1.5% (11/737) of the cancer cases had a decreased risk score.

Of the 58 screen-detected cases with an increased risk score, 16 originally had an AI score of 1-7 (low risk), and 42 had a score of 8-9 (intermediate risk) with v1.7. Of these 58 cases, 39.7% (23/58) had discordant interpretations by the radiologists (an interpretation score of 1 by one radiologist and 2 or higher by the other). However, three of the 11 screen-detected cancers with decreased AI risk scores with v2.1 were given an interpretation score of 3 or higher (i.e., intermediate to high suspicion of malignancy) by both radiologists.

For the interval cancers, 11.5% (23/200) had increased scores, 43.5% (87/200) had stable low or intermediate scores, and 33% (66/200) had stable high scores. A total of 12% (24/200) had decreased scores in v2.1.

Overall, the findings show a significant increase in the proportion of screen-detected cancers given the highest risk score of 10 using v2.1 (93.5%) compared with v1.7 (87.1%). There was no significant difference between the two versions in the proportion of interval cancer cases given a score of 10 (45% for v1.7 and 44.5% for v2.1).
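
These headline percentages can be reproduced from the score-change counts reported earlier in the article. The following is a minimal Python sketch of that arithmetic, not the authors' analysis. It assumes that every "increased" case moved up to a score of 10 and every "decreased" case moved down from 10. The article states this explicitly only for the increased screen-detected cancers, but the totals agree with the reported figures under that assumption. The dictionary keys and the function name are hypothetical.

```python
# Score-change counts as reported in the article.
# Assumption: "increased" means 1-9 -> 10, and "decreased" means 10 -> 1-9.
screen_detected = {"increased": 58, "stable_low_or_intermediate": 37,
                   "stable_high": 631, "decreased": 11}
interval = {"increased": 23, "stable_low_or_intermediate": 87,
            "stable_high": 66, "decreased": 24}


def percent_scored_10(counts):
    """Return (v1.7 %, v2.1 %) of cancers with an AI score of 10."""
    total = sum(counts.values())
    # Score 10 with v1.7: cases that stayed high plus cases that later dropped.
    high_v17 = counts["stable_high"] + counts["decreased"]
    # Score 10 with v2.1: cases that stayed high plus cases that rose.
    high_v21 = counts["stable_high"] + counts["increased"]
    return 100 * high_v17 / total, 100 * high_v21 / total


for label, counts in [("screen-detected", screen_detected), ("interval", interval)]:
    v17, v21 = percent_scored_10(counts)
    print(f"{label}: v1.7 {v17:.1f}%, v2.1 {v21:.1f}%")
```

For screen-detected cancers this gives 642/737 with v1.7 and 689/737 with v2.1; for interval cancers, 90/200 and 89/200.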

In analyzing the changes in scores with the newer version, the authors remarked that the histopathological and mammographic characteristics do not explain why more screen-detected cancers were assigned a score of 10 with v2.1.

According to findings from previous studies, approximately 20% to 30% of interval cancers receive a false-negative result, meaning they could have been detected earlier, the authors wrote. However, with no increase in the number of interval cancers assigned the high-risk score, and with 12% having lower scores, the authors suggested that “there might be a limit in terms of AI-detectable visible findings, with a maximum potential for identifying interval cancers in the range of 33% (stable high) to 44.5% (percentage with AI score 10 with version 2.1).”

The authors did not identify any trend of favorable or nonfavorable tumor characteristics for interval cancers in the AI scoring, although they posited that with a high proportion of histological grade 3 tumors and triple-negative cancers, cases in which the low- or intermediate-risk scores remained stable with v2.1 could reflect true interval cancers, as there would be a lack of suspicious findings on screening images.

Thus, while a higher proportion of screen-detected cancers were categorized with the highest AI score with the updated software, they noted no change with the interval cancers and stressed that “the net benefit of using AI in clinical practice remains unknown.”

Read the study on European Radiology’s website.
