CARS: Flawed methodology invalidates many breast CAD studies

BERLIN - For decades researchers have been evaluating computer-aided detection (CAD) software to see if it can find more breast cancers. But most of these studies are invalid, charged a U.S. researcher in a presentation at the 2011 Computer Assisted Radiology and Surgery (CARS) congress.

But it's not CAD that's without merit, said Robert Nishikawa, PhD, of the University of Chicago Department of Radiology and Committee on Medical Physics. Well-performed studies show a modest boost in sensitivity and recall rates with use of the software.

CAD should never be used in a bid to increase the cancer detection rate, he said in a presentation during Thursday's mammography CAD sessions. The best users can hope for is finding breast cancers at an earlier stage, and that's a good thing.

"To evaluate screening itself, the problem is that the cancer prevalence is low, which makes the statistics difficult -- you have to screen a large number of women to get good statistics because the prevalence of cancer is five in 1,000 women screened," Nishikawa said.

And although the best screening outcome measure is falling mortality, it's not a practical one because mortality statistics would require a 10- to 20-year wait. So researchers use surrogate end points -- mostly cancer detection and recall rates -- along with positive predictive value (PPV) and the size and stage of cancers, he said.

"If you want to find out if CAD is adding any value, what you want to know is how many cancers radiologists missed -- because those are the ones that CAD can have an effect on -- and that's probably less than one in 1,000," Nishikawa said. As a result, CAD studies have to be at least five times the size of other screening mammography studies to be adequately powered to answer that question.

In 2001, Freer and colleagues examined nearly 13,000 women and found that CAD resulted in about a 20% increase in cancers detected, he said. But in 2004, Gur and colleagues published an even larger study that showed only a 1.7% increase in cancers detected with CAD. This was followed by years of studies showing around a 10% increase, and in the last several years, studies have come up with about a 5% increase in cancers detected.

"I think any reasonable person that looks at these numbers is going to say it's not looking too promising," Nishikawa said. Worse, it is the largest studies that have tended to show the smallest increases in cancer detection with CAD.

But things become clearer when the trials are divided into longitudinal and cross-sectional studies, he said.

  • Longitudinal studies are retrospective and compare cancer detection rates between two time periods -- before CAD was implemented and in the years thereafter.
  • Cross-sectional studies are prospectively performed, comparing cancer detection for each patient without CAD and then with CAD, with the results recorded for each reading.

A look at 11 of the largest breast CAD studies reveals that all the longitudinal studies had relatively small increases in cancer detection -- with a weighted average of less than 2% -- while the cross-sectional studies detected a weighted average of 9.3% more cancers. The biggest studies are the longitudinal kind because they are easier to recruit patients for, Nishikawa said.

The trouble with longitudinal studies, however, is that researchers have to match the patients read without CAD to those read with CAD. They must match the radiologists' skills, too, and worry about cancer prevalence, patient age, and the percentage of incident screens in the population being studied, he said.

So why do the largest studies, the longitudinal ones, show the poorest improvement with CAD? It's the methodology, Nishikawa said. "The main disadvantage is that cancer detection rate is not a useful measure for longitudinal studies," he said.

The "why" of the matter takes some explaining. Assume that all cancers grow at the same rate, that radiologists have an 80% sensitivity rate -- a 20% cancer miss rate without CAD -- but only a 10% cancer miss rate with CAD, Nishikawa said. Also assume that any cancer missed one year is detected the next year at screening.

Assume each year that there are 100 new cancers and 20 missed from the previous year for a total of 120. Of this 120, the radiologists detect 100, and on it goes each year until CAD is introduced. That year, say the miss rate is cut by 50% to 10 cancers a year, for a total of 110 cancers detected the first year of CAD.

"But the next year there are only 110 cancers in the population, 100 new cancers plus the 10 that were missed from the previous year," Nishikawa said. When CAD is not used, the radiologist finds 80 cancers plus the 10 missed from the previous year for a total of 90. Using CAD they find 10 more, once again returning total cancers detected to 100 -- just like before CAD.

"So between the reading without CAD period and the reading with CAD period, you have the same number of cancers detected," he said. The total only rises for a year because after the first year "the prevalence drops because there are fewer cancers to be found."

"So CAD will not increase the cancer detection rate -- basically you're just finding them at separate times," Nishikawa said. "The advantage of CAD is that you can find the cancers earlier, which is the whole point of screening."

With this in mind, a more relevant end point than the number of cancers is cancer size and cancer stage, he said. You can even incorporate variable growth rates, leading to more variable cancer detection rates, but ultimately the rate does not change. Nishikawa's take-home message: If you're performing a longitudinal study, look for improvements in cancer size and stage.
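
The arithmetic behind that argument can be replayed in a few lines. The following is a minimal sketch of the simplified model described in the talk, not Nishikawa's own code: it assumes 100 new cancers a year, a miss rate of 20% without CAD and 10% with CAD, and that every cancer missed one year is found at the next year's screen. The function name and the choice of three pre-CAD years are illustrative assumptions.

```python
def detected_per_year(new_cancers, miss_rates):
    """Return the number of cancers detected each year under the simplified model.

    Assumptions taken from the talk: a fixed number of new cancers per year,
    and every cancer missed one year is detected at the next year's screen.
    """
    # Start at the pre-CAD steady state: last year's misses carry over.
    carried_over = new_cancers * miss_rates[0]
    detected = []
    for miss in miss_rates:
        missed_now = new_cancers * miss
        # Found this year: new cancers that were not missed, plus last year's misses.
        detected.append(new_cancers - missed_now + carried_over)
        carried_over = missed_now
    return detected


# Three years without CAD (20% miss rate), then four years with CAD (10%).
rates = [0.20] * 3 + [0.10] * 4
yearly = [round(d) for d in detected_per_year(100, rates)]
print(yearly)  # [100, 100, 100, 110, 100, 100, 100]
```

Under these assumptions the count rises only in the first year of CAD and then returns to 100, so a comparison of multiyear periods before and after CAD sees almost no change in the detection rate.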

A better way

Cross-sectional breast CAD studies, which measure and document the results in each patient with and without CAD, offer the considerable advantage of directly measuring the effect of CAD. However, these studies have important potential downsides, Nishikawa said.

Two possible biases can exist in the data, he said. The first is a positive bias that overstates CAD's contribution: Radiologists may be less vigilant on their initial read because they know the CAD output is coming.

"Radiologists know they're going to get another chance to see what's wrong with this patient," he said. "They're doing this clinically, they're trying to do it quickly, and they do a cursory read of the mammogram, look at the CAD result, and do their actual final reading."

Another possibility is negative bias -- for example, radiologists reading more vigilantly than usual before they see the CAD output because they know they are part of a clinical trial. "If you're involved in a study you're going to publish, you don't want your colleagues to think you're not a very good reader, so you don't want to miss a lot of cancers that CAD is going to pick up," Nishikawa said.

Unfortunately, it cannot be gleaned from the literature whether both, neither, or one of these biases is present. There is one way to get rid of the positive bias -- a very clever technique used by Gilbert et al, Nishikawa said. In a small fraction of cases in that study, no CAD output was given, and the radiologists knew that in some cases, the final results were up to their own detection abilities. While probably eliminating the positive bias, it could potentially introduce negative bias, but a slight underreporting of CAD's abilities is preferred over the alternative, he said.

Looking only at the cross-sectional CAD studies, the weighted average cancer detection rate increases by 9.3% (p = not significant), and the recall rate increases by 12.4% (statistically significant), at a nearly constant positive predictive value.

In fact, many clinical studies find a statistically significant increase in the recall rate, and no statistically significant increase in cancer detection rate. That does not mean that CAD is not useful, Nishikawa said.

"What it means is that there's not enough cancers in the population to measure a meaningful increase" due to CAD, he said.

Nishikawa used a bootstrapping technique to better express the accuracy of the seven-study sample, which showed that among the seven large cross-sectional studies, the use of CAD:

  • Increased sensitivity by 9.3% (range, 6.3% to 12.9%)
  • Increased recall rate by 12.4% (range, 9.5% to 17.1%)
  • Reduced the positive predictive value by 0.14% (range, -0.28% to -0.05%)
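
The article does not say how the bootstrap was set up, so the following is only a sketch of one common approach: resample whole studies with replacement and recompute the pooled increase each time. The per-study counts are hypothetical placeholders, not data from the seven studies, and the function names, resample count, and 95% level are assumptions made for illustration.

```python
import random

# HYPOTHETICAL per-study counts: (cancers found without CAD, cancers found with CAD).
# Placeholders for illustration only; not taken from the studies in the talk.
STUDIES = [(40, 44), (25, 27), (60, 66), (18, 20), (33, 35), (51, 57), (22, 24)]


def pooled_increase(studies):
    """Relative increase in detected cancers, pooled across studies."""
    without_cad = sum(s[0] for s in studies)
    with_cad = sum(s[1] for s in studies)
    return with_cad / without_cad - 1.0


def bootstrap_interval(studies, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole studies with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = sorted(
        pooled_increase(rng.choices(studies, k=len(studies)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


point = pooled_increase(STUDIES)
low, high = bootstrap_interval(STUDIES)
print(f"{point:.1%} (bootstrap interval {low:.1%} to {high:.1%})")
```

Resampling at the study level treats each study as one observation, which suits a sample of only seven studies but yields a coarse interval. Resampling individual patients within studies would be a different design choice.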

Don't look to meta-analyses for support. The two meta-analyses on breast CAD that Nishikawa is aware of (Taylor et al, Noble et al) both combined longitudinal with cross-sectional studies.

"In my opinion, that invalidates those studies because they should ignore the longitudinal studies," he said.

The use of CAD might better be compared with independent double reading of mammograms, Nishikawa said. The three double-read studies published (Gilbert et al, Gromet et al, Georgian-Smith et al) saw overall sensitivity boosted from 88% for double reading to 90.4% for double reading plus CAD, and the PPV also rose insignificantly, he said.

What if the only clear change with CAD is an increase in the recall rate? Is that a good thing? It depends on the relative values of true-positive and false-positive decisions, Nishikawa said. When he assigned utility values for true-positive, true-negative, false-positive, and false-negative decisions to the large Digital Mammographic Imaging Screening Trial (DMIST), the results showed that CAD use was beneficial, he said.
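
The trade-off can be made concrete with the pooled figures quoted in this article (a PPV of 5.0%, a 9.3% rise in sensitivity, and a 12.4% rise in recall rate). The sketch below is not the DMIST utility analysis: the utility values are hypothetical, the function names are illustrative, and it assumes the rise in sensitivity equals the relative rise in cancers detected.

```python
def false_recalls_per_extra_cancer(ppv, detection_increase, recall_increase):
    """Extra false-positive recalls incurred for each extra cancer found.

    ppv: baseline cancers detected per woman recalled.
    detection_increase, recall_increase: relative changes with CAD.
    """
    extra_cancers = detection_increase      # per baseline detected cancer
    extra_recalls = recall_increase / ppv   # per baseline detected cancer
    return (extra_recalls - extra_cancers) / extra_cancers


def cad_net_utility(ratio, gain_per_cancer, cost_per_false_recall):
    """Net utility per extra cancer found; a positive value favors CAD."""
    return gain_per_cancer - ratio * cost_per_false_recall


# Figures quoted in the article: PPV 5.0%, sensitivity +9.3%, recall rate +12.4%.
ratio = false_recalls_per_extra_cancer(0.05, 0.093, 0.124)
print(round(ratio, 1))  # 25.7

# HYPOTHETICAL utilities, for illustration only (not the values used in the talk).
print(cad_net_utility(ratio, gain_per_cancer=100.0, cost_per_false_recall=1.0) > 0)  # True
```

On these figures, each additional cancer found comes with roughly 26 additional false-positive recalls, so CAD is beneficial whenever finding a cancer that would otherwise be missed is valued at more than about 26 times the cost of one unnecessary recall.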

Conclusions

Longitudinal studies should not be used to assess the effectiveness of CAD, and they should be excluded from meta-analyses, Nishikawa said.

Among the seven published sequential studies, the increases in sensitivity (+9.3%) and recall rate (+12.4%) with CAD are comparable, with only a small decrease in PPV from 5.0% to 4.9%.
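
These three figures are consistent with one another, as a quick check shows. The sketch assumes that PPV is cancers detected divided by women recalled, and that the percentage increases are relative changes in those two counts. The function name is illustrative.

```python
def ppv_with_cad(baseline_ppv, detection_increase, recall_increase):
    """PPV after CAD, given relative increases in cancers found and in recalls."""
    return baseline_ppv * (1 + detection_increase) / (1 + recall_increase)


new_ppv = ppv_with_cad(0.050, 0.093, 0.124)
print(f"{new_ppv:.1%}")                          # 4.9%
print(f"{(0.050 - new_ppv) * 100:.2f} points")   # 0.14 points
```

Because recalls rise slightly faster than detections, PPV slips by about 0.14 percentage points, matching the small decrease reported.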

CAD can be an effective tool to assist radiologists reading screening mammograms, and its use can increase the utility of screening mammography.

The goal of CAD is to reduce the miss rate of radiologists; however, CAD will not appreciably increase the cancer detection rate.

What's more, CAD should reduce the size and stage of detected cancers, which is a more appropriate end point for its use. Comparing CAD to double reading is also appropriate, he said.

"These studies are flagged to mammography because those are the ones used clinically, but now CAD systems for lung cancer and colon screening are available in the U.S. I'm assuming that clinical studies for those will be happening, and you should learn the lessons from mammography when we go to study those applications," Nishikawa said.
