Austrians spell out how to succeed with machine learning

Jan 6, 2019

It's vital for radiologists to understand the basic principles of machine learning and the differences between supervised and unsupervised learning, as well as to appreciate the importance of data quality, according to award-winning research from Austria.

In an RSNA 2018 e-poster that received a certificate of merit from the judges, Dr. Sebastian Roehrich and colleagues at the Computational Imaging Research Lab in the department of biomedical imaging and image-guided therapy at the Medical University of Vienna presented the following checklist for radiologists:

Is a research question clinically relevant?
What are possible clinical use cases for a machine-learning solution?
What is the quality and availability of the data?
What is the validity and reliability of the data?
Is there some inherent bias to the data?
Does the clinical workflow allow collection of the data in the intended way?
What is the clinical gold standard against which an algorithm needs to be compared?
How can you correctly interpret the results in a clinical context?

If these points are ignored, a machine-learning project may fail, even if the methodological approach is sound, they pointed out.

Supervised versus unsupervised learning

For both supervised and unsupervised learning, it is important for the data scientist conducting the automated analysis to determine whether the data are structured and whether certain parameters introduce variability into the images. For example, the fatty tissue of obese patients leads to increased attenuation, and thus noise, over the whole CT scanning volume. This will influence imaging features extracted by machine learning, noted the authors, who added that other sources of variability are acquisition parameters, reconstruction kernel, slice thickness, and movement artifacts.

In machine learning, the input data may comprise different clinical parameters, histology images or results, lab findings, radiological images, or basically any medical data, but the outputs are different for supervised and unsupervised learning. For supervised learning, the output may be data (or a label) that a computer can predict, such as a diagnosis. For unsupervised learning, there are no labels, and the computer finds meaningful results on its own.

"What is supervised learning?" Roehrich and colleagues asked. "We present both input and output to the computer. The goal is to learn a predictive model from input to output (= label). Therefore, the supervised machine-learning algorithm does not find a new output. Neither can it confirm nor reject a hypothesis. It learns the way to the given label."

To get a reliable result, both the input data and labels must be known beforehand, and after training, a successful model will be able to predict the label in new data that was not part of the training data, they explained. However, if data quality is low, an accurate prediction might not be possible.
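
As a minimal sketch of that train-then-predict workflow, the Python snippet below fits a nearest-centroid classifier on synthetic two-feature data and then scores it on held-out cases it never saw during training. The data, the feature values, and all function names are illustrative assumptions and are not taken from the poster; a real project would use imaging-derived features and a validated label.

```python
import random


def make_sample(rng, label):
    # Synthetic two-feature "case" (assumption): class 1 is shifted by +5.
    center = 5.0 if label == 1 else 0.0
    return [rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label


def train_centroids(samples):
    """Learn one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_label(centroids, features):
    """Predict the label whose centroid is closest to the input features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: squared_distance(centroids[lab]))


rng = random.Random(0)  # fixed seed so the toy data are reproducible
cases = [make_sample(rng, i % 2) for i in range(200)]
training_cases, held_out_cases = cases[:150], cases[150:]

# Training sees both inputs and labels; evaluation uses unseen cases only.
centroids = train_centroids(training_cases)
correct = sum(predict_label(centroids, f) == lab for f, lab in held_out_cases)
accuracy = correct / len(held_out_cases)
print(f"held-out accuracy: {accuracy:.2f}")
```

The two synthetic classes are deliberately well separated, so this toy model scores highly; with noisy or low-quality inputs the same procedure would score lower on the held-out cases.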

Labels for training a supervised machine-learning model need to be relevant, valid, and accessible, the authors continued. Labels are also known as "ground truth," and examples include a diagnosis made with a gold-standard method, biopsy results, and annotations made by radiologists (i.e., manually highlighted features in medical images).

One objective may be automatic segmentation of pneumothorax, for which the input is chest CTs, with and without pneumothorax. The label is the pneumothorax, annotated by a radiologist.
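
For a segmentation task like this one, the overlap between two annotations of the same scan (or between a model's output and the radiologist's annotation) is commonly summarized with the Dice coefficient. The sketch below is illustrative only: the flattened binary masks are made up, and `dice_coefficient` is a hypothetical helper, not something described in the poster.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks (1 = finding present)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    total = sum(mask_a) + sum(mask_b)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total


# Toy flattened masks (assumption): two readers outline the same finding
# but disagree on one voxel at each edge.
reader_1 = [0, 1, 1, 1, 0, 0]
reader_2 = [0, 0, 1, 1, 1, 0]

dice = dice_coefficient(reader_1, reader_2)
print(f"Dice: {dice:.3f}")  # prints "Dice: 0.667"
```

A value of 1 means the two masks are identical and 0 means they do not overlap at all, so the score gives a quick view of how much two annotators differ on the same scan.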

"Annotations made by radiologists are important for getting labels for machine learning," they wrote. "However, this can be quite tedious, and interrater variability should be kept in mind."

Problems with labels

In supervised learning, the model learns how to predict the given label, and this can be both an advantage and a disadvantage, according to the researchers. Imprecise labeling (= label-noise) decreases the prediction accuracy. Some labels are not very reliable because the findings are ambiguous, while others, such as radiology reports, are unstructured and less suitable for automated analysis. Problems with labeling can be overcome with unsupervised learning, they noted.

In unsupervised learning, the user presents only input to the computer, which looks for similarity and structure in the dataset. Such similarities (or clusters) can lead to novel hypotheses for further investigation.
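
To make the idea of finding clusters without labels concrete, here is a minimal sketch using plain k-means (Lloyd's algorithm), one common clustering method; the poster does not say which algorithm the authors had in mind. The two-feature points are synthetic, and the function name and parameters are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centers."""
    # Deterministic start (assumption): the first k points are the centers.
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        for idx, point in enumerate(points):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
            )
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centers, assignment


rng = random.Random(1)  # fixed seed so the toy data are reproducible
points = []
for i in range(200):
    # Two hidden groups; the algorithm is never told which point is which.
    center = 0.0 if i % 2 == 0 else 5.0
    points.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])

centers, assignment = kmeans(points, k=2)
for c, center in enumerate(centers):
    print(f"cluster {c}: center={center}, size={assignment.count(c)}")
```

No labels are passed to `kmeans`; the two groups emerge from the input alone. Deciding what each cluster means clinically still requires expert interpretation.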

"The goal is to find similarity and structure in the dataset -- even structure that is not immediately visible to the human eye," the authors wrote. "Such an approach may allow us to data mine extremely large amounts of images currently stored in a PACS."

Supervised learning is useful for predictions, but it needs valid labels, labeling can be time-consuming, and imprecise labeling leads to "label noise" due to variance, they stated. On the other hand, unsupervised learning can be efficient for finding structure and clustering, and no labeling is necessary, but it may need a larger sample size. It also needs more modeling and weighting of variables (i.e., grading data according to relevance), which requires medical experience, and interpretation can be tricky.

"With unsupervised learning, novel imaging features, which are not accessible for human vision, can be found," they wrote. "In supervised learning, an expert would need to annotate lung regions to create a label for the machine learning algorithm. In unsupervised learning, the machine-learning algorithm identifies recurring patterns in the lung by itself."

By including technical parameters, a machine-learning model might be able to accurately predict bone density from vertebral bodies in routine CT scans without a phantom, which is currently required, Roehrich and colleagues added. The supervised approach would involve the registration and segmentation of vertebral bodies and building a model for supervised prediction of bone density. Inputs would consist of the average density of vertebrae and acquisition and reconstruction parameters, while the output is a label -- i.e., bone density as measured by bone densitometry.
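
The shape of such a supervised regression can be sketched with ordinary least squares. Everything specific below is an assumption made for illustration: the poster does not name a model type, tube voltage stands in for the unspecified acquisition parameters, and the labels come from a made-up linear formula, not from patient data or densitometry.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_linear_model(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


def synthetic_bone_density(mean_hu, tube_voltage_kvp):
    # Made-up relationship used only to generate toy labels (assumption).
    return 0.9 * mean_hu - 0.3 * tube_voltage_kvp + 20.0


# Inputs: (mean vertebral attenuation in HU, tube voltage in kVp).
training_inputs = [
    (hu, kvp)
    for hu in (100.0, 150.0, 200.0, 250.0)
    for kvp in (80.0, 100.0, 120.0, 140.0)
]
training_labels = [synthetic_bone_density(hu, kvp) for hu, kvp in training_inputs]

weights = fit_linear_model(training_inputs, training_labels)
new_case = (180.0, 110.0)  # a combination not present in the training grid
predicted = predict(weights, new_case)
print(f"predicted label for new case: {predicted:.2f}")
```

Because the toy labels are noise-free and exactly linear, the fit recovers them almost perfectly; real measurements would be noisy, and a linear model may not be adequate.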

In cases of acute respiratory distress syndrome (ARDS), the aim is to predict the syndrome in patients with traumatic lung injury from baseline chest CT, the researchers wrote. The supervised approach involves registration and segmentation of the lung, annotation of radiologically visible lung injury, and supervised classification such as "ARDS in development yes/no." The input consists of manually annotated parameters (e.g., volume of injured lung), while the output is the label of clinically verified ARDS.

An alternative to manual annotation of features of lung injury would be to use unsupervised learning to automatically extract such imaging features, they concluded.