One possible data-driven approach for quantifying the risk of overdiagnosis is to leverage large databases of routinely collected clinical data …
… and analyze patient trajectories to better understand the effects of being classified as having a disease.
JAMA Network
Daniel Capurro, MD, PhD1; Simon Coghlan, PhD2; Douglas E. V. Pires, PhD1
January 21, 2022
oncologycentral
The accelerated adoption of digital technologies in people’s lives is creating unique opportunities to leverage routinely collected digital data and machine learning models to diagnose diseases before they become symptomatic.
Like traditional tests, digital screening will likely generate cases of overdiagnosis and thereby harm some patients.
Digital screening tests (such as detecting mood or sleep disorders from smartphone use patterns) are being developed faster than the ability to assess their value.
The additional risks and benefits of these digital tests are not only a function of their accuracy, and it is important that such approaches be validated prospectively.
This Viewpoint proposes using large longitudinal databases of patient information that describe clinical trajectories and outcomes to preemptively quantify the risk of overdiagnosis posed by new digital screening technologies.
Distinguishing patients who might not benefit from an early diagnosis might allow for better training of machine learning models, so that they more accurately detect clinically meaningful diseases in the general population and help minimize harm to patients.
Traditional screening involves the administration of clinical tests (eg, surveys, laboratory tests, imaging) to detect diseases or risk factors in individuals who have not presented any symptoms of the disease.
Digital screening involves the application of data analysis techniques to data routinely collected from individuals, with the goal of detecting abnormal patterns that might be compatible with the disease of interest or might predict future disease.
Examples include early detection of Parkinson disease by using information from an individual’s typing patterns on a keyboard1 or detection of atrial fibrillation in the general population by using smartwatches.2
There is broad consensus that finding and treating some diseases early could significantly improve patient outcomes and reduce health care costs.
Translating this goal into practice, however, has proven more difficult than originally thought.
For more than 5 decades, health care systems and government agencies have advocated screening methods to identify individuals with undiagnosed conditions that might benefit from early treatment.
This approach has resulted in widespread implementation of screening programs for early detection of cancer, cardiovascular diseases, and metabolic diseases, such as screening approaches based on recommendations from the US Preventive Services Task Force.
The past 20 years!
However, during the past 20 years it has become apparent that the positive outcomes anticipated from some screening programs have not materialized.
Some screening programs have significantly increased the incidence of specific conditions without improving the overall health of the population.
Notable examples include:
- thyroid cancer screening, which in some countries was associated with a 2-fold increase in the incidence of this cancer without changes in mortality,3 and
- screening for attention-deficit/hyperactivity disorder, which, as multiple studies have demonstrated, has made it more likely for the youngest children in a class to be diagnosed with the condition.4
These examples illustrate a problem of overdiagnosis.
Although the precise definition of overdiagnosis is philosophically disputed, it can be defined as situations in which individuals meet accepted diagnostic criteria for a specific disease but, had the condition remained undetected, would not have experienced reduced life expectancy or quality of life.5
Overdiagnosis is not a false-positive diagnosis (diagnosing a disease in an individual who does not meet diagnostic criteria) or a misdiagnosis (diagnosing the wrong condition in an individual who does have an underlying disease).
Rather, it is a diagnosis that is correct according to prevailing criteria but has no or limited medical benefit for the patient and may even result in harm.
Although the results are complex to quantify, 1 report has suggested that an estimated 18% to 24% of all cancers in Australia may represent cases of overdiagnosis.6
Overdiagnosis can have negative consequences for individuals and health care systems, including unnecessary testing, harmful treatments, and the detrimental psychological effects of people being labeled as having a disease.
Avoiding overdiagnosis whenever possible is an ethical imperative.
Medical and ethical concerns about digital overdiagnosis are all the more pressing, given the general lack of understanding of the nature and consequences of overdiagnosis among some clinicians and among the public.
Overdiagnosis has been observed for many cancers, such as prostate cancer and breast cancer, and for other conditions, such as chronic kidney disease, gestational diabetes, high blood pressure, and autism spectrum disorder.
A frequent cause of overdiagnosis involves taking diagnostic criteria defined in a select group of individuals and applying those criteria to healthier sections of the population.
The widespread adoption of digital technologies by the general population (eg, smartphones, smartwatches, social networks) and by health care organizations (eg, electronic medical records, remote monitoring devices) is generating vast amounts of information about behaviors of individuals and their interactions with health care systems.
These large data sets, coupled with progress in artificial intelligence and machine learning algorithms that analyze them, provide significant opportunities to detect diseases at earlier stages and thereby potentially improve population health.
However, the ease with which these new diagnostic algorithms can be developed and applied to readily obtainable data sets could, without preventive action, precipitate a substantial number of overdiagnosed conditions.
Preventing overdiagnosis should partly involve quantifying its likelihood, yet efforts to quantify how much new digital diagnostic algorithms contribute to overdiagnosis are limited.
Randomized clinical trials (RCTs)
Randomized clinical trials (RCTs) are the ideal way to assess digital diagnostic tools beyond traditional performance metrics such as sensitivity, specificity, and likelihood ratios.
They allow quantification of the clinical benefits and risks that emerge from implementing a new diagnostic tool.
However, such trials are time consuming and costly, and also tend to identify overdiagnosis and its potential harms to the population and costs to health care systems only retrospectively, after the screening intervention has been deployed.
For example, the Food and Drug Administration approved the prostate-specific antigen (PSA) test to screen for prostate cancer7 in 1987; the results of the first 2 RCTs were published in 2009. It took more than 20 years and the randomization of more than 250 000 patients to obtain a clear understanding of the risk of overdiagnosis and overtreatment for prostate cancer caused by such screening.8,9
Today it is estimated that between 2.3% and 15.4% of patients diagnosed via PSA may be overdiagnosed.10
Digital screening algorithms are being developed at a speed that greatly exceeds the ability to conduct RCTs to measure their effectiveness or to adjust disease definitions to healthier sections of the population.
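The traditional performance metrics named above (sensitivity, specificity, and likelihood ratios) can all be derived from a 2x2 table of test results against a reference standard. The following is a minimal sketch for orientation only; the function name `diagnostic_metrics` and the counts are hypothetical and do not come from the article or from any study.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy metrics from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of diseased individuals the test flags
    specificity = tn / (tn + fp)  # share of non-diseased individuals it clears
    # Likelihood ratios; fall back to infinity when the denominator is zero.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Hypothetical counts for 1000 screened individuals (illustrative only).
metrics = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

None of these quantities indicates whether a correctly detected case benefits from being detected, which is why accuracy alone cannot reveal overdiagnosis.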
The same digital health and data-driven revolution that enables using patient data to improve diagnosis could assist detection and prevention of overdiagnosis at early development stages, potentially allowing researchers to better weigh the risks and benefits of new digital diagnostic algorithms to guide their safe deployment and use.
One possible data-driven approach for quantifying the risk of overdiagnosis is to leverage large databases of routinely collected clinical data and analyze patient trajectories to better understand the effects of being classified as having a disease.
Electronic medical records collect a variety of clinical information such as diagnoses, laboratory test results, vital signs, prescriptions, and procedures.
These elements involve a temporal dimension and can be organized into disease trajectories.
Through analysis of disease trajectories, it may be possible to better understand which subgroups of patients might not benefit from being labeled “diseased.”
This data-driven approach could be used to improve disease definitions considering patient trajectories and aid in the identification of clinical attributes that might allow clinicians to more accurately distinguish a diagnosis from an overdiagnosis.
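As a rough illustration of how time-stamped record elements might be organized into trajectories, the sketch below groups events by patient and orders them chronologically. The record layout, field names, and toy events are assumptions made for this example; real electronic medical record schemas differ, and the article does not prescribe a particular data structure.

```python
from collections import defaultdict
from datetime import date

# Hypothetical EMR events: (patient_id, event_date, event_type, description).
events = [
    ("p1", date(2021, 3, 2), "lab", "laboratory test result"),
    ("p1", date(2021, 3, 1), "diagnosis", "diagnosis recorded"),
    ("p2", date(2021, 5, 10), "prescription", "medication prescribed"),
    ("p1", date(2021, 3, 3), "procedure", "procedure performed"),
    ("p2", date(2021, 5, 9), "vital_sign", "vital signs recorded"),
]


def build_trajectories(records):
    """Group events by patient and sort each patient's events by date."""
    grouped = defaultdict(list)
    for patient_id, event_date, event_type, description in records:
        grouped[patient_id].append((event_date, event_type, description))
    for timeline in grouped.values():
        timeline.sort(key=lambda event: event[0])  # chronological order
    return dict(grouped)


trajectories = build_trajectories(events)
for patient_id, timeline in trajectories.items():
    print(patient_id, [event_type for _, event_type, _ in timeline])
```

Once events are ordered this way, outcomes that follow a diagnosis can be compared across subgroups of patients.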
For example, the approach could involve improving algorithms that screen for sepsis in the general hospital population.
Sepsis is a condition that will normally complete its course during a single hospital admission.
Patient trajectories from large hospital databases, combined with standard diagnostic criteria, could define the subgroup of patients who benefit from being diagnosed. That subgroup could then serve as an improved criterion standard, or label, for training a new machine learning algorithm, allowing the algorithm to identify patients who both meet the sepsis diagnostic criteria and present with compatible disease trajectories.
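One way to picture the refined label is as the conjunction of two conditions: the admission meets the diagnostic criteria, and its trajectory is compatible with the disease. The sketch below assumes both flags have already been computed upstream; the `Admission` class, its field names, and the toy records are hypothetical, and how trajectory compatibility would actually be determined is left open by the article.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    patient_id: str
    meets_sepsis_criteria: bool  # standard diagnostic criteria (assumed precomputed)
    trajectory_compatible: bool  # hypothetical flag from trajectory analysis


def training_label(admission: Admission) -> int:
    """Return 1 only when both the criteria and the trajectory agree."""
    return int(admission.meets_sepsis_criteria and admission.trajectory_compatible)


# Toy admissions, for illustration only.
admissions = [
    Admission("a1", meets_sepsis_criteria=True, trajectory_compatible=True),
    Admission("a2", meets_sepsis_criteria=True, trajectory_compatible=False),
    Admission("a3", meets_sepsis_criteria=False, trajectory_compatible=False),
]

labels = [training_label(a) for a in admissions]

# Admissions that meet the criteria but lack a compatible trajectory are
# the candidates for overdiagnosis under this labeling scheme.
overdiagnosis_candidates = [
    a.patient_id
    for a in admissions
    if a.meets_sepsis_criteria and not a.trajectory_compatible
]
print(labels, overdiagnosis_candidates)
```

A model trained on these labels, rather than on the diagnostic criteria alone, would be pushed toward the cases in which a diagnosis is more likely to matter clinically.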
References & additional information
See the original publication
About the authors & affiliations
Daniel Capurro, MD, PhD1; Simon Coghlan, PhD2; Douglas E. V. Pires, PhD1
1Centre for the Digital Transformation of Health, School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
2Centre for AI and Digital Ethics, School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia