The Accuracy Limits Of Data-Driven Healthcare


Forbes
David Talby
Feb 16, 2022


Edited by


Joaquim Cardoso MSc.
Health Management Institute
Transformation Institute for better health, care, cost and access for all.
Data Driven Healthcare Unit
June 21, 2022


What is the problem?


  • Recent research from MIT found a high number of errors in publicly available datasets that are widely used for training models.
  • An average of 3.3% errors were found in the test sets of 10 of the most widely used computer vision, natural language processing (NLP) and audio datasets.

What are the causes?


One of the reasons for this is the data source.

  • More than half of the clinically relevant data for applications like recommending a course of treatment, finding actionable genomic biomarkers or matching patients to clinical trials is only found in free-text.

Another barrier exists in the limitations of what’s in the data itself.

  • Because there are no shared standards for data collection across hospitals and healthcare systems, inconsistencies and inaccuracies are common.

It’s not just providers who are to blame, either — inaccuracies come directly from the patients themselves.

  • A recent study from The Journal of General Internal Medicine shows just how prevalent this can be.
  • When exploring the accuracy of race, ethnicity and language preference in EHRs, the study found that 30% of whites self-reported identification with at least one other racial or ethnic group, as did 37% of Hispanics and 41% of African Americans.

There are also data quality issues outside our direct control, such as fraud and abuse.

  • Estimates are that healthcare fraud costs the nation from 3% to 10% of annual healthcare expenditure.

What are the improvements in recent years?


  • Technology that can automatically understand the nuances of unstructured text and images, as well as reconcile conflicting and missing data points, is gradually maturing.
  • NLP, for example, can address many pitfalls of data quality, such as uncovering disparities in an EHR versus a doctor’s transcript or what a patient self-reports.
  • In recent years, newer algorithms and models can apply the context, medium and intent of each data source to infer useful semantic answers.
  • State-of-the-art accuracy, as measured on peer-reviewed, publicly reproducible academic benchmarks and in real-world production deployments, has been steadily improving over the last five years.
  • Libraries like Spark NLP surpass 90% accuracy on a variety of clinical and biomedical text understanding tasks.

There’s clearly a need for better data collection practices in healthcare and beyond.

The healthcare industry is varied and complex and so, too, is the information collected.

  • The technology that helps us use data to make decisions in this field will keep improving.
  • But it’s critical to remember the fundamental limitations of data quality and accuracy that power these algorithms.

Simply put, it’s not safe to assume that a piece of data is correct because someone typed it into a computer.





ORIGINAL PUBLICATION (full version)


Algorithms are only as good as the quality of data they’re being fed. 


This is not a new concept, but as we begin to rely more heavily on data-driven technologies, such as artificial intelligence (AI) and other automation tools and applications, it’s becoming a more important one.


Recent research from MIT found a high number of errors in publicly available datasets that are widely used for training models. 


An average of 3.3% errors were found in the test sets of 10 of the most widely used computer vision, natural language processing (NLP) and audio datasets.




Given that accuracy baselines are often at or above 90%, this means that a lot of research innovation amounts to chance — or overfitting to errors. 

Data science practitioners should exercise caution when choosing which models to deploy based on small accuracy gains on such datasets.
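To see why small accuracy gains on noisy test sets deserve skepticism, here is a minimal, self-contained Python sketch (the numbers are illustrative, not taken from the MIT study) of how a 3.3% label-error rate distorts measured accuracy:

```python
import random

random.seed(0)
n = 10_000
true_labels = [random.randint(0, 1) for _ in range(n)]

# Simulate a test set whose labels contain 3.3% errors.
noise_rate = 0.033
noisy_labels = [y if random.random() > noise_rate else 1 - y for y in true_labels]

# A model that is roughly 96% accurate against the *true* labels.
preds = [y if random.random() < 0.96 else 1 - y for y in true_labels]

def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

print(f"accuracy vs. true labels:  {accuracy(preds, true_labels):.3f}")
print(f"accuracy vs. noisy labels: {accuracy(preds, noisy_labels):.3f}")
```

The measured score against the noisy labels lands about three points below the model’s real accuracy, so two models whose benchmark scores differ by less than the label-error rate cannot reliably be distinguished from one another.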


These findings are particularly concerning when it comes to AI applications in high-stakes industries like healthcare. 


Outcomes in this field have the ability to prevent disease, accelerate the development of life-saving medicine and help us understand the spread of disease and other critical health trends. 

While accuracy in healthcare is vital to success, it’s also rife with complexities that make this extremely challenging.




One of the reasons for this is the data source. 


More than half of the clinically relevant data for applications like recommending a course of treatment, finding actionable genomic biomarkers or matching patients to clinical trials is only found in free-text. 

This includes physicians’ notes, diagnostic imaging, pathology reports, lab reports and other sources not available as structured data within electronic health records (EHR). 

These information sources include nuances and data quality issues that make it hard to connect the dots and get a full picture of a patient.


Another barrier exists in the limitations of what’s in the data itself. 


Because there are no shared standards for data collection across hospitals and healthcare systems, inconsistencies and inaccuracies are common. 

Between different organizations collecting different information and records not being updated on a consistent basis, it’s difficult to know how accurate the data is — especially if it’s being moved and updated among different providers.


It’s not just providers who are to blame, either — inaccuracies come directly from the patients themselves. 


A recent study from The Journal of General Internal Medicine shows just how prevalent this can be. 

When exploring the accuracy of race, ethnicity and language preference in EHRs, the study found that 30% of whites self-reported identification with at least one other racial or ethnic group, as did 37% of Hispanics and 41% of African Americans. 

Patients were also less likely to complete the survey in Spanish than the language preference noted in the EHR would have suggested.



There’s clearly a need for better data collection practices in healthcare and beyond. 


Accurate information can help the medical community understand more about social determinants of health, patient risk prediction, clinical trial matching and more. 

Standardizing how this data is collected and recorded can ensure the clean data gets shared and analyzed correctly. 

This is both a medical and social challenge. For example, what is the “correct” race to fill in? When exactly is someone considered a smoker? This is also partly a technology challenge, as we’re already way beyond the limit of what’s reasonable to ask providers and patients to manually input.




There are also data quality issues outside our direct control, such as fraud and abuse. 


The National Health Care Anti-Fraud Association estimates that “healthcare fraud costs the nation about $68 billion annually — about 3% of the nation’s $2.26 trillion in healthcare spending. 

Other estimates range as high as 10% of annual healthcare expenditure, or $230 billion.” 

While we can account for error rates within the data, it’s an imperfect science at the end of the day, and it’s important to understand its limitations.




That said, it’s not all doom and gloom when it comes to quality data or the algorithms we use. 


Technology that can automatically understand the nuances of unstructured text and images, as well as reconcile conflicting and missing data points, is gradually maturing. 


NLP, for example, can address many pitfalls of data quality, such as uncovering disparities in an EHR versus a doctor’s transcript or what a patient self-reports. 

In recent years, newer algorithms and models can apply the context, medium and intent of each data source to infer useful semantic answers.


This is especially useful when you consider how specific clinical language is. 


Take how we indicate triple-negative breast cancer (TNBC), for instance. 

While the acronym TNBC isn’t hard to identify, the condition can also be denoted as “Er-/pr-/h2-,” “(er pr her2) negative,” “tested negative for the following: er, pr, h2” and “triple negative neoplasm of the upper left breast,” to name a few. 

NLP can identify variations of these terms when they are in context — and healthcare-specific deep learning models have gotten very good at this.
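As a rough illustration of why this is hard (a hand-rolled pattern matcher, not how Spark NLP or a clinical deep learning model actually works), even covering a handful of TNBC surface forms quickly turns into a brittle pattern-maintenance problem:

```python
import re

# Illustrative only: a few of the surface forms that can all denote
# triple-negative breast cancer (TNBC) in free-text clinical notes.
TNBC_PATTERNS = [
    r"\btnbc\b",
    r"\btriple[- ]negative\b",
    r"\ber-\s*/\s*pr-\s*/\s*h(er)?2-",
    r"\(er[ ,]*pr[ ,]*her2\)\s*negative",
    r"negative for the following:\s*er[ ,]*pr[ ,]*(and\s*)?h(er)?2",
]

def mentions_tnbc(text: str) -> bool:
    """Return True if any known TNBC surface form appears in the note."""
    t = text.lower()
    return any(re.search(p, t) for p in TNBC_PATTERNS)

notes = [
    "Patient diagnosed with TNBC in 2019.",
    "Pathology: er-/pr-/h2-, consistent with prior biopsy.",
    "Tested negative for the following: er, pr, h2.",
    "Triple negative neoplasm of the upper left breast.",
    "ER+ / PR+ tumor, HER2 amplified.",  # not TNBC
]
for note in notes:
    print(mentions_tnbc(note), "|", note)
```

Real clinical NLP systems replace enumerated pattern lists like this with models trained on annotated clinical text, which is what makes context-aware identification of such variants possible at scale.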


State-of-the-art accuracy, as measured on peer-reviewed, publicly reproducible academic benchmarks and in real-world production deployments, has been steadily improving over the last five years. 


Libraries like Spark NLP surpass 90% accuracy on a variety of clinical and biomedical text understanding tasks. 

Reproducibility of results, consistency of applying clinical guidelines at scale and the ability to easily tune models to a specific clinical use case or setting are three keys to successful implementations and to building broader trust in AI technology.


The healthcare industry is varied and complex and so, too, is the information collected. 


The technology that helps us use data to make decisions in this field will keep improving. 

But it’s critical to remember the fundamental limitations of data quality and accuracy that power these algorithms.

Simply put, it’s not safe to assume that a piece of data is correct because someone typed it into a computer.





About the author


David Talby

PhD, MBA, CTO at John Snow Labs. Making AI & NLP solve real-world problems in healthcare, life science and related fields.


Originally published at https://www.forbes.com.
