Data standards and standardization: The shortest plank of bucket for the COVID-19 containment


The Lancet Regional Health — Western Pacific
Mengchun Gong, a,b* Yuanshi Jiao, c Yang Gong, d and Li Liu a
August 11, 2022


Executive Summary edited by:


Joaquim Cardoso MSc.
The Health Institute 
— for health systems transformation (HST)
Data Driven Health Care Unit
August 16, 2022


  • Healthcare IT, data sciences, and AI have fallen short of public expectations during the COVID-19 pandemic because IT infrastructure was inadequately prepared in most countries, if not all.
  • The lack of data standards and the low-to-middle level of data standardization were among the major causes, and were the shortest plank in the bucket for containing the pandemic.
  • With strong coordination by WHO, a global effort to increase interoperability among the healthcare IT systems of different countries will be a fundamental step in preparing for the next pandemic of unknown origin.



ORIGINAL PUBLICATION (full version)


Introduction


In the worldwide battle against the unprecedented COVID-19 pandemic, biomedical informatics, especially data standards and data standardization, has played a significant role in many aspects of containment, including understanding disease mechanisms,[1] improving clinical care,[2] triaging resource needs,[3] advising policy-making,[4] implementing public health countermeasures,[5] enhancing technical innovation in syndromic surveillance,[6] developing vaccines, and enabling wide coverage of vaccination.[7]


Nevertheless, the development of standards for COVID-19 relevant data collection has faced many obstacles[8] globally since the very beginning of the pandemic, which led to misleading statistics, inefficient communication, biased policy-making, and clinical risks.[9]


COVID-19 provided a prominent opportunity to test the data infrastructure in different regions, and many issues and challenges have been exposed.


Efforts to access and align existing healthcare data infrastructure in the context of the pandemic highlighted complicated interoperability challenges, which remain significant barriers to real-time data analytics and hurdles for improving health outcomes through data-driven responses.[10]


By reflecting on the COVID-19 related data standards in chronological order (Figure 1), recommendations are made with the goal of promoting a globally aligned standardization of healthcare data and the establishment of a community of common health for humankind amid the current and potential future global public health crises.


Figure 1. Timeline of data standards development during the initial phase of COVID-19.



Recognizing the value of data standards and standardization for COVID-19 containment


We now live in an era in which medical practice, in both routine and emergency scenarios, is continuously recorded by digital systems, covering electronic health records and physiological, laboratory, and imaging data, as well as decision-making and treatment information.


Therefore, when no clinical trial data are available to inform a rapidly evolving situation or an unknown disease, the public expects rapid and large-scale data collection, analysis to support strategic decision-making, and sharing of best practices.[11]


A critical component of the proposed strategy is the democratization of data: all collected information (observing necessary privacy standards) should be made publicly available immediately upon release, in machine-readable formats based on open data standards, enabling data-informed decision making for all stakeholders.


Data standards empower international knowledge discovery and solution exploitation


Understanding the clinical characteristics of COVID-19 and its responses to treatment was of enormous value to clinicians when trial-based evidence was sparse.[12] [13]


The large-scale real-world evidence generation network formed within the framework of OHDSI (Observational Health Data Sciences and Informatics)[14] introduced an innovative approach to coordinating data sources across institutes, countries, and languages. It aligned a cohort of over 4.5 million cases and retrospectively described the unknown disease with strong representativeness across populations and regions (Europe, United States, South Korea, and China).


OHDSI developed a comprehensive vocabulary system to incorporate data standards used in different countries and areas and implemented them in data processing and analytics.
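
The idea of folding locally used codes into one shared vocabulary can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the site names, local codes, and "STD:" concept identifiers are placeholders invented for illustration and are not taken from OHDSI's actual vocabulary tables or tooling.

```python
from collections import Counter

# Hypothetical mapping from (source vocabulary, local code) to a shared concept.
# Real networks keep such mappings in curated vocabulary tables; these entries
# are placeholders, not real terminology codes.
CONCEPT_MAP = {
    ("site_a_codes", "FEV-01"): "STD:fever",
    ("site_b_codes", "T38+"): "STD:fever",
    ("site_a_codes", "CGH-02"): "STD:cough",
    ("site_b_codes", "KS"): "STD:cough",
}


def standardize(records):
    """Split records into those mapped to a shared concept and those left unmapped."""
    mapped, unmapped = [], []
    for rec in records:
        concept = CONCEPT_MAP.get((rec["vocabulary"], rec["code"]))
        if concept is None:
            unmapped.append(rec)  # keep for manual curation instead of guessing
        else:
            mapped.append({**rec, "concept": concept})
    return mapped, unmapped


records = [
    {"site": "A", "vocabulary": "site_a_codes", "code": "FEV-01"},
    {"site": "B", "vocabulary": "site_b_codes", "code": "T38+"},
    {"site": "B", "vocabulary": "site_b_codes", "code": "KS"},
    {"site": "B", "vocabulary": "site_b_codes", "code": "UNKNOWN-9"},
]

mapped, unmapped = standardize(records)
counts = Counter(rec["concept"] for rec in mapped)
print(counts)
print(len(unmapped), "record(s) still need a mapping")
```

Once both sites' records carry the same concept, they can be counted together; codes without a mapping are set aside rather than silently dropped, which makes the remaining standardization gap visible.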


The high level of standardization and the implementation of multiple standards enabled the OHDSI network to provide insights into clinical characteristics,[15] treatment pathways,[16] and patient subgroup analysis.[17]


The network also provided important evidence on potentially repurposed medications, demonstrating an important approach to screening existing therapeutic methods in the absence of clinical trials of a new regimen.[18]


Last but not least, data standardization and data sharing significantly improved the recruitment efficiency of clinical trials for new treatments and enabled effective monitoring of potential side effects of various medicinal products and vaccines.[19]


The sharing of the data has been restricted to comply with related regulations. The potential of data-driven knowledge discovery and transfer has been weakened accordingly.


However, in the face of this high pressure, the scientific world has been robust in encouraging novel studies and data sharing without violating data privacy.


It is important to point out that data standards and their implementation in different countries and languages have enabled multi-national studies without raising concerns about data governance or leakage of the original data.


Within the coordinating mechanisms organized by OHDSI,[20] TriNetX,[21] ICODA,[22] and other open-science networks, insights can be extracted, with an unprecedented scale and efficiency, from multiple independent databases around the world due to their common data model, vocabulary control, quality control, privacy protection mechanism and ethics standards.
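
How a common data model lets several databases contribute to one analysis without releasing patient-level data can be sketched as follows. This is a simplified illustration, not the actual protocol of OHDSI, TriNetX, or ICODA: the field name `severe`, the function names, and the small-cell threshold of 5 are all assumptions made for the example.

```python
def local_summary(patients, min_cell=5):
    """Run at each site: return aggregate counts only, never patient rows."""
    total = len(patients)
    severe = sum(1 for p in patients if p["severe"])
    # Assumed privacy rule: counts below min_cell are suppressed before leaving the site.
    if severe < min_cell:
        severe = None
    return {"total": total, "severe": severe}


def pool(summaries):
    """Run at the coordinating center: combine summaries from sites that reported a count."""
    usable = [s for s in summaries if s["severe"] is not None]
    total = sum(s["total"] for s in usable)
    severe = sum(s["severe"] for s in usable)
    return {
        "sites": len(usable),
        "total": total,
        "severe": severe,
        "severe_rate": severe / total if total else None,
    }


# Hypothetical site data, already standardized to the same field name.
site_1 = [{"severe": i < 8} for i in range(20)]  # 8 severe cases out of 20
site_2 = [{"severe": i < 2} for i in range(10)]  # 2 severe cases, below the threshold

pooled = pool([local_summary(site_1), local_summary(site_2)])
print(pooled)
```

Because every site stores the data under the same model, the identical `local_summary` function can run everywhere. In this sketch a site whose count is suppressed is left out of the pooled estimate entirely, which is one simple design choice among several.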



Data standards enable data-informed decision making


Statistical analysis of the epidemiological trend required a standard nomenclature for the disease and high quality of data standardization in case reporting as well as data collection at both regional and global level.[23]


The size of the population of potential contacts, inferred from the epidemiological data, was one of the key parameters for making public health policy.


It is difficult to assess the accuracy of data at the population level when the relevant data are distributed in silos and the data owners are unwilling to share them.


Our experience, as illustrated by the Honghu Hybrid System (HHS),[24] was to use digital technologies to connect various, if not all, data sources, integrate and standardize the data, and generate a near real-time (daily) surveillance system in an area with a population close to a million.


Errors in statistics during the emergency period of the pandemic were inevitable. A double-check mechanism, enabled by an independent channel (digital vs. manual), effectively minimized mismatched information.
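
A double check between two independent reporting channels can be sketched as a simple reconciliation step. The code below is an illustrative assumption, not the implementation used in HHS: the function name, the `tolerance` parameter, and the dates and counts are hypothetical.

```python
def reconcile(digital, manual, tolerance=0):
    """Compare daily counts from two independent channels.

    Returns the days that need review, mapped to (digital count, manual count).
    A day is flagged when one channel has no report or the counts differ by
    more than the tolerance.
    """
    flagged = {}
    for day in sorted(set(digital) | set(manual)):
        d, m = digital.get(day), manual.get(day)
        if d is None or m is None or abs(d - m) > tolerance:
            flagged[day] = (d, m)
    return flagged


# Hypothetical daily case counts from the two channels.
digital = {"2020-02-01": 12, "2020-02-02": 15, "2020-02-03": 9}
manual = {"2020-02-01": 12, "2020-02-02": 14}

to_review = reconcile(digital, manual)
print(to_review)
```

Days on which both channels agree pass through untouched, so staff attention goes only to the mismatches and to days on which one channel did not report.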


Moreover, to mitigate the huge burden of medical needs and the manpower shortage, many clinical decision-support systems (CDSS), mostly machine-learning based and data-driven, were developed and implemented at different checkpoints of the data flow,[25] covering syndromic surveillance, triaging, severity classification, and outcome prediction.


Although successes were reported at individual development sites, these systems could hardly be transferred to other sites.


The major reasons for this challenge include inconsistency in data standards and standardization, lack of usability for laypersons, difficulty of deployment in resource-poor settings, and potential ethical pitfalls or legal barriers.[26]


The systems with the highest success rate of migration were those that classify chest CT images using artificial intelligence (AI) technologies,[27] since the data in Picture Archiving and Communication Systems (PACS) around the world follow the Digital Imaging and Communications in Medicine (DICOM) standard.


However, AI and data-driven predictive science played little role in improving the general level of clinical care for COVID-19 patients, especially severe cases, because the data infrastructure of standards and standardization was not ready for such challenges.



Reflection and effort on improving the level of data standardization


It is never too late to mend the fences, as an old Chinese proverb says. There is an urgent need to reflect on the causes of the low effectiveness of data sharing, data mining, and data science applications during the COVID-19 pandemic.


The most important factor, and the shortest plank in the bucket for the effort to contain the pandemic, is the lack of a widely implemented clinical data standard system and the varying levels of data standardization.


This diminished the value of all the investment in hardware and software.


In order to quickly form an international data sharing network to generate real-world evidence and understand the disease as well as the affected populations,[29] it is important to implement standards beyond the classification code (ICD).


SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms), LOINC (Logical Observation Identifiers Names and Codes), and RxNorm are among the top recommended terminology systems.[30]


In November 2020, the European Commission declared its commitment to the establishment of the European Health Data Space (EHDS), with the goal of facilitating access to and better utilization of European health data, e.g., EHR, genomic, public health, and registry data.[32]


Meanwhile, the European Commission announced a financial support program for member countries implementing SNOMED CT as their core clinical vocabulary standard, to enhance interoperability and increase the value of the data.[33]


This provides a good example from which the Western Pacific countries and regions can learn in building a data sharing platform for the future, by clearly defining best practices for fair benefit sharing, transparent and accountable governance of public and private sector data, true commitment to public dialogue, and global cooperation.



Recommendations for a tested preparedness


Strengthen the leadership of WHO


Reflecting on the initial phase of the COVID-19 pandemic, the identification of the pathogenic microorganism and its nomenclature, the characterization of the clinical manifestations, and the definition of the disease (from novel coronavirus pneumonia to COVID-19) were the key steps for global coordination of research resources and implementation of public health countermeasures.[34]


WHO played an essential role in coordinating expert resources, government support, and worldwide implementation, which laid the foundation for disease classification in healthcare IT systems, epidemiological statistics, and multi-center research programs. ICD has proven efficient and cost-effective, considering its implementation in multiple languages across countries in a short time.


International collaboration, under the leadership of WHO, should be strengthened to be better prepared for future global public health emergencies.


The upcoming ICD-11,[35] which has been significantly modified to cope with the increasing need for classification with more granularity, a hierarchical terminology structure, coverage of clinical phenotypes, and incorporation of traditional medicine, will definitely help improve the preparedness of data infrastructure in different countries.


Avoid potential bias and conflicts


Bias has been observed in the process of naming the disease.


The use of the name of Wuhan city, where the world first learned about the virus, by some politicians and experts provoked widespread emotional conflict worldwide and wasted time and resources during a critical period when every hour counted in battling the disease, including caring for patients and conducting research to understand the disease.


We recommend that such bias and conflicts be avoided, following the current naming methodology for COVID-19, to improve the implementation of the standards in all relevant countries and areas.


Equity in technology access and international collaboration


There is also a recognized unmet need to help low-to-middle income countries accomplish data standardization and apply healthcare IT technologies.


A regional effort to control the disease with such high transmissibility will not be successful without the involvement of all countries and regions.


Training, financial support for infrastructure, free implementation of mature systems, and manpower support in data standardization and analytics are necessary and essential, especially for low-to-middle income countries and areas.[36]



Conclusion


Healthcare IT, data sciences, and AI have fallen short of public expectations during the COVID-19 pandemic because IT infrastructure was inadequately prepared in most countries, if not all. The lack of data standards and the low-to-middle level of data standardization were among the major causes, and were the shortest plank in the bucket for containing the pandemic. With strong coordination by WHO, a global effort to increase interoperability among the healthcare IT systems of different countries will be a fundamental step in preparing for the next pandemic of unknown origin.


References and additional information:


See the original publication

About the authors & affiliations


Mengchun Gong, a,b*
Yuanshi Jiao, c
Yang Gong, d and
Li Liu, a

a Nanfang Hospital, 
Southern Medical University, Guangzhou, China

b Institute of Health Management
Southern Medical University, Guangzhou, China

c Digital Health China Technologies
Beijing, China

d School of Biomedical Informatics, 
University of Texas Health Science Center at Houston, United States
