Institute for Health Transformation (InHealth)
Joaquim Cardoso MSc — Founder and CSO
January 20, 2023
EXECUTIVE SUMMARY
Key Message(s)
- The authors found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals.
- Using only hospital data can yield better predictive results when compared to adding data from other regions with different population and socioeconomic characteristics.
Abstract
Machine learning algorithms are being increasingly used in healthcare settings but their generalizability between different regions is still unknown.
This study aims to identify the strategy that maximizes the predictive performance of identifying the risk of death by COVID-19 in different regions of a large and unequal country.
- This is a multicenter cohort study with data collected from patients with a positive RT-PCR test for COVID-19 from March to August 2020 (n = 8477) in 18 hospitals, covering all five Brazilian regions.
- Of all patients with a positive RT-PCR test during the period, 2356 (28%) died.
- Eight different strategies were used for training and evaluating the performance of three popular machine learning algorithms (extreme gradient boosting, lightGBM, and catboost).
- The strategies ranged from only using training data from a single hospital, up to aggregating patients by their geographic regions.
- The predictive performance of the algorithms was evaluated by the area under the ROC curve (AUROC) on the test set of each hospital.
We found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals.
- In this study, the use of more patient data from other regions slightly decreased predictive performance.
- However, models trained in other hospitals still had acceptable performances and could be a solution while data for a specific hospital is being collected.
Discussion
The authors found that the different strategies for training data selection were able to predict COVID-19 mortality with good overall performance, using only routinely-collected data, with an AUROC of 0.7 or higher per strategy, with few exceptions.
- The best overall strategy was training and testing using only the reference hospital data, achieving the highest predictive performance in 11 of the 18 different hospitals.
- In this study, while in some cases adding more data from different hospitals and regions improved predictive performance, in most scenarios it decreased the predictive ability of the algorithms.
- The inclusion of data from other hospitals contributed noise to the training data, possibly due to heterogeneity in hospital practices [12], and in most cases deteriorated the predictive performance, as seen in other studies [13,14], possibly due to different patient demographics and variable interactions that are not locally reproducible [15].
- Other studies that included data from different hospitals and found high predictive performance may have benefited from using data from connected hospitals with similar patients, using different techniques or larger samples [16,17,18].
- The study is unique in the sense that we analyzed data from 18 independent hospitals from all the five regions of a large and unequal country.
The authors found that the best overall predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals.
Infographic
Figure 1
Figure 2
Figure 3
DEEP DIVE
Improving the performance of machine learning algorithms for health outcomes predictions in multicentric cohorts [Brazilian Hospitals]
Nature — Scientific Reports
Roberta Moreira Wichmann, Fernando Timoteo Fernandes, Alexandre Dias Porto Chiavegatto Filho & IACOV-BR Network
Introduction
Around 457 million cases and 6 million deaths had been caused by COVID-19 worldwide by March 2022 [1]. Nearly 29 million cases and 654 thousand deaths occurred in Brazil alone, ranking third in confirmed cases and deaths. Several machine learning algorithms have been proposed for predicting COVID-19 diagnosis [2,3,4] and prognosis [5,6,7,8], with different input data such as imaging or laboratory exams [9].
In countries with large socioeconomic inequalities and different access to healthcare and resource heterogeneity [10,11], the best strategy for selecting training data for machine learning algorithms is still unknown. While more data may improve the ability of machine learning algorithms to identify detailed pathways linking the predictors to the outcome of interest, it may also introduce noise, as new learned pathways may not be locally replicable.
Also, collecting a large number of variables may be cost prohibitive for some hospitals, and different data collection protocols between hospitals can make this aggregation unfeasible. As the use of machine learning algorithms rapidly advances in healthcare, it will be increasingly important to identify how to improve the generalization of these algorithms in different regions.
In order to identify the best strategy for selecting training data to predict COVID-19 mortality, we gathered data from 18 distinct and independent hospitals (with no direct connections, such as having the same administration or using the same EMR system) from the five regions of Brazil, and tested eight different strategies for developing predictive models, starting with only local hospital data and then seven different approaches of aggregating external training data.
Results
Summary population characteristics
Table 1 presents the descriptive statistics regarding the individual characteristics of the patients. The study sample (8477 patients with COVID-19) was composed mostly of men (55.1%). The most common race was white (62%), although the majority (64.6%) did not provide a self-declared race. The average age was 58.4 years, and patients stayed 14 days on average. Patients who died during their hospital stay were older (mean age 66.7 vs. 55.2 for survivors) and were more likely to be male (60.0% vs. 53.3% for survivors). The list of participants and the descriptive statistics for each hospital can be found in Supplementary Tables S1 and S2, respectively.
Table 1 Descriptive statistics of the demographics characteristics of the sample.
Algorithmic performance
Figure 1 shows the results of the AUROCs for the best of the three algorithms for each strategy. Overall, the best predictive performances were obtained when using training data from the same hospital, which was the winning strategy for 11 (61%) of the 18 participating hospitals.
Best AUROCs according to strategy, region and hospital with the best strategy highlighted.
Figure 2 presents the AUROCs of the winning strategy for each hospital, separated by regions. For the southeast region, the most populous region of Brazil and where most of the data was collected, the winning strategy for every hospital was training with only local data. Supplementary Figs. S2 and S3 show the recall and specificity of the best strategies.
AUROCs of the winning strategy per region. (a) Southeast, (b) Northeast, (c) Midwest, (d) South, (e) North.
Table 2 presents a summary of the best algorithm for each strategy. Overall, extreme gradient boosting (XGBoost) was the algorithm that presented the highest number of winning predictive performances regarding AUROCs (67/144, 46.5%), followed closely by LightGBM with 61 (42.4%) and CatBoost with 16 (11.1%). The list of the final hyperparameters for each algorithm is available in Supplementary Table S3. Calibration for the best models is presented in Supplementary Table S4.
Table 2 Algorithm with the best predictive performance per strategy.
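The denominator of 144 is consistent with 18 hospitals evaluated under 8 strategies each. A minimal check of that arithmetic, using only the counts reported above (the variable names are illustrative):

```python
# Win counts per algorithm, as reported in the text above.
wins = {"XGBoost": 67, "LightGBM": 61, "CatBoost": 16}

n_hospitals, n_strategies = 18, 8
total = n_hospitals * n_strategies  # hospital-strategy combinations

# The win counts should account for every combination exactly once.
assert sum(wins.values()) == total

# Share of combinations won by each algorithm, in percent (one decimal).
shares = {name: round(100 * count / total, 1) for name, count in wins.items()}
print(total, shares)
```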
Discussion
We found that the different strategies for training data selection were able to predict COVID-19 mortality with good overall performance, using only routinely-collected data, with an AUROC of 0.7 or higher per strategy, with few exceptions.
The best overall strategy was training and testing using only the reference hospital data, achieving the highest predictive performance in 11 of the 18 different hospitals.
In this study, while in some cases adding more data from different hospitals and regions improved predictive performance, in most scenarios it decreased the predictive ability of the algorithms.
The inclusion of data from other hospitals contributed noise to the training data, possibly due to heterogeneity in hospital practices [12], and in most cases deteriorated the predictive performance, as seen in other studies [13,14], possibly due to different patient demographics and variable interactions that are not locally reproducible [15].
Other studies that included data from different hospitals and found high predictive performance may have benefited from using data from connected hospitals with similar patients, using different techniques or larger samples [16,17,18].
Our study is unique in the sense that we analyzed data from 18 independent hospitals from all the five regions of a large and unequal country.
This study has some limitations that need to be acknowledged.
First, even though we analyzed hospitals from every region of Brazil, they were not equally distributed, with a higher number of patients from the southeast and northeast regions, which are also the most populous.
Another limitation is that, as the 18 hospitals were unconnected and independent, there may have been differences in local data collection procedures and sample size that influenced the final results.
Finally, some hospitals had small samples, but were included for aggregating purposes with other regions to check if other strategies improved overall performance.
In conclusion, we found that using only hospital data can yield better predictive results when compared to adding data from other regions with different population and socioeconomic characteristics.
We found that training algorithms with data from other hospitals frequently decreased local performance, even though it considerably increased the training data available. However, models trained with data from other hospitals still presented acceptable performances, and could be an option while data for a specific hospital is still being collected.
Methods
Data source
A cohort of 16,236 patients from 18 distinct hospitals in all regions of Brazil was followed between March and August 2020. The map with the geographic location of participating hospitals is available in Supplementary Fig. S1. We included only adult patients (> 18 years) with a positive RT-PCR diagnostic exam for COVID-19, resulting in 8477 patients. Of these, 2356 (28%) died as a result of complications caused by COVID-19. The mortality outcome referred to the current hospital admission for COVID-19, regardless of the timeframe. Hospitalization was only analyzed at the time of COVID-19 diagnosis, and further hospitalizations of the patient were not included in the study. We used as predictors only variables collected in early hospital admission, i.e. within 24 h before and 24 h after the RT-PCR exam. The full list of hospitals is available in Supplementary Table S1.
A total of 22 predictors were selected among variables routinely collected in all hospitals: age, sex, heart rate, respiratory rate, systolic pressure, diastolic pressure, mean pressure, temperature, hemoglobin, platelets, hematocrit, red cell count, mean corpuscular hemoglobin (MCH), red cell distribution width (RDW), mean corpuscular volume (MCV), leukocytes, neutrophils, lymphocytes, basophils, eosinophils, monocytes and C-reactive protein.
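The inclusion criteria and the predictor time window can be sketched as follows. This is an illustration only, not the authors' code: the `Admission` record, its field names, and the toy values are hypothetical, and treating the 24 h boundaries as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Admission:
    """One hospital admission; all field names are hypothetical."""
    age_years: int
    rt_pcr_positive: bool
    died_in_hospital: bool


def select_cohort(admissions):
    """Keep adults (> 18 years) with a positive RT-PCR exam."""
    return [a for a in admissions if a.age_years > 18 and a.rt_pcr_positive]


def in_predictor_window(measured_at, exam_at, hours=24):
    """True if a measurement falls within +/- `hours` of the RT-PCR exam.

    Inclusive boundaries are an assumption, not stated in the text.
    """
    return abs(measured_at - exam_at) <= timedelta(hours=hours)


# Toy records, invented for illustration only.
toy = [
    Admission(age_years=67, rt_pcr_positive=True, died_in_hospital=True),
    Admission(age_years=45, rt_pcr_positive=True, died_in_hospital=False),
    Admission(age_years=17, rt_pcr_positive=True, died_in_hospital=False),
    Admission(age_years=52, rt_pcr_positive=False, died_in_hospital=False),
]
cohort = select_cohort(toy)
print(len(cohort))  # number of toy admissions meeting both criteria

exam = datetime(2020, 5, 1, 12, 0)
inside = in_predictor_window(datetime(2020, 5, 2, 9, 0), exam)   # 21 h after
outside = in_predictor_window(datetime(2020, 5, 3, 9, 0), exam)  # 45 h after
print(inside, outside)

# Mortality share reported in the text: 2356 deaths among 8477 patients.
reported_mortality_pct = round(100 * 2356 / 8477, 1)
print(reported_mortality_pct)
```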
Figure 3 illustrates the overall process.
Process overview. From inclusion criteria to feature selection.
The study was approved by the Institutional Review Board (IRB) of the University of São Paulo (CAAE: 32872920.4.1001.5421), which included a waiver of informed consent. The data and the partnership with all members of IACOV-BR are included in this approval. The study followed the guidelines of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) [19].
Machine learning techniques
Three popular machine learning models for structured data (LightGBM [20], CatBoost [21], and extreme gradient boosting [22]) were trained to predict COVID-19 mortality using routinely collected data. Eight different strategies were tested to identify the best data selection strategy for each hospital and each of the three algorithms.
Strategies and preprocessing techniques
Initially, we used data from a single hospital as the baseline strategy, splitting the data into 70% for training and 30% for testing, with the latter used to predict mortality risk. We then also tested seven different data aggregation strategies to assess the performance of the algorithms with different training data, as presented in Table 3.
Table 3 Clustering strategies for training and testing.
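The idea of holding the test set fixed while varying the training data can be sketched as below. Table 3 is not reproduced here, so the three strategy names are illustrative placeholders rather than the paper's eight strategies. Pooling only the 70% training portion of each donor hospital, and including the target hospital's own training rows in the pooled sets, are both assumptions.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of `rows` and split it 70% train / 30% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def training_rows(splits, region_of, target, strategy):
    """Collect training rows for `target` under a named (hypothetical) strategy.

    splits:    {hospital: (train_rows, test_rows)}
    region_of: {hospital: region name}
    """
    if strategy == "local":
        donors = [target]
    elif strategy == "same_region":
        donors = [h for h in splits if region_of[h] == region_of[target]]
    elif strategy == "all_hospitals":
        donors = list(splits)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    rows = []
    for hospital in donors:
        train_part, _ = splits[hospital]  # donors contribute training rows only
        rows.extend(train_part)
    return rows


# Toy hospitals, invented for illustration; each row is (hospital, row index).
region_of = {"A": "Southeast", "B": "Southeast", "C": "North"}
sizes = {"A": 10, "B": 20, "C": 10}
splits = {
    h: split_70_30([(h, i) for i in range(n)], seed=42)
    for h, n in sizes.items()
}

# The evaluation set is always the target hospital's own held-out 30%.
test_rows = splits["A"][1]
train_sizes = {
    s: len(training_rows(splits, region_of, "A", s))
    for s in ("local", "same_region", "all_hospitals")
}
print(len(test_rows), train_sizes)
```

Keeping `test_rows` identical across strategies is what makes the per-hospital AUROCs comparable: only the training data changes.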
Variables with more than two categories were represented by a set of dummy variables, with one variable for each category. Continuous variables were standardized using the z-score. Variables with a correlation greater than 0.90 were discarded. Variables with more than 90% missing data were also discarded. Remaining variables with missing data were first imputed by the median. We also analyzed the use of the multiple imputation by chained equations (MICE) [23] technique, but it did not improve the predictive performance of the models (Supplementary Fig. S4). We used K-fold cross-validation with 10 folds and Bayesian optimization (HyperOpt) to select the hyperparameters. Random oversampling was performed in the training set to address class imbalance while keeping the test set intact [24].
To evaluate the performance of the algorithms, we calculated the following metrics for each strategy: accuracy, recall (sensitivity), specificity, positive predictive value (PPV or precision), negative predictive value (NPV) and F1 score. The area under the receiver operating characteristic curve (AUROC) was the main metric used to select the best model among the different scenarios. All the results reported in this study are from the test set. Confidence intervals for AUROC curves were estimated using the DeLong method for computing the covariance of unadjusted AUC.
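Three of these steps (median imputation, z-score standardization, and random oversampling) can be sketched with the standard library alone. This is not the authors' pipeline: the function names and toy values are hypothetical. Fitting the median, mean, and standard deviation on the training data only, and using the population standard deviation, are assumptions.

```python
import random
import statistics


def missing_fraction(values):
    """Share of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values)


def fit_median(train_values):
    """Median of the observed (non-missing) training values."""
    return statistics.median([v for v in train_values if v is not None])


def impute(values, fill):
    """Replace missing entries with a previously fitted value."""
    return [fill if v is None else v for v in values]


def fit_zscore(train_values):
    """Mean and population standard deviation of the training column."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def apply_zscore(values, mean, std):
    """Standardize values with previously fitted parameters (std must be > 0)."""
    return [(v - mean) / std for v in values]


def random_oversample(rows, labels, seed=0):
    """Resample minority-class rows with replacement until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy column (e.g. C-reactive protein) and outcome, invented for illustration.
train_crp = [12.0, None, 30.0, 8.0, None, 50.0]
train_died = [0, 0, 0, 0, 1, 1]
test_crp = [None, 40.0]

keep_column = missing_fraction(train_crp) <= 0.90  # drop rule from the text
fill = fit_median(train_crp)
train_filled = impute(train_crp, fill)
mean, std = fit_zscore(train_filled)
train_z = apply_zscore(train_filled, mean, std)
test_z = apply_zscore(impute(test_crp, fill), mean, std)  # test set is never resampled

balanced_x, balanced_y = random_oversample(train_z, train_died)
print(fill, balanced_y.count(0), balanced_y.count(1))
```

Only the training rows are oversampled; the test column is imputed and scaled with the training parameters and otherwise left intact, matching the text's statement that the test set was kept untouched.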
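For reference, the metrics listed above can be computed from a confusion matrix, and the AUROC from pairwise comparisons of scores, as in the sketch below. The toy labels and scores are invented, and the 0.5 decision threshold is an assumption (the text does not state one). The AUROC here uses the rank-based (Mann-Whitney) definition, not the DeLong confidence interval procedure.

```python
def classification_metrics(y_true, y_pred):
    """Threshold-based metrics from binary labels and binary predictions.

    Assumes each denominator is non-zero (both classes present and predicted).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn)       # sensitivity
    ppv = tp / (tp + fp)          # precision
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * recall / (ppv + recall),
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy test set: 3 deaths (1) and 5 survivors (0) with predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics, area)
```

Unlike the other metrics, the AUROC does not depend on the chosen threshold, which is one reason it is convenient as the main criterion for comparing strategies.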
Institutional review board statement
The name of the ethics committee is “Comitê de Ética em Pesquisa da Faculdade de Saúde Pública da USP”. The entire study protocol was approved by this committee, and all methods were carried out in accordance with the relevant guidelines and regulations. The approval date of the project was June 2020.
References & additional information
See the original publication
Acknowledgements
This work was supported by the National Council for Scientific and Technological Development (CNPq) under Grant Number 402626/2020–6, and Microsoft (Microsoft AI for Health COVID-19 Grant). We would like to thank the IACOV-BR Network, in alphabetical order: Ana Claudia Martins Ciconelle (Institute of Mathematics and Statistics, University of São Paulo); Ana Maria Espírito Santo de Brito (Instituto de Medicina, Estudos e Desenvolvimento-IMED, São Paulo, São Paulo); Bruno Pereira Nunes (Universidade Federal de Pelotas-UFPel); Dárcia Lima e Silva (Hospital Santa Lúcia); Fernando Anschau (Setor de Pesquisa da Gerência de Ensino e Pesquisa do Grupo Hospitalar Conceição, RS, Brasil; Programa de Pós-Graduação em Neurociências da Universidade Federal do Rio Grande do Sul); Henrique de Castro Rodrigues (Serviço de Epidemiologia e Avaliação/Direção Geral do HUCFF/UFRJ); Hermano Alexandre Lima Rocha (Unimed Fortaleza, Fortaleza, Ceará, Brasil; Departamento de Saúde Comunitária, Universidade Federal do Ceará, Fortaleza, Ceará, Brasil); João Conrado Bueno dos Reis (Hospital São Francisco); Liane de Oliveira Cavalcante (Hospital Santa Julia de Manaus); Liszt Palmeira de Oliveira (Instituto Unimed-Rio; Universidade do Estado do Rio de Janeiro); Lorena Sofia dos Santos Andrade (Universidade de Pernambuco-UPE/UEPB); Luiz Antonio Nasi (Hospital Moinhos de Vento); Marcelo de Maria Felix (InRad-Institute of Radiology, School of Medicine, University of São Paulo); Marcelo Jenne Mimica (Departamento de Ciências Patológicas, Faculdade de Ciências Médicas da Santa Casa de São Paulo); Maria Elizete de Almeida Araujo (Federal University of Amazonas, University Hospital Getulio Vargas, Manaus, AM, Brazil); Mariana Volpe Arnoni (Serviço de Controle de Infecção Hospitalar Santa Casa de São Paulo); Rebeca Baiocchi Vianna (Hospital Santa Lúcia); Renan Magalhães Montenegro Junior (Complexo Hospitalar da Universidade Federal do Ceará-EBSERH); Renata Vicente da Penha (Hospital Evangélico de Vila Velha); Rogério Nadin Vicente (Hospital Santa Catarina de Blumenau); Ruchelli França de Lima (Hospital Moinhos de Vento); Sandro Rodrigues Batista (Faculdade de Medicina, Universidade Federal de Goiás, Goiânia, Goiás; Secretaria de Estado da Saúde de Goiás, Goiânia, Goiás); Silvia Ferreira Nunes (Fundação Santa Casa de Misericórdia do Pará-FSCMP; Mestrado Profissional em Gestão e Saúde na Amazônia); Tássia Teles Santana de Macedo (Escola Bahiana de Medicina e Saúde Pública); Valesca Lôbo e Sant’ana Nuno (Hospital Português da Bahia).
We would also like to thank all those people who somehow contributed to the progress of this research, in alphabetical order: Adriana Weinfeld Massaia; Alexandre Amaral; Ana Maria Pereira Rangel; Antônia Célia de Castro Alcantara; Bruna Donida; Bruno Mendes Carmon; Carisi Polanczyk; Carolina Zenilda Nicolao; Claiton Marques de Jesus; Denise Corrêa Nunes; Diana Almeida; Eduardo Menezes Lopes; Elias Bezerra Leite; Elimar Ponzzo Dutra Leal; Fernanda Arns de Castro; Fernanda Colares de Borba Netto; Flávia Araújo; Flávio Lúcio Pontes Ibiapina; Gerência de Ensino e pesquisa do Complexo Hospitalar da Universidade Federal do Ceará — EBSERH; Hospital Português da Bahia; Humberto Bolognini Tridapalli; Iasmin Luiza Leite; Laura Freitas de Faveri; Lena Claudia Maia Alencar; Luciane Kopittke; Luciano Hammes; Luiz Alberto Mattos; Marly Suzielly Miranda Silva; Mayara Rocha de Oliveira; Mohamed Parrini; Pablo Viana Stolz; Paloma Farina de Lima; Paulo Pitrez; Pollyana Bueno Siqueira; Rafaella Côrti Pessigatti; Raul José de Abreu Sturari Junior; Rodrigo Smania Garrastazu Almeida; Rogério Farias Bitencourt; Rubens Vasconcelos Barreto; Tatiane Lima Aguiar; Thyago Gregório Mota Ribeiro.
Authors and Affiliations
School of Public Health, University of São Paulo, São Paulo, SP, Brazil
Roberta Moreira Wichmann, Fernando Timoteo Fernandes & Alexandre Dias Porto Chiavegatto Filho
Brazilian Institute of Education, Development and Research-IDP, Economics Graduate Program, Brasilia, DF, Brazil
Roberta Moreira Wichmann
Fundacentro, São Paulo, SP, Brazil
Fernando Timoteo Fernandes
Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
Ana Claudia Martins Ciconelle
Instituto de Medicina, Estudos e Desenvolvimento-IMED, São Paulo, Brazil
Ana Maria Espírito Santo de Brito
Universidade Federal de Pelotas-UFPel, Pelotas, Brazil
Bruno Pereira Nunes
Hospital Santa Lúcia, Divinópolis, Brazil
Dárcia Lima e Silva & Rebeca Baiocchi Vianna
Setor de Pesquisa da Gerência de Ensino e Pesquisa do Grupo Hospitalar Conceição, Porto Alegre, RS, Brazil
Fernando Anschau
Programa de Pós-Graduação em Neurociências da Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Fernando Anschau
Serviço de Epidemiologia e Avaliação/Direção Geral do HUCFF/UFRJ, Rio de Janeiro, Brazil
Henrique de Castro Rodrigues
Unimed Fortaleza, Fortaleza, Ceará, Brazil
Hermano Alexandre Lima Rocha
Departamento de Saúde Comunitária, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
Hermano Alexandre Lima Rocha
Hospital São Francisco, Brasília, Brazil
João Conrado Bueno dos Reis
Hospital Santa Julia de Manaus, Manaus, Brazil
Liane de Oliveira Cavalcante
Instituto Unimed-Rio, Universidade do Estado do Rio de Janeiro, Rio de Janeiro, Brazil
Liszt Palmeira de Oliveira
Universidade de Pernambuco-UPE/UEPB, Recife, Brazil
Lorena Sofia dos Santos Andrade
Hospital Moinhos de Vento, Porto Alegre, Brazil
Luiz Antonio Nasi & Ruchelli França de Lima
InRad-Institute of Radiology, School of Medicine, University of São Paulo, São Paulo, Brazil
Marcelo de Maria Felix
Departamento de Ciências Patológicas, Faculdade de Ciências Médicas da Santa Casa de São Paulo, São Paulo, Brazil
Marcelo Jenne Mimica
Federal University of Amazonas, University Hospital Getulio Vargas, Manaus, AM, Brazil
Maria Elizete de Almeida Araujo
Serviço de Controle de Infecção Hospitalar Santa Casa de São Paulo, São Paulo, Brazil
Mariana Volpe Arnoni
Complexo Hospitalar da Universidade Federal do Ceará-EBSERH, Fortaleza, Brazil
Renan Magalhães Montenegro Junior
Hospital Evangélico de Vila Velha, Vila Velha, Brazil
Renata Vicente da Penha
Hospital Santa Catarina de Blumenau, Blumenau, Brazil
Rogério Nadin Vicente
Faculdade de Medicina, Universidade Federal de Goiás, Goiânia, Goiás, Brazil
Sandro Rodrigues Batista
Secretaria de Estado da Saúde de Goiás, Goiânia, Goiás, Brazil
Sandro Rodrigues Batista
Fundação Santa Casa de Misericórdia do Pará-FSCMP, Belém, Brazil
Silvia Ferreira Nunes
Mestrado Profissional em Gestão e Saúde na Amazônia, Belém, Brazil
Silvia Ferreira Nunes
Escola Bahiana de Medicina e Saúde Pública, Salvador, Brazil
Tássia Teles Santana de Macedo
Hospital Português da Bahia, Salvador, Brazil
Valesca Lôbo e Sant’ana Nuno
Consortia
IACOV-BR Network
- Ana Claudia Martins Ciconelle
- Ana Maria Espírito Santo de Brito
- Bruno Pereira Nunes
- Dárcia Lima e Silva
- Fernando Anschau
- Henrique de Castro Rodrigues
- Hermano Alexandre Lima Rocha
- João Conrado Bueno dos Reis
- Liane de Oliveira Cavalcante
- Liszt Palmeira de Oliveira
- Lorena Sofia dos Santos Andrade
- Luiz Antonio Nasi
- Marcelo de Maria Felix
- Marcelo Jenne Mimica
- Maria Elizete de Almeida Araujo
- Mariana Volpe Arnoni
- Rebeca Baiocchi Vianna
- Renan Magalhães Montenegro Junior
- Renata Vicente da Penha
- Rogério Nadin Vicente
- Ruchelli França de Lima
- Sandro Rodrigues Batista
- Silvia Ferreira Nunes
- Tássia Teles Santana de Macedo
- Valesca Lôbo e Sant’ana Nuno
Originally published at https://www.nature.com on January 19, 2023.
Cite this article
Wichmann, R.M., Fernandes, F.T., Chiavegatto Filho, A.D.P. et al. Improving the performance of machine learning algorithms for health outcomes predictions in multicentric cohorts. Sci Rep 13, 1022 (2023). https://doi.org/10.1038/s41598-022-26467-6