medRxiv
Jonathan H. Lu1, Alison Callahan1*, Birju S. Patel1*, Keith E. Morse2,3*, Dev Dash1, Nigam H. Shah
July 23, 2021
ABSTRACT
Objective
To assess whether the documentation available for commonly used machine learning models developed by an electronic health record (EHR) vendor provides information requested by model reporting guidelines.
Materials and Methods
We identified items requested for reporting from model reporting guidelines published in computer science, biomedical informatics, and clinical journals, and merged similar items into representative “atoms”.
Four independent reviewers and one adjudicator assessed the degree to which model documentation for 12 models developed by Epic Systems reported the details requested in each atom.
We present summary statistics of consensus, interrater agreement, and reporting rates of all atoms for the 12 models.
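As an illustration of this scoring process, the following minimal Python sketch shows one way a percent interrater agreement and a per-model completion rate of applicable atoms could be computed. The data layout, judgment labels, and the "all reviewers agree" definition of agreement are assumptions made for illustration; they are not taken from the study's analysis code.

from statistics import median

# reviews[model][atom] -> judgments from the four independent reviewers
# ("yes" = reported, "no" = not reported, "n/a" = not applicable)
reviews = {
    "Model A": {
        "outcome definition": ["yes", "yes", "yes", "yes"],
        "calibration plot":   ["no", "no", "yes", "no"],
    },
}

# adjudicated[model][atom] -> final judgment after adjudication
adjudicated = {
    "Model A": {"outcome definition": "yes", "calibration plot": "no"},
}

def percent_agreement(reviews):
    # Share of (model, atom) pairs on which all reviewers gave the same judgment
    pairs = [judgments for model in reviews.values() for judgments in model.values()]
    agree = sum(1 for judgments in pairs if len(set(judgments)) == 1)
    return agree / len(pairs)

def completion_rate(atoms):
    # Fraction of applicable atoms (i.e., not "n/a") reported for one model
    applicable = [v for v in atoms.values() if v != "n/a"]
    return sum(1 for v in applicable if v == "yes") / len(applicable)

rates = [completion_rate(atoms) for atoms in adjudicated.values()]
print(f"Interrater agreement: {percent_agreement(reviews):.0%}")
print(f"Median completion rate: {median(rates):.0%}")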
Results
We identified 220 unique atoms across 15 model reporting guidelines.
After examining the documentation for the 12 most commonly used Epic models, the independent reviewers had an interrater agreement of 76%.
After adjudication, the median completion rate of applicable atoms across the model documentation was 39% (range: 31%-47%).
Most of the commonly requested atoms had reporting rates of 90% or above, including atoms concerning outcome definition, preprocessing, area under the receiver operating characteristic curve (AUROC), internal validation, and intended clinical use.
For individual reporting guidelines, the median adherence rate for an entire guideline was 54% (range: 15%-71%).
Atoms reported half the time or less included those relating to fairness (summary statistics and subgroup analyses, including for age, race/ethnicity, or sex), usefulness (net benefit, prediction time, warnings on out-of-scope use and when to stop use), and transparency (model coefficients).
Atoms relating to reliability also had low reporting, including those related to missingness (missing data statistics, missingness strategy), validation (calibration plot, external validation), and monitoring (how models are updated/tuned, prediction monitoring).
Conclusion
There are many recommendations about what should be reported about predictive models used to guide care.
The existing model documentation examined in this study reports less than half of the applicable atoms, and adherence rates for entire reporting guidelines are low.
Half or less of the reviewed documentation reported information related to usefulness, reliability, transparency and fairness of models.
There is a need for better operationalization of reporting recommendations for predictive models in healthcare.
Originally published at https://www.medrxiv.org on July 23, 2021.
About the authors
Jonathan H. Lu1,
Alison Callahan1*,
Birju S. Patel1*,
Keith E. Morse2,3*,
Dev Dash1,
Nigam H. Shah
1 Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA
2 Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
3 Department of Clinical Informatics, Lucile Packard Children’s Hospital, Palo Alto, CA, USA
* These authors contributed equally.