inhealth
institute for continuous health transformation
Joaquim Cardoso MSc
Founder, CEO and Senior Advisor
January 23, 2023
EXECUTIVE SUMMARY
A study by a research team at MIT’s Jameel Clinic has found that AI can predict lung cancer risk six years into the future using a single CT scan.
- The team used deep learning to train a model called Sybil, using LDCTs from the National Lung Screening Trial (NLST).
- They validated the model using more than 6,000 NLST scans that they excluded from the training dataset, as well as on almost 9,000 scans from Massachusetts General Hospital (MGH) and more than 12,000 scans from Chang Gung Memorial Hospital (CGMH) in Taiwan.
- The CGMH LDCTs included scans of non-smokers, while the other datasets were limited to people who smoked.
- Sybil can accurately predict an individual’s future lung cancer risk from a single LDCT scan to further enable personalized screening.
- On an accuracy measure that puts 0.5 as random and 1.0 as perfect, Sybil’s lung cancer prediction in the year after the scan ranged from 0.86 to 0.94 across the three validation datasets.
- The team used data from the National Lung Screening Trial to train the algorithm, which predicted cancer within one year with an area under the ROC curve (AUC) of up to 0.94.
- The algorithm could be used in the future to decrease follow-up scans or biopsies among patients with nodules that are low risk.
- The team has deployed the model, which they have made freely available, at clinics for research use to validate it in subpopulations that were underrepresented in the original data and to show how it fits into day-to-day clinical activities.
- The researchers plan to further develop the model and test it on underrepresented groups such as African-American patients and those of Hispanic/Latinx descent.
- Future study is required to understand Sybil’s clinical applications.
INFOGRAPHIC
ORIGINAL PUBLICATION
AI predicts lung cancer risk six years into the future using single CT scan: study
MedTech Dive
Nick Paul Taylor
Jan. 17, 2023
Low-dose computed tomography (LDCT) is an effective lung cancer screening tool, and the U.S. Preventive Services Task Force recommends annual scans for people aged 50 years and older with a 20 pack-year history of smoking.
However, only a fraction of the target population is screened as recommended, creating a need for ways to increase screening rates and thereby enable timely treatment for more people.
A research team at MIT’s Jameel Clinic identified deep learning as a way to tackle the problem and worked with collaborators at other institutions to train and validate a model.
Professor Regina Barzilay, MIT Jameel Clinic’s AI faculty lead, explained the process in an email to MedTech Dive.
“When training, the model is given these LDCTs and it tries to predict whether or not each participant develops cancer in the next year, in the next two years, …, or in the next six years.
From the mistakes the model makes, it learns to refine its predictions and improve its accuracy.
In this way, the model is not told to look at anything specific; instead, it learns the visual cues that are predictive of future lung cancer by looking at a large set of such imaging data,” Barzilay said.
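Barzilay's description amounts to multi-horizon supervision: each scan carries one yes/no target per yearly horizon, and horizons beyond a participant's follow-up are unknown rather than negative. The sketch below shows one way such targets and a matching loss could be built. It assumes a simple per-scan record of years to diagnosis and years of follow-up; the names `multi_horizon_targets` and `masked_bce` are hypothetical, and this is an illustration, not Sybil's published training code.

```python
import math
from typing import List, Optional, Sequence, Tuple

MAX_YEARS = 6  # horizons: cancer within 1, 2, ..., 6 years of the scan


def multi_horizon_targets(
    years_to_cancer: Optional[int], years_followed: int
) -> Tuple[List[int], List[int]]:
    """Return (labels, mask) for each yearly horizon (hypothetical helper).

    years_to_cancer: years from scan to diagnosis (1 = within the first year),
        or None if no cancer was diagnosed during follow-up.
    years_followed: years of follow-up available after the scan.
    """
    labels: List[int] = []
    mask: List[int] = []
    for horizon in range(1, MAX_YEARS + 1):
        if years_to_cancer is not None and years_to_cancer <= horizon:
            labels.append(1)  # diagnosed within this horizon
            mask.append(1)
        elif years_followed >= horizon:
            labels.append(0)  # followed long enough to confirm no cancer yet
            mask.append(1)
        else:
            labels.append(0)  # censored: outcome unknown, excluded from loss
            mask.append(0)
    return labels, mask


def masked_bce(
    probs: Sequence[float], labels: Sequence[int], mask: Sequence[int]
) -> float:
    """Mean binary cross-entropy over the horizons whose outcome is known."""
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0
```

For example, a participant diagnosed in year 3 gets labels `[0, 0, 1, 1, 1, 1]` with every horizon observed, while a cancer-free participant followed for 4 years gets all-zero labels with the last two horizons masked out. A model's six predicted probabilities per scan would then be scored with a loss like `masked_bce`, and reducing that loss is how it "learns from its mistakes."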
Barzilay and her collaborators trained the model, which they call Sybil, using LDCTs from the National Lung Screening Trial (NLST).
The team validated the model using more than 6,000 NLST scans that they excluded from the training dataset, as well as on almost 9,000 scans from Massachusetts General Hospital (MGH) and more than 12,000 scans from Chang Gung Memorial Hospital (CGMH) in Taiwan.
The CGMH LDCTs included scans of nonsmokers, while the other datasets were limited to people who smoked.
On an accuracy measure that puts 0.5 as random and 1.0 as perfect, Sybil’s lung cancer prediction in the year after the scan ranged from 0.86 to 0.94 across the three validation datasets.
Accuracy fell further out from the scan, sliding by year five to 0.78 in the MGH data and by year six to 0.75 in the NLST data and 0.74 in the CGMH data.
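The "0.5 is random, 1.0 is perfect" measure is the area under the receiver operating characteristic curve (AUC), as the reference publication reports. One equivalent way to compute it is as the probability that a randomly chosen person who develops cancer receives a higher risk score than a randomly chosen person who does not. A minimal sketch on toy scores (not study data); the function name `auc` is illustrative:

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (case, control) pairs ranked correctly.

    A pair scores 1 if the case outranks the control and 0.5 on a tie.
    This pairwise form is O(n^2) and is meant for illustration only.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))
```

A model that gives everyone the same score gets exactly 0.5, and one that always ranks future cases above non-cases gets 1.0, which is why those two values anchor the scale.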
The results encouraged the researchers to keep developing the model.
Current goals include:
- showing how the model performs in African-American patients and those of Hispanic/Latinx descent, groups that were underrepresented in the validation datasets;
- showing how it generalizes to nonsmokers; and
- showing how it can be useful to radiologists.
“To answer all of these questions, Sybil is being deployed in different clinics solely for research purposes at the moment,” Barzilay said.
“Additionally, a major aim of our research was to make the model publicly available, so there is no intention to commercialize it.
The tool will only be used within a research framework for now.
It will require regulation if it’s to be used as a medical device, but the regulation of medical AI is also an interesting and rapidly evolving space.”
Originally published at: https://www.medtechdive.com
REFERENCE PUBLICATION
Sybil: A Validated Deep Learning Model to Predict Future Lung Cancer Risk From a Single Low-Dose Chest Computed Tomography
Journal of Clinical Oncology
Peter G. Mikhael, BSc1,2; Jeremy Wohlwend, ME1,2; Adam Yala, PhD1,2; Ludvig Karstens, MSc1,2; Justin Xiang, ME1,2; Angelo K. Takigami, MD3,4; Patrick P. Bourgouin, MD3,4; PuiYee Chan, PhD5; Sofiane Mrah, MSc4; Wael Amayri, BSc4; Yu-Hsiang Juan, MD6,7; Cheng-Ta Yang, MD6,8; Yung-Liang Wan, MD6,7; Gigin Lin, MD, PhD6,7; Lecia V. Sequist, MD, MPH3,5; Florian J. Fintelmann, MD3,4; and Regina Barzilay, PhD1,2
ABSTRACT
Purpose
- Low-dose computed tomography (LDCT) for lung cancer screening is effective, although most eligible people are not being screened.
- Tools that provide personalized future cancer risk assessment could focus approaches toward those most likely to benefit.
- We hypothesized that a deep learning model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data.
Methods
- We developed a model called Sybil using LDCTs from the National Lung Screening Trial (NLST).
- Sybil requires only one LDCT and does not require clinical data or radiologist annotations; it can run in real time in the background on a radiology reading station.
- Sybil was validated on three independent data sets: a held-out set of 6,282 LDCTs from NLST participants, 8,821 LDCTs from Massachusetts General Hospital (MGH), and 12,280 LDCTs from Chang Gung Memorial Hospital (CGMH, which included people with a range of smoking history including nonsmokers).
Results
- Sybil achieved areas under the receiver operating characteristic curves (AUCs) for lung cancer prediction at 1 year of 0.92 (95% CI, 0.88 to 0.95) on NLST, 0.86 (95% CI, 0.82 to 0.90) on MGH, and 0.94 (95% CI, 0.91 to 1.00) on CGMH external validation sets.
- Concordance indices over 6 years were 0.75 (95% CI, 0.72 to 0.78), 0.81 (95% CI, 0.77 to 0.85), and 0.80 (95% CI, 0.75 to 0.86) for NLST, MGH, and CGMH, respectively.
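The concordance index (C-index) extends the AUC's "is the higher-risk person the one who gets cancer?" idea to time-to-event data, where some participants leave follow-up before any diagnosis. The excerpt does not say which C-index estimator the authors used; the sketch below implements Harrell's version on toy data as an assumption, and `harrell_c_index` is a hypothetical name.

```python
from typing import Sequence


def harrell_c_index(
    risk: Sequence[float], time: Sequence[float], event: Sequence[int]
) -> float:
    """Harrell's concordance index for right-censored time-to-event data.

    A pair (i, j) is comparable when subject i had the event (event[i] == 1)
    strictly before subject j's observed time. It is concordant when i also
    has the higher risk score; ties in risk count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With risks `[0.9, 0.5, 0.2]`, diagnosis or censoring times `[1, 3, 5]` years, and events `[1, 1, 0]`, every comparable pair is ordered correctly and the index is 1.0; swapping the first two risk scores drops it to 2/3.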
Conclusion
- Sybil can accurately predict an individual’s future lung cancer risk from a single LDCT scan to further enable personalized screening.
- Future study is required to understand Sybil’s clinical applications.
- Our model and annotations are publicly available.
INTRODUCTION
Two large randomized controlled trials have established the efficacy of lung cancer screening (LCS) using low-dose computed tomography (LDCT) in cigarette smokers, with 20% and 24% decreases in lung cancer mortality in the National Lung Screening Trial (NLST) and the NELSON trial, respectively.1
Hence, the US Preventive Services Task Force recommends annual LDCTs for those age 50 years and older with a 20 pack-year history of smoking.2
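A pack-year is one pack of cigarettes smoked per day for one year, so the threshold can be reached in different ways (for example, 1 pack a day for 20 years or 2 packs a day for 10 years). The sketch below checks only the two criteria quoted here. It is a simplification: the full USPSTF recommendation also sets an upper age limit and a limit on years since quitting, which are not modeled. The function names are illustrative.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day x number of years smoked."""
    return packs_per_day * years_smoked


def meets_stated_criteria(age: int, packs_per_day: float, years_smoked: float) -> bool:
    """Check only the criteria quoted in the text: age >= 50 and >= 20 pack-years.

    Simplification: other USPSTF conditions (upper age limit, time since
    quitting) are deliberately left out of this illustration.
    """
    return age >= 50 and pack_years(packs_per_day, years_smoked) >= 20
```

Under this check, a 55-year-old who smoked half a pack a day for 30 years (15 pack-years) would fall outside the criteria, which illustrates how screening eligibility can miss lighter smokers who still develop lung cancer.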
There are currently major shortcomings in achieving appropriate LCS.
For instance, in the United States, a dismal < 10% of the eligible population is being screened.3–5
Evidence also suggests those being screened are not being optimally routed to follow-up or kept engaged in long-term screening.6–8
In parallel, lung cancer diagnoses among never- and lighter-smokers are rapidly rising,9,10 suggesting that if we continue to focus research about LCS only on heavier smokers, a gap will persist between the screened population and the disease population.
CONTEXT
Key Objective
- Individualized risk models for lung cancer prediction can improve screening practices, but current models require a combination of demographic information, clinical risk factors, and radiologic annotations.
- Using data from National Lung Screening Trial, this study describes the development of a deep learning cancer risk model, Sybil, that uses a single low-dose chest computed tomography (CT) scan to predict lung cancers occurring 1–6 years after a screen.
- Sybil’s performance without image annotation and demographic or clinical data is then evaluated on modern and independent test sets from Massachusetts General Hospital and Chang Gung Memorial Hospital, Taiwan.
Knowledge Generated
- Sybil was able to forecast both short-term and long-term lung cancer risk on the National Lung Screening Trial test set.
- Using low-dose chest CT lung screening scans collected over the past 15 years, Sybil maintained its accuracy across diverse sets of patients from the United States and Taiwan. The code is publicly available.
- The preliminary results of this study suggest the program can provide additional information about the future lung cancer risk in patients undergoing CT lung cancer screening with minimal disruption in the normal clinical workflow.
- Further evaluation in a prospective study to assess the performance and clinical benefit is warranted.
One strategy that could help address these disparate LCS obstacles is to improve the efficiency and benefits of LCS by individualizing assessment of future lung cancer risk.
Past efforts to improve LCS rates have focused on identifying those at the highest risk for lung cancer and directing available resources to screen them.
To that end, significant progress has been made using clinical and demographic variables as well as chest radiographs to model lung cancer risk among smokers, and an ongoing clinical trial is examining the utility of one such clinical model (PLCOm2012) to select patients for LDCT screening.11–16
Once patients have started LCS, determining follow-up imaging frequency relies primarily on visible pulmonary nodule assessment.17
Ardila et al18 leveraged LDCTs from the NLST to develop a cancer detection algorithm that identifies pulmonary nodules, processes the region surrounding a visible nodule using deep learning, and accurately predicts lung cancer within 1 and 2 years.
Others showed improved risk predictions when combining PLCOm2012 with outcomes from the last three screens, but did not leverage image data directly.18,19
In more recent work, Robbins et al20 used risk factors and image-based features to recommend personalized screening intervals.
We hypothesize that LDCT images contain information that is predictive of future lung cancer risk beyond currently identifiable features such as lung nodules.
An algorithm that goes past visible nodules to predicting future lung cancer risk over several years could further enhance patient management and LCS implementation strategies.
Therefore, we aimed to develop and validate a deep learning algorithm that predicts future lung cancer risk out to 6 years from a single LDCT scan, and assess its potential clinical impact.
MATERIALS AND METHODS
See the original publication (this is an excerpt version only)
RESULTS
See the original publication (this is an excerpt version only)
DISCUSSION
We developed Sybil, a deep learning algorithm that predicts future lung cancer risk out to 6 years from a single LDCT scan. Sybil can run in the background at a radiology reading station as soon as LDCT images are available, without inputting demographic or other clinical data and without requiring radiologists to annotate areas of interest. Trained on data from the NLST, Sybil was able to predict cancer within 1 year with AUCs of 0.92 (95% CI, 0.88 to 0.95) on a held-out NLST test set, and 0.86 (95% CI, 0.82 to 0.90) and 0.94 (95% CI, 0.91 to 1.00) on the MGH and CGMH independent external validation sets, respectively. The 6-year C-index was 0.75 (95% CI, 0.72 to 0.78), 0.81 (95% CI, 0.77 to 0.85), and 0.80 (95% CI, 0.75 to 0.86) for Sybil on the NLST, MGH, and CGMH sets, respectively.
Sybil’s assessment may not correspond to how a human radiologist would approach image analysis. We sought to gain insight into the visual characteristics that Sybil might consider in making predictions. We noted an association between Sybil’s ability to correctly lateralize the location of future cancers and the likelihood that an LDCT receives a high-risk score (Appendix Table A6, online only), indicating that when Sybil predicts high future lung cancer risk, the signal it uses localizes to specific at-risk regions rather than being equally spread over the entire thorax. We also found that traditional clinical risk factors such as smoking duration can be predicted directly from the LDCT images (Appendix Fig A2, online only, Appendix Table A7, online only), suggesting that Sybil may also infer biologically relevant information from LDCT images. To distinguish between cancer detection and future cancer risk, we removed visible lung nodules that were known to be cancerous from the analysis set. We found that Sybil’s performance was lower on this set but still possessed predictive power.
As is standard practice, we sought to compare Sybil with other models used for lung cancer risk prediction. However, although several models have been developed to improve LCS and detection, none are valid comparisons to Sybil as they differ in goal, scope, data input, and code availability (Table 2). Many models require either clinical data, manual identification and characterization of nodules, multiple LDCTs, or the Lung-RADS assessment of a radiologist. In general, the models can be divided into those that predict risk before a scan has been performed and can be used to steer high-risk patients toward screening, and those that predict risk after a scan has been performed and use data from the scan (either images or descriptions of images) as model input. The two most similar models to Sybil are likely the two that are post-LDCT and analyze the CT images themselves to predict risk, namely, the models published by Ardila and by Huang. However, they are limited in the number of years to cancer incidence that they predict. Additionally, we could not implement either of these models to test head-to-head against Sybil for short-term cancer risk prediction because their code bases were not made public.
On the basis of our initial results, one potential clinical application is to use Sybil to decrease follow-up scans or biopsies among patients with nodules that are low risk. Indeed, increasing the specificity of LDCT screening was a key advantage of the Lung-RADS system compared with the nodule assessment algorithm used in the NLST study, and underlies its adoption as the gold standard in the United States. In our assessment of the NLST test set, Sybil further reduced the false-positive rate (FPR) to 8% for baseline scans, compared with 14% for Lung-RADS 1.0, while maintaining equivalent sensitivity. In addition to false positives, false negatives or missed interval cancers among patients engaged in LCS programs are a major concern for both medical and legal reasons. NLST investigators examined the 44 missed interval lung cancers in the NLST and found, upon retrospective review, that most missed cases could potentially have been avoided but for human error.34 Although anecdotal, the cases discussed in Figure 3 similarly spark contemplation about whether Sybil could be harnessed to decrease follow-up intervals or increase prioritization by the patient navigator and other tools to ensure those at highest risk are followed most closely. The benefit of such interventions will require confirmation in prospective clinical trials.
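The FPR comparison above holds sensitivity fixed and asks how many cancer-free scans each approach flags. A minimal sketch of that kind of comparison on toy data: search the risk-score thresholds for the one that reaches a target sensitivity (for example, the sensitivity of a Lung-RADS reading) with the fewest false positives. The threshold search and the function names are illustrative assumptions, not the study's analysis code.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_and_fpr(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Flag scans with score >= threshold; return (sensitivity, FPR)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), fp / (fp + tn)


def fpr_at_matched_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target_sensitivity: float
) -> Tuple[float, float]:
    """Among thresholds reaching the target sensitivity, pick the lowest FPR.

    Returns (threshold, fpr).
    """
    best: Optional[Tuple[float, float]] = None
    for t in sorted(set(scores)):
        sens, fpr = sensitivity_and_fpr(scores, labels, t)
        if sens >= target_sensitivity and (best is None or fpr < best[1]):
            best = (t, fpr)
    if best is None:
        raise ValueError("target sensitivity not reachable")
    return best
```

With toy scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]` and labels `[1, 1, 0, 1, 0, 0, 0]`, keeping every cancer flagged requires a threshold of 0.4 and an FPR of 0.25; accepting a sensitivity of 0.6 allows a threshold of 0.8 with no false positives. Matching sensitivity first is what makes an FPR comparison between two approaches fair.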
Before Sybil can be studied prospectively, the first step is to gain confidence that it is generalizable. Sybil was developed using scans from the NLST, which were obtained in 2002–2004 from US patients who were overwhelmingly White (92%). Changes in CT technology over time might adversely affect Sybil’s translation, hence we chose more modern cohorts for independent validation. Differences in image slice thickness over time were noted, although we had already excluded scans with images thicker than 2.5 mm from the initial Sybil build. Despite technological changes, Sybil generalized well across these modern and diverse validation cohorts. Notably in CGMH, Sybil maintained its performance in a population that likely consists of a plurality of nonsmokers. However, none of the cohorts presented here include sufficient Black or Hispanic patients to have confidence in broad applicability yet.
There are several limitations to this study. In addition to the aforementioned lack of a true comparator model and suboptimal population diversity to date, the work presented here is solely retrospective. As the cohorts we studied consisted of subjects engaged in LCS, we cannot assess Sybil’s ability to detect cancers presenting independently from a screening program. Importantly, we do not have access to detailed smoking data from CGMH subjects, so conclusions about Sybil’s ability to predict lung cancer from images in nonsmokers remain speculative. Although the CGMH cohort likely consists mostly of nonsmokers, the lung cancer incidence among nonsmokers in Taiwan is also significantly higher than in most countries.24 Top priorities for next steps are understanding whether Sybil might facilitate LCS research into populations outside the current US Preventive Services Task Force criteria and which strategies are optimal to incorporate Sybil’s risk predictions into real-world LCS patient management and decision making.35 Like all artificial intelligence tools being developed for health care application, careful and transparent development of Sybil including critical assessment of shortcomings will be necessary.
To facilitate Sybil’s use and promote further research into clinical applications of this model, the algorithm is publicly available along with the image annotations generated on the NLST dataset.
References & additional information
See the original publication
About the authors & affiliations
Peter G. Mikhael, BSc1,2; Jeremy Wohlwend, ME1,2; Adam Yala, PhD1,2; Ludvig Karstens, MSc1,2; Justin Xiang, ME1,2; Angelo K. Takigami, MD3,4; Patrick P. Bourgouin, MD3,4; PuiYee Chan, PhD5; Sofiane Mrah, MSc4; Wael Amayri, BSc4; Yu-Hsiang Juan, MD6,7; Cheng-Ta Yang, MD6,8; Yung-Liang Wan, MD6,7; Gigin Lin, MD, PhD6,7; Lecia V. Sequist, MD, MPH3,5; Florian J. Fintelmann, MD3,4; and Regina Barzilay, PhD1,2
1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
2Jameel Clinic, Massachusetts Institute of Technology, Cambridge, MA
3Harvard Medical School, Boston, MA
4Department of Radiology, Massachusetts General Hospital, Boston, MA
5Department of Medicine, Massachusetts General Hospital, Boston, MA
6Chang Gung University, Taoyuan, Taiwan
7Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital, Taoyuan, Taiwan
8Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
Originally published at https://ascopubs.org
RELATED ARTICLES
https://news.mit.edu/2023/ai-model-can-detect-future-lung-cancer-0120