What is Big Data, and What are the Main Applications in Oncology? Lessons from the Netherlands

The Health Strategist
September 23, 2021
Edited by Joaquim Cardoso, MSc
Image: National Cancer Institute

This is an excerpt from the paper “The potential use of big data in oncology” by Willems SM, Abeln S, Feenstra KA, de Bree R, van der Poel EF, Baatenburg de Jong RJ, Heringa J and van den Brekel MWM, focusing on the topic above.

Authors of the original paper:

Stefan M. Willems;
Sanne Abeln;
K. Anton Feenstra;
Remco de Bree;
Egge F. van der Poel;
Robert J. Baatenburg de Jong;
Jaap Heringa;
Michiel W.M. van den Brekel


Big data and the computer technology to analyze them have been called one of the top 10 revolutions of the coming decade[3]. Their impact is expected to parallel that of the Internet, the cloud and, more recently, blockchains (known from cryptocurrencies such as Bitcoin)[4]. Big data are penetrating virtually all sectors. They were first applied on a large scale by the major information companies (IBM, Google, Facebook, Amazon).

Algorithms using neural networks and machine-learning techniques have been developed by these large IT-oriented companies to predict people's behavior and to use this information for person-oriented marketing. Health insurance companies and governments also have a large interest in big data developments, and big data have entered the life sciences too. But what is big data, and what can we do with it?[5]


Though many people and companies use the term “big data”, they may not always mean the same thing, or interpret it in the same way. Most of us have a vague notion of what it could be (“anything that won’t fit an excel sheet”), but big data is not just synonymous with “a lot of data”.

One way to define big data in health care is by the 5 V’s (Anil Jain, “The 5 Vs of Big data”, September 17, 2016, https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data). According to this definition, big data are characterized by:

  • Volume: big data are large, containing many data points/records of multiple subjects. These include the diagnostic work-up (clinical, radiological, pathological), treatment data (surgery, systemic therapy, radiotherapy and their combinations), response data and complications.
  • Velocity: big data have two velocity aspects: (1) they are created at an increasingly high speed, and (2) they have to be computed/digested relatively fast. Worldwide the incidence of cancer is increasing, while patients live longer. Together with technological advances and monitoring devices, this means that an increasing amount of data will have to be processed within the same time.
  • Variety: big data comprise a huge variety of data types. This variety brings important opportunities (many different data types enrich the quality and usefulness of the data) as well as challenges, because the heterogeneity calls for standardization (synoptic reporting).
  • Variability: it’s crucial to realize that data capturing varies in place and time. Capturing a (predefined) mandatory minimal dataset is a prerequisite for getting the most out of (synoptically) reported data. This requires not only consensus on the minimal data required, but also univocal definitions (e.g. recurrence vs residual disease).
  • Value: setting up a data infrastructure to collect and interpret data is only worthwhile when it enables data-derived conclusions or measurements, based on accurate data, that can really lead to measurable improvements or impact in health care.
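
As an illustration of the Variability point, a check against a mandatory minimal dataset could be sketched roughly as follows. The field names and allowed values are hypothetical and are not taken from the original paper or from any existing synoptic reporting standard; this is a minimal sketch, not an actual implementation.

```python
# Hypothetical minimal dataset: these field names and allowed values are
# illustrative assumptions, not an existing synoptic reporting standard.
REQUIRED_FIELDS = ("patient_id", "tumor_site", "resection_margin", "disease_status")
ALLOWED_DISEASE_STATUS = {"none", "residual", "recurrence"}


def validate_report(report):
    """Return a list of problems found in one report (empty list if none)."""
    problems = []
    # Every mandatory field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = report.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing field: {field}")
    # Univocal definitions: only agreed terms are accepted for disease status.
    status = str(report.get("disease_status") or "").strip()
    if status and status not in ALLOWED_DISEASE_STATUS:
        problems.append(f"unknown disease_status: {status}")
    return problems
```

A report that uses a locally coined term such as "relapse" would be flagged here, which is the kind of inconsistency that agreed definitions are meant to prevent.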

While the sheer size of data collections is often an issue, an even more pressing problem is that data resources are typically spatially distributed across the globe and deposited in ways that make it difficult to integrate the data. This may lead to scalability problems since the data need to be transported via the internet, but, more importantly, requires harmonization and standardization efforts if the data is to be integrated and used in a common workflow to answer an overarching query.


Big data are of practical use in various areas. Location tracking helps logistics companies to mitigate transport risks and to improve the speed and reliability of delivery. In the financial world, the Securities and Exchange Commission (SEC) uses big data network analytics to identify possible fraud. Entertainment companies such as Netflix and YouTube use past views and online behavior to increase engagement and drive revenue. Advertising companies are probably the biggest big data players: data from Facebook, Twitter and Google on behavior and transactions are analyzed and used by advertisers to run targeted campaigns. Breeding companies fly drones over crop fields that send back imaging data to inform the breeding process.

With big data, hospitals can improve the monitoring of intensive care patients. The efficiency of (expensive) medication can be measured, and epidemic outbreaks can be forecast at an early stage.

More specifically for the medical disciplines, big data could be helpful in developing and reshaping disease prevention strategies. Combining large data sets of genomics and environmental data will help to predict which individuals/groups are at risk for developing certain (chronic) diseases and cancer. This might elicit specific actions aimed to influence the environmental factors and behavior that contribute to health risks in target groups.

Big data will also be helpful in evaluating current prevention programs and might yield novel insights to improve them. In a therapeutic setting too, big data are instrumental in monitoring, for example, the effects of specific therapies, such as expensive oncolytics, especially in relation to patient and tumor (genetic) characteristics. This will help to improve precision medicine and provide important knowledge for calculating the cost-efficiency of certain treatment regimens.


The future potential of big data (in biomedical research) is not fully clear yet. For today (and tomorrow), big data will create value for

  1. daily diagnostics,
  2. quality of care/life (including PROMs and PREMs) and
  3. biomedical research[8] [7].

We will give some examples of currently available applications.

  • Daily diagnostics
  • Quality of care measurements
  • Biomedical research
  • Personalized medicine

Daily diagnostics

Big data can already have relevance in everyday clinical practice. An example is the near-real-time access Dutch pathologists have to the nationwide histopathological follow-up of each individual patient.

The PALGA foundation governs all digital histopathological records in The Netherlands ever since 1971 (www.palga.nl). Containing over 72 million records of over 12 million patients in The Netherlands, PALGA is one of the largest biomedical databases in the world and covers all 55 pathology labs in The Netherlands. Every time a Dutch pathologist authorizes a histopathology report, one copy is stored in the local hospital information system, and one copy in the central PALGA database.

So, this database contains real-time pathological follow-up of each patient that is directly visible to each PALGA member (pathologist or molecular biologist). This offers huge potential for recognizing relevant (oncological) patient history, e.g. ruling out a recent malignancy in cases of a suspected tumor of unknown primary, or providing documentation of previous relevant pathological features (such as resection margins and positive lymph nodes) when pathology was performed in another lab. Co-occurrence of diseases, or unknown associations in low-prevalence diseases that at first sight do not seem to be related, can also be studied using this database[9].

Electronic patient files generate an enormous amount of medical data, which can be used for prognostic modeling. One of the first prognostic models for head and neck cancer (HNC) patients receiving care at medical centers in developed countries is available online at www.oncologiq.nl[10]. Automation of statistical prognostication processes allows models to be updated automatically when new data are gathered[11]. These data can also be used to develop clinical decision-making tools for improved patient counseling and non-binary patient-related outcome measurements.

Quality of care measurements

Linking databases on patient outcomes with data on patient characteristics and treatment offers unprecedented potential for feeding back the quality and efficacy of care.

Recently, a French study showed the landscape of molecular testing for targeted therapy in non-small cell lung cancer (NSCLC) in France and the subsequent treatment regimens based on this[12].

This allows direct feedback on optimal test-treatment correlations. More importantly, it might be a strong incentive for underperforming labs to revise their protocol/workflow and improve the care they deliver.

In The Netherlands too, linking data from the national cancer registry (containing clinical stage, treatment and outcome data) with the aforementioned PALGA database has made it possible to show the variation in clinical care for head and neck cancer in The Netherlands[13] [14].
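
To make the idea of linking a registry with a pathology database more tangible, here is a minimal sketch. It assumes, purely for illustration, that both sources share a pseudonymized patient key called "pseudonym"; the key and all other field names are hypothetical, and real linkage procedures involve privacy safeguards and matching rules that are not shown here.

```python
def link_records(registry_rows, pathology_rows, key="pseudonym"):
    """Attach all pathology reports to the registry row sharing the same key.

    Both inputs are lists of dicts. The shared key is a hypothetical
    pseudonymized patient identifier assumed to exist in both sources.
    """
    # Index the pathology reports by patient key (one patient, many reports).
    reports_by_key = {}
    for row in pathology_rows:
        reports_by_key.setdefault(row[key], []).append(row)

    linked = []
    for row in registry_rows:
        merged = dict(row)  # copy so the registry input is left unchanged
        merged["pathology_reports"] = reports_by_key.get(row[key], [])
        linked.append(merged)
    return linked
```

Registry patients without any pathology report keep an empty list, so gaps in coverage remain visible instead of silently disappearing from the linked data.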

Although improving the quality of care can only be achieved through transparency on such data, it should be realized that feeding back such data, especially outcome data and especially when benchmarked, must be done with the utmost prudence, as labs and hospitals might fear reputational damage or naming and shaming[15].

In practice, when results are published to the public anonymously and disclosed only to the individual hospital concerned, experience shows that most hospitals are actually happy to cooperate in such mirror feedback.

In The Netherlands this has led to the development of algorithms for automatic feedback on pathology- and treatment-related items on a regular basis, for example by the Dutch Institute for Clinical Auditing (www.dica.nl).

Mirror information showing higher recurrence rates than those in peer hospitals will possibly be an incentive to zoom in on the underlying chain to identify (and solve) potential weaknesses.
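
A very crude sketch of such mirror information is given below. It compares each hospital's recurrence rate with the pooled rate of all other hospitals. The function and its input format are assumptions made for illustration only; they do not describe how any auditing institute actually computes its feedback, and no correction for case mix or statistical uncertainty is applied.

```python
def mirror_feedback(recurrences_by_hospital):
    """Compare each hospital's recurrence rate with the pooled rate of its peers.

    recurrences_by_hospital maps an (anonymized) hospital label to a tuple
    (number_of_recurrences, number_of_patients). At least two hospitals with
    a non-zero number of patients are assumed.
    """
    feedback = {}
    for hospital, (events, patients) in recurrences_by_hospital.items():
        # Pool all other hospitals into one peer group.
        peers = [v for h, v in recurrences_by_hospital.items() if h != hospital]
        peer_events = sum(e for e, _ in peers)
        peer_patients = sum(n for _, n in peers)
        own_rate = events / patients
        peer_rate = peer_events / peer_patients
        feedback[hospital] = {
            "own_rate": own_rate,
            "peer_rate": peer_rate,
            "above_peers": own_rate > peer_rate,
        }
    return feedback
```

Each hospital would receive only its own entry, in line with the anonymous reporting described above.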

Biomedical research

Probably most benefit will be generated from big data in the field of research. The leading era of “genome wide association studies” (GWAS) has been broadening towards an era of “data wide association studies” (DWAS), with a central place for big data. The increase in data, due both to increased use of imaging and molecular analyses and to combinations with other data, offers a matchless Valhalla for every data scientist and bioinformatician.

Big data fill an unmet need in biomedical research. For example, an important limitation of today’s medicine is our poor understanding of the biology of disease.

Only by aggregating huge amounts of data can all relevant multisource variables, such as DNA, RNA, protein and metabolomics data, be brought together and integrated into more realistic models that predict how tumors will behave and which patients will benefit most from specific therapies.

These integrated multi-omics data will, for example, provide more comprehensive insight into the biological behavior and mechanisms that underlie the growth pattern, metastatic potential and response to (targeted) treatment of head and neck squamous cell carcinoma (HNSCC).

Personalized medicine

From the perspective of turning our current understanding and available data into actionable insights that can be used to improve treatment outcome, personalized medicine is absolutely dependent on big data[16].

The amount of data available to the biomedical community is increasing exponentially, especially with advancing technologies generating terabytes of data, notably in sequencing and imaging. In terms of quantity, most data do not come from direct, patient-related records available from daily clinical practice, but to a larger extent from computed automatic data analyses such as radiomics and digital image analysis. Head and neck cancers present a unique set of diagnostic and therapeutic challenges by nature of their complex anatomy and heterogeneity. Radiomics holds the potential to address these barriers[17].

Radiomics extracts and mines a large number of medical imaging features in a non-invasive and cost-effective way. The underlying assumption of radiomics is that these imaging features quantify phenotypic characteristics of an entire tumor.
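
To give a flavor of what an imaging feature is, the sketch below computes three simple first-order features (mean, variance and histogram entropy) from the voxel intensities inside a tumor region. This is a simplified illustration, not the method of the cited studies: real radiomics pipelines extract many more features, including shape and texture features, and the function name and the bin width used here are assumptions.

```python
import math
from collections import Counter


def first_order_features(intensities, bin_width=10):
    """Compute a few first-order features from voxel intensities in a tumor region.

    intensities is a non-empty list of numbers; bin_width (an assumed default)
    controls how intensities are discretized before the entropy is computed.
    """
    n = len(intensities)
    mean = sum(intensities) / n
    variance = sum((x - mean) ** 2 for x in intensities) / n
    # Discretize intensities into fixed-width bins and count voxels per bin.
    counts = Counter(int(x // bin_width) for x in intensities)
    # Shannon entropy of the intensity histogram, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}
```

A homogeneous region yields an entropy of zero, while a region whose voxels spread over many intensity bins yields a higher value, which is one simple way in which heterogeneity within a tumor can be quantified.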

Radiomics in precision oncology and cancer care allows for prognostic and reliable machine learning methods for stratification (or personalization), i.e. identifying differences in (expected/predicted) survival between (groups of) patients, and for prediction of treatment outcome(s) to support selection of the best possible treatment for head and neck cancer patients[18].

This might enable medical and radiation oncologists to (de-)escalate systemic treatment and irradiation doses in specific patient populations.


  • The value of big data capturing relies on the volume, velocity, variety and veracity of various, often complex, data sets.
  • Integration of these sources is key and will be beneficial for improvements in biomedical research, patient care and monitoring quality of care.
  • In The Netherlands, where head and neck cancer care is centralized and various national big data resources are in place, there is a unique opportunity to unite, link and integrate these data and fulfill this unmet need.
  • Such a head and neck cancer infrastructure should optimize data input as well as (bioinformatical) data integration including FAIRification (FAIR — Findable, Accessible, Interoperable, Reusable).


[1] EXCERPTED FROM: Willems SM, Abeln S, Feenstra KA, de Bree R, van der Poel EF, Baatenburg de Jong RJ, Heringa J, van den Brekel MWM. The potential use of big data in oncology. Oral Oncol. 2019 Nov;98:8–12. doi: 10.1016/j.oraloncology.2019.09.003. Epub 2019 Sep 12. PMID: 31521885.

[3] Shaikh AR, Butte AJ, Schully SD, et al. Collaborative biomedicine in the age of big data: the case of cancer. J Med Internet Res 2014;16(4):e101. https://doi.org/10.2196/jmir.2496.

[4] Roman-Belmonte JM, De la Corte-Rodriguez H, Rodriguez-Merchan EC. How blockchain technology can change medicine. Postgrad Med 2018;130(4):420–7.

[5] Bourne PE. What Big Data means to me. J Am Med Inform Assoc 2014;21(2):194. https://doi.org/10.1136/amiajnl-2014-002651.

[7] Willems SM, Abeln S, Feenstra KA, de Bree R, van der Poel EF, Baatenburg de Jong RJ, Heringa J, van den Brekel MWM. The potential use of big data in oncology. Oral Oncol. 2019 Nov;98:8–12. doi: 10.1016/j.oraloncology.2019.09.003. Epub 2019 Sep 12. PMID: 31521885.

[8] Bousfield D, McEntyre J, Velankar S, et al. Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources. F1000Research 2016;5(160). https://doi.org/10.12688/f1000research.7911.1.

[9] Ooft ML, van Ipenburg J, Braunius WW, et al. A nation-wide epidemiological study on the risk of developing second malignancies in patients with different histological subtypes of nasopharyngeal carcinoma. Oral Oncol 2016;56:40–6.

[10] Datema FR, Ferrier MB, Vergouwe Y, et al. Update and external validation of a head and neck cancer prognostic model. Head Neck 2013;35(9):1232–7.

[11] Datema FR, Moya A, Krause P, et al. Novel head and neck cancer survival analysis approach: random survival forests versus Cox proportional hazards regression. Head Neck 2012;34(1):50–8.

[12] Barlesi F, Mazieres J, Merlio JP, et al. Routine molecular profiling of patients with advanced non-small-cell lung cancer: results of a 1-year nationwide programme of the French Cooperative Thoracic Intergroup (IFCT). Lancet 2016;387(10026):1415–26.

[13] Petersen JF, Timmermans AJ, van Dijk BAC. Trends in treatment, incidence and survival of hypopharynx cancer: a 20-year population-based study in the Netherlands. Eur Arch Otorhinolaryngol 2018;275(1):181–9.

[14] Timmermans AJ, van Dijk BA, Overbeek LI, et al. Trends in treatment and survival for advanced laryngeal cancer: A 20-year population-based study in The Netherlands. Head Neck 2016;38(Suppl 1):E1247–55.

[15] de Ridder M, Balm AJ, Smeele LE, et al. An epidemiological evaluation of salivary gland cancer in the Netherlands (1989–2010). Cancer Epidemiol 2015;39(1):14–20.

[16] Govers TM, Rovers MM, Brands MT, et al. Integrated prediction and decision models are valuable in informing personalized decision making. J Clin Epidemiol 2018. Aug 28 pii: S0895–4356(18)30447–5.

[17] Wong AJ, Kanwar A, Mohamed AS. Radiomics in head and neck cancer: from exploration to application. Transl Cancer Res 2016;5(4):371–82.

[18] Parmar C, Grossmann P, Rietveld D, et al. Radiomic machine-learning classifiers for prognostic biomarkers of head and neck cancer. Front Oncol 2015;3(5):272.

About the authors of the original paper (long version) & affiliations:

Stefan M. Willems (a, b, *);
Sanne Abeln (c);
K. Anton Feenstra (c);
Remco de Bree (d);
Egge F. van der Poel (e);
Robert J. Baatenburg de Jong (e);
Jaap Heringa (c);
Michiel W.M. van den Brekel (f)

a Department of Pathology, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands

b Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands

c Department of Computer Science, Faculty of Science, Vrije Universiteit, Amsterdam, the Netherlands

d Department of Head and Neck Surgical Oncology, University Medical Center Utrecht, Utrecht, the Netherlands

e Department of Head and Neck Surgery, Erasmus Cancer Center, Erasmus MC, Rotterdam, the Netherlands

f Department of Head and Neck Oncology and Surgery, Netherlands Cancer Institute, Amsterdam, the Netherlands
