the health strategist
multidisciplinary institute
Joaquim Cardoso MSc
Chief Research & Strategy Officer (CRSO),
Chief Editor and Senior Advisor
August 23, 2023
Central message:
There is an intense and rapidly escalating competition among AI companies to secure data sources as they strive to develop advanced AI models.
- The surge in demand for larger AI models, coupled with the scarcity of specialized AI chips, has driven AI firms to aggressively seek out high-quality data for training.
- This data scramble has led to a race for data dominance, with companies striking deals, addressing legal challenges, enhancing data quality, and exploring new data management solutions.
As the article suggests, the quest for data has become a defining factor in the evolving AI landscape, shaping the strategies and future of AI companies.
One Page Summary:
In the rapidly evolving landscape of artificial intelligence (AI), a fierce race for data dominance is underway. The Economist’s article dated August 13, 2023, highlights the complex interplay between AI firms, data sources, and the pursuit of innovative AI models.
Not long ago, observers worried that Adobe, known for its creative software, could be made obsolete by emerging AI tools such as DALL-E 2 and Midjourney, which create images from text.
However, Adobe’s response was unexpected, as it leveraged its vast database of stock photos to develop its own AI suite, Firefly.
Because Firefly draws on Adobe's own stock photos rather than images mined from the internet, it sidesteps the associated copyright disputes. It has produced over 1 billion images since its March 2023 launch, and the company's share price has risen 36% over the same period.
This success story illuminates a broader theme within the AI landscape — the intense competition to acquire and utilize data.
In the AI realm, larger “generative” models rely on substantial data quantities, creating a demand for new data sources beyond internet scraping.
At the same time, companies possessing extensive data holdings are strategizing on how to monetize their assets. The result is a data-centric race in progress.
AI model development hinges on two critical components: datasets for training and processing power to identify relationships within those datasets.
While both components contribute to model enhancement, the scarcity of specialized AI chips has prompted a heightened focus on procuring more data.
The demand for data is outpacing its supply, with high-quality text for training projected to be depleted by 2026.
Modern AI models, like those from Google and Meta, are trained on massive datasets containing over 1 trillion words, dwarfing the content available on platforms like Wikipedia.
However, the quality of data is equally crucial.
Well-structured, factual, and specialized datasets yield higher-quality AI models.
Companies such as Microsoft, through its acquisition of GitHub, have utilized specialized information sets to refine AI tools for specific applications.
The data scramble is accompanied by legal challenges as content creators seek compensation.
Source: This is a One Page Summary of the article “AI is setting off a great scramble for data”, published by The Economist
https://www.economist.com/business/2023/08/13/ai-is-setting-off-a-great-scramble-for-data