the health strategist
institute for continuous transformation – in health and tech
Joaquim Cardoso MSc
Founder and Chief Researcher & Editor
March 17, 2023
EXECUTIVE SUMMARY
There is potential for multimodal AI in medicine, which can ingest and process data from many sources, including multiple continuous biosensors, biologic layers, the environment, and medical records.
- Currently, AI in medicine is image-centric, with minimal integration with inputs from text and voice.
- However, as the use of large language models (LLMs) continues to grow, along with their parameters and tokens, multimodal AI may become more attainable.
- The potential of LLMs for transformative impact in medicine is seen in DeepMind’s AlphaFold, which accurately predicts 3-D protein structure from amino acid sequences.
In the months ahead, generative AI can help clinicians with many language-based tasks, including synthetic office notes based on voice, pre-authorization from insurance companies, aggregating patient histories, and more.
- LLMs can provide opportunities for virtual health coaching, hospital-at-home, and a digital twin infrastructure.
- While there are major issues that need to be addressed, such as LLM hallucinations, pseudo-reasoning, amplification of concerns over bias, privacy and security of data, and the dominance of the tech industry, LLMs could be embraced as the antidote to the electronic health record disaster that has transformed clinicians into data clerks.
DEEP DIVE
Multimodal AI for medicine, simplified
The catalytic impact of large language models (Generative AI)
Eric Topol
Mar 14, 2023
AI in medicine has basically been a single-mode story to date: help reading an X-ray or MRI, finding polyps during a colonoscopy, providing patient coaching for a specific condition like diabetes, or making a preliminary diagnosis of a skin lesion or heart rhythm from a smartwatch recording.
It has largely been image-centric to date, with minimal integration with, or use of inputs from, text and voice. But over time that narrowness and constraint may well be alleviated.
My colleagues and I recently wrote a review of the potential for multimodal AI, when data from many sources can be ingested and processed as seen below.
No one has done this yet: pulling together and extracting knowledge from individuals at scale, from sources of data that include multiple continuous biosensors, biologic layers such as the genome and microbiome, the environment, and medical records.
That ultimately will not only be attainable but will enable many opportunities such as the virtual health coach, hospital-at-home, and a digital twin infrastructure.
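To make that concrete, below is a minimal late-fusion sketch in Python/PyTorch: each data stream gets its own encoder, and the resulting embeddings are concatenated before a shared prediction head. The modality names, dimensions, and fusion strategy here are illustrative assumptions, not the architecture described in the review.

```python
# Minimal late-fusion sketch (illustrative assumptions, not the review's architecture):
# each data stream (biosensors, genomic features, encoded clinical notes)
# is embedded separately, then the embeddings are concatenated for prediction.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, sensor_dim=64, genome_dim=128, notes_dim=768,
                 hidden=256, n_outputs=1):
        super().__init__()
        # One encoder per modality maps its features into a shared-size embedding.
        self.sensor_enc = nn.Linear(sensor_dim, hidden)
        self.genome_enc = nn.Linear(genome_dim, hidden)
        self.notes_enc = nn.Linear(notes_dim, hidden)
        # The fusion head consumes the concatenated embeddings.
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, n_outputs))

    def forward(self, sensors, genome, notes):
        z = torch.cat([self.sensor_enc(sensors),
                       self.genome_enc(genome),
                       self.notes_enc(notes)], dim=-1)
        return self.head(z)

# Toy usage with random tensors standing in for real patient data.
model = MultimodalFusion()
pred = model(torch.randn(2, 64), torch.randn(2, 128), torch.randn(2, 768))
print(pred.shape)  # torch.Size([2, 1])
```

In practice, transformer-based fusion and much richer per-modality encoders would be used; the sketch only shows the basic plumbing of combining disparate inputs into one prediction.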
How do we get there?
There has been unparalleled interest in ChatGPT and large language models (LLMs).
It’s been referred to by many as an iPhone moment, with the most rapid growth of a user base ever seen (comparators for reaching 100 million users below).
I see it as 4 major building blocks that have brought us here.
- Deep learning AI was potentiated by large numbers of graphics processing units (GPUs) or TPUs, but generative AI is that on high-dose steroids.
- This has created a seemingly insatiable ability to ingest data in the form of tokens, along with parameters (the term for the number of connections between neurons) and the computing-power metric known as floating point operations (FLOPS).
- Rather than the supervised learning that has largely powered medical algorithms to date, requiring expert annotation of images and ground truths, the supervision in LLM training is disproportionately limited relative to the massive data inputs (see the sketch after this list).
- The new GPT-4 model details were released today. Its multimodal (now bimodal, with text and image integration) capability has increased: “outperforming existing LLMs on a collection of natural language processing tasks.”
- It’s the coming together of multimodal data that makes this quite an advance for LLMs, as you can see from the acceleration of each modality increasing separately on a log-scale below.
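To illustrate that third building block, here is a minimal sketch of why self-supervised training needs so little expert annotation: the raw text itself supplies the training targets, since each position’s label is simply the next token. The toy tokens and the imaging contrast in the final comment are illustrative assumptions, not examples from the post.

```python
# Illustrative sketch (assumptions, not from the post): self-supervised
# language modeling derives its training targets from the raw text itself,
# so no expert annotation is required.
def next_token_pairs(tokens):
    """Turn a raw token sequence into (context, target) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# A toy "tokenized" sentence; a real tokenizer would emit integer token ids.
tokens = ["the", "patient", "denies", "chest", "pain"]
for context, target in next_token_pairs(tokens):
    print(context, "->", target)

# Contrast with supervised medical imaging, where every training example
# needs an expert-annotated ground truth, e.g. ("scan_001.png", "malignant").
```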
Is Bigger AI (More Parameters) Better?
Graph below from Anil Ananthaswamy’s recent and excellent Nature piece showing the accelerated evolution (log-scale) for each data domain (by parameters).
The answer to that question is clearly no.
While transformer models, as used with LLMs, have already surpassed 1 trillion parameters (as seen in Xavier Amatriain’s graph below), the number of tokens is exceptionally important, as seen on the X-axis below.
Perhaps the most successful LLM for transformative impact to date has been DeepMind’s AlphaFold which accurately predicts 3-D structure of proteins from amino acid sequences.
So the rumor that GPT-4 would have >100 trillion parameters is not only wrong, but markedly over-values one of the components of LLMs.
Saying it again: tokens are damn important. Bigger (by parameters) isn’t necessarily better.
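As a rough back-of-the-envelope illustration of why tokens matter as much as parameters, here is a sketch built on two widely cited heuristics that are assumed here rather than taken from this post: training compute of roughly 6 × parameters × tokens FLOPs, and a compute-optimal (“Chinchilla-style”) budget of about 20 training tokens per parameter.

```python
# Back-of-the-envelope sketch (heuristics assumed, not figures from the post):
# training compute is roughly 6 * parameters * tokens FLOPs, and a
# compute-optimal ("Chinchilla-style") budget is ~20 tokens per parameter.
def training_flops(parameters: float, tokens: float) -> float:
    return 6.0 * parameters * tokens

def compute_optimal_tokens(parameters: float, tokens_per_param: float = 20.0) -> float:
    return parameters * tokens_per_param

params = 70e9                            # a hypothetical 70B-parameter model
tokens = compute_optimal_tokens(params)  # ~1.4e12 tokens
print(f"tokens needed: {tokens:.2e}")
print(f"training FLOPs: {training_flops(params, tokens):.2e}")  # ~5.9e23
```

Under these heuristics, doubling parameters without also growing the token budget leaves a model under-trained, which is the sense in which bigger (by parameters) isn’t necessarily better.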
From Chatting with Sydney to Multimodal Medical AI
While the lengthy conversation that Kevin Roose had with Sydney (Bing’s new search integrating ChatGPT+) will go down in history and even made it to the front page of the New York Times, remember that we’re still in the early days of LLMs and virtually none have had specific or extensive pre-training in medicine.
I recently wrote about how Google’s Med-PaLM and ChatGPT did well on the US medical licensing examination.
Today we’ve just learned that the next iteration of Google’s LLM, Med-PaLM-2, scored 85%, well above the previous report of 67% (60% is the passing score threshold).
But that’s using a chatbot for a single-mode language task that chiefly relies on memory and the statistically driven juxtaposition of words, aptly described as a lossy JPEG image by Ted Chiang in The New Yorker (a piece not to miss).
There’s a big jump forward from this to drive keyboard liberation for clinicians, which is just now beginning, using LLM training inputs from millions, or tens of millions, of medical records.
In the months ahead we’ll see the beginning of generative AI taking on so many language-based tasks:
synthetic office notes based on voice (with automated prescriptions, next appointments, billing codes, scheduling of labs and tests), pre-authorization from insurance companies, aggregating and summarizing a patient’s history from scouring their medical record(s), operation and procedure notes, discharge summaries, and more.
Examples from Doximity (docsGPT) and Abridge are showing us the way.
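To make that concrete, here is a minimal sketch of drafting a SOAP-style note from a visit transcript. The call_llm placeholder, the prompt wording, and the workflow are hypothetical illustrations, not how Doximity’s docsGPT or Abridge actually work, and any generated draft would still need clinician review.

```python
# Hypothetical sketch of drafting a clinical note from a visit transcript.
# `call_llm` is a placeholder for whatever LLM API is available; this is not
# the Doximity docsGPT or Abridge implementation, and real use would require
# clinician review plus a privacy-compliant deployment.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this to your LLM provider of choice.")

def draft_soap_note(transcript: str) -> str:
    prompt = (
        "You are drafting documentation for a clinician to review and edit.\n"
        "From the visit transcript below, produce a SOAP note "
        "(Subjective, Objective, Assessment, Plan) plus suggested billing "
        "codes, clearly marked as suggestions.\n\n"
        f"Transcript:\n{transcript}"
    )
    return call_llm(prompt)

# Example usage (fails until call_llm is wired to a provider):
# print(draft_soap_note("Patient reports two weeks of intermittent chest pain..."))
```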
As opposed to the electronic health record disaster that has transformed clinicians into data clerks and led to profound disenchantment, over time LLMs may well be embraced as the antidote.
Obviously there are major issues that have to be grappled with, …
… including LLM hallucinations (mandating human-in-the-loop oversight), pseudo-reasoning, amplification of concerns over bias, privacy and security of data, the dominance of the tech industry, and more.
But without LLMs, it would be hard to see how we could make forward progress on multimodal AI.
It’s ultimately the ability to move seamlessly between medical images, text, voice, and all data sources (sensors, genome, microbiome, the medical literature) that will afford the many opportunities shown in the top diagram of this post.
For that, I’m excited to see the rapid evolution of LLMs and their future application for medicine and healthcare.
Originally published at https://erictopol.substack.com on March 14, 2023.
Names mentioned (selected list)
Anil Ananthaswamy