A Meeting of the Minds at Bellagio Spurs a Convening and a Movement
A Bellagio residency leads to a Bellagio convening and a project to alter the U.S. economic system in favor of better health Two Bellagio Center…
Thought Leader: Dave Chokshi
Large language models (LLMs) are poised to become a much bigger part of doctors’ clinical workflows, according to Scott Gottlieb, who served as commissioner of the FDA during the Trump administration.
He shared this take on Tuesday at the 3rd Annual Summit on the Future of Rural Health Care in Sioux Falls, South Dakota. He was being interviewed on stage by Tommy Ibrahim, president and CEO of Sanford Health Plan.
Ibrahim highlighted research that Gottlieb recently conducted with the American Enterprise Institute, a center-right/right-wing think tank. The study, which was released this summer, put five LLMs to the test: OpenAI’s ChatGPT-4o, Google’s Gemini Advanced, Anthropic’s Claude 3.5, xAI’s Grok and Llama’s HuggingChat.
The research team asked these LLMs 50 questions from the most challenging installment of the three-part U.S. Medical Licensing Examination. The AI models did quite well.
OpenAI’s ChatGPT-4o had the best performance with an accuracy rate of 98%. Llama’s HuggingChat had the worst accuracy rate at 66%, and the rest of the LLMs had an accuracy rate in the 84-90% range.
The U.S. Medical Licensing Examination requires candidates to answer about 60% of questions correctly. The average passing score for the exam has historically hovered around 75%.
Based on these study results, as well as the level of AI innovation Gottlieb is seeing in his role as a partner at New Enterprise Associates, he is optimistic about the role that LLMs can play in the future of healthcare. But he doesn’t think this potential is being realized yet.
“I think we’re at the point right now that if you’re handling a complex case and you’re not using [LLMs], you probably should be. I think most physicians probably aren’t, because there’s not a good option within a health system setting where you can do it in a HIPAA-compliant fashion. There’s not a lot of systems that have deployed local instances of these chatbots,” Gottlieb explained.
He also mentioned research that he is conducting to further test LLMs’ medical capabilities. Gottlieb and his research team are feeding ChatGPT-4o clinical vignettes from the New England Journal of Medicine. In every issue, the journal includes a vignette of a difficult-to-pin-down clinical case and gives the reader a multiple-choice selection of possible diagnoses; answers are revealed in the next issue.
There are 350 examples of the journal’s clinical vignettes online, and Gottlieb and his team are feeding them all to ChatGPT-4o.
“So far, it’s getting 100% — and it explains how it arrived at the diagnosis. It takes things from the clinical vignette and explains why those clues were the key clues in helping to arrive at this diagnosis. The clinical reasoning is really profound,” he declared.
Gottlieb asked the audience to imagine a medical resident receiving a call for a complex case late at night. To him, it’s obvious that the resident should be able to use an LLM to help them more quickly reach a differential diagnosis.
“I mean, you almost have to be doing it,” Gottlieb remarked.
LLMs for clinical decision support haven’t been deployed at scale yet, though, he noted.
These tools aren’t easily accessible for most doctors. To use LLMs for diagnostic support, health systems must either create their own models or modify existing ones by layering on local health data and adding patient data privacy controls — and that takes time and resources, Gottlieb explained.
“But I think very soon everyone is going to have to think about how to deploy this point of care,” he said.