This piece is by WWSG exclusive thought leader, Sara Fischer.
New AI search products from OpenAI and other industry leaders are forcing news companies to rethink possible deals with AI firms that need news content to answer real-time queries about current events.
The big picture: Negotiations between the tech and news industries over AI have mostly focused on providing data for the broad training of large language models (LLMs) — but now, deal talks are shifting to address narrower use cases, where news publishers may have more leverage.
How it works: LLMs can be trained on vast troves of almost any kind of text, and firms have taken it from everything they can find on the internet. But to accurately answer queries about current events, LLMs need access to smaller pools of vetted information in real time.
The process by which LLMs provide answers based on specific data sets is called retrieval-augmented generation (RAG).
RAG helps make LLMs more accurate and reduces — but can’t fully eliminate — hallucinations, or made-up, incorrect answers.
Driving the news: The rollout of OpenAI’s SearchGPT and Microsoft’s Bing generative search product last week revealed new details about how partnerships between Big Tech firms and news publishers are evolving as LLM makers integrate more RAG-based approaches into their products.
OpenAI is currently testing SearchGPT with several news publishers, including The Atlantic and News Corp.
SearchGPT’s answers to user queries that feature news content have “clear, in-line, named attribution and links so users know where information is coming from and can quickly engage with even more results in a sidebar with source links,” OpenAI said.
Publishing partners can access tools to manage how their content appears in SearchGPT, it added.
Yes, but: It’s unclear whether new generative AI-powered search engines will provide publishers with as much revenue as traditional search, chiefly Google’s, did.
The old model — sending traffic to publishers via links from the search page — has worked pretty well for the past 20 years.
OpenAI is currently experimenting with a revenue-sharing model for creators who build within its GPT Store, but a spokesperson confirmed that the deals with news publishers for RAG are built on licensing fees, not revenue-sharing agreements.
Smaller startups, such as Tollbit, are trying to create marketplaces where revenue is shared between AI firms and news companies based on market demand. But those marketplaces require broad participation to work.
The big picture: AI firms have argued it’s legal for them to train their models on anything that’s “publicly available” online, but many publishers believe their content is protected under copyright law.
The New York Times’ lawsuit against OpenAI and Microsoft is expected to provide some clarity around the issue, but the case could take years to resolve.
In the interim, news firms and AI companies are trying to work through RAG deals without triggering new copyright fights.
Of note: OpenAI says news sites can surface in SearchGPT results even if they opt out of generative AI training.
What to watch: While OpenAI seems eager to strike deals with newsrooms, other AI companies are hesitant.
Anthropic wouldn’t confirm whether it has any publisher deals or intends to make them in the future.
Axios is not aware of any news companies that have cut a deal with Anthropic.
Meta is currently debating how to proceed.
Some executives, including CEO Mark Zuckerberg, are skeptical of media deals. Meta has a long record of first embracing media partnerships and then souring on them.
Others think Meta will have to reach agreements with news providers if it wants Meta AI, the company’s consumer-facing AI chatbot, to be accurate. Meta declined to comment.
Perplexity announced revenue-sharing deals Tuesday with Time, Der Spiegel, Fortune, Entrepreneur, The Texas Tribune and WordPress.com as part of a pilot program.
But many media firms continue to object to Perplexity’s use of their content. Forbes has threatened legal action, and Condé Nast sent the firm a cease-and-desist letter over data scraping.