Major websites are blocking AI crawlers from accessing their content

Thought Leader: Sara Fischer
August 31, 2023
Source: AXIOS

Nearly 20% of the top 1000 websites in the world are blocking crawler bots that gather web data for AI services, according to new data from Originality.AI, an AI content detector.

Why it matters: In the absence of clear legal or regulatory rules governing AI’s use of copyrighted material, websites big and small are taking matters into their own hands.

Driving the news: OpenAI introduced its GPTBot crawler early in August, declaring that the data gathered “may potentially be used to improve future models,” promising that paywalled content would be excluded and instructing websites in how to bar the crawler.
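In practice, barring the crawler comes down to a short entry in a site's robots.txt file that names the GPTBot user agent and disallows its access. The sketch below is illustrative only: it uses Python's standard-library robots.txt parser to show how such a rule is read, and the sample file, the example.com URL and the SomeOtherBot name are assumptions made for the demonstration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bars GPTBot from the whole site, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

A robots.txt rule is a request, not an enforcement mechanism: it only works if the crawler chooses to honor it.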

By the numbers: Of the 1,000 most visited websites in the world, the share blocking OpenAI’s GPTBot crawler rose from 9.1% on Aug. 22 to 12% on Aug. 29, per Originality.AI’s data.

How it works: Any page you can access from a web browser can also be “scraped” by a crawler — which operates just like a browser but stores the material in a database instead of displaying it to a user.
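A rough sketch of that idea, assuming a hypothetical crawler called ExampleBot and a one-table SQLite database (neither comes from the reporting): the code requests a page the way a browser would, then saves the HTML instead of rendering it. Production crawlers also follow links, check robots.txt and pace their requests, all of which is left out here.

```python
import sqlite3
from typing import Callable, Iterable
from urllib.request import Request, urlopen


def fetch_page(url: str, user_agent: str = "ExampleBot/0.1") -> str:
    """Request a page like a browser would and return its raw HTML."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def store_page(db: sqlite3.Connection, url: str, html: str) -> None:
    """Save the page in a database instead of displaying it to a user."""
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)")
    db.execute("INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html))
    db.commit()


def crawl(
    db: sqlite3.Connection,
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_page,
) -> int:
    """Fetch and store each URL; return how many pages were saved."""
    count = 0
    for url in urls:
        store_page(db, url, fetch(url))
        count += 1
    return count


if __name__ == "__main__":
    database = sqlite3.connect(":memory:")
    saved = crawl(database, ["https://example.com/"])  # placeholder URL
    print("pages stored:", saved)
```

The `fetch` parameter is swappable so the storage step can be tried without touching the network.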

The big picture: Google and other web firms see their data crawlers’ work as fair use, but many publishers and intellectual property holders have long objected, and Google has faced multiple lawsuits over the practice.

Reality check: Some publishers saw at least some value in letting search crawlers access their sites since Google and other search sites sent users to their ad-supported sites.

Our thought bubble: Media outfits that feel they got taken by Google over the past two decades are eyeing the rapid commercialization of AI services like those from OpenAI with hostility and a “we won’t get fooled again” attitude.

Zoom in: News companies specifically are struggling to find the right balance between embracing AI and resisting it.

What to watch: If too much of the web blocks AI crawlers, their owners could find it harder to refine and update their AI products — and good data is getting tougher to find.
