No one wants Apple using their webpages to train artificial intelligence.
According to Wired, a number of prominent websites, including major news organizations and social media platforms, are blocking Apple’s web crawler from scraping their pages for AI training material.
According to the report, the media companies that have altered their robots.txt files to stop Applebot-Extended from accessing them include Gannett, Vox Media, Condé Nast, The New York Times, The Atlantic, and The Financial Times. On the social media side, Facebook, Instagram, Tumblr, and Craigslist, the venerable online community, have all signaled that Apple is not permitted to scrape their content.
Robots.txt files are becoming an increasingly interesting resource for studying the digital politics of AI. While The New York Times has clearly drawn a line in the sand and is actively suing OpenAI for copyright infringement, several of these organizations, including Vox Media, Condé Nast, and The Atlantic, have signed content licensing agreements with OpenAI.
Meta, Apple’s rival in the AI space, owns both Facebook and Instagram, and sites like Tumblr and Craigslist that run on user-generated content sit on extremely valuable pools of high-quality data. Meanwhile, Apple has already signed an agreement with OpenAI to incorporate ChatGPT into “Apple experiences.”
To put it succinctly, competition in the AI space is fierce, especially when it comes to access to high-quality, human-made training material. The evolving relationship between AI companies and data sources such as social media platforms and journalism organizations also offers an intriguing window into how both sides decide what AI use is acceptable, including where the boundaries lie for bots such as Apple’s.
Coal Mines
As per a blog post by Apple, web publishers can explicitly choose to “opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools,” which is why Wired reports that these websites have blocked “Applebot-Extended.”
Apple confirmed to Wired that blocking Applebot-Extended does not stop the original Applebot from crawling a website; it only prevents any scraped data from being used to train the company’s AI models.
Applebot still scrapes data for Apple products such as Spotlight and Siri, and the distinction suggests Apple is treading carefully around intellectual property and copyright in the era of artificial intelligence.
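For readers curious about the mechanics, here is a minimal sketch of how such an opt-out can be expressed in a robots.txt file, checked with Python’s standard urllib.robotparser. The domain and rules below are hypothetical illustrations, not taken from any publisher’s actual file, and Apple’s own crawlers may interpret robots.txt rules somewhat differently than the stdlib parser does.

# Minimal sketch: a hypothetical robots.txt that blocks Applebot-Extended
# (AI training) while leaving the original Applebot (Siri/Spotlight indexing) alone.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse the rules without any network fetch

page = "https://example-publisher.com/some-article"  # hypothetical URL
print(rp.can_fetch("Applebot", page))           # True: ordinary crawling is still allowed
print(rp.can_fetch("Applebot-Extended", page))  # False: the AI-training crawler is blocked

In other words, a publisher can keep showing up in Siri and Spotlight while keeping its content out of Apple’s training data, which is exactly the split the blocked sites appear to be exploiting.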
The NYT isn’t the first business or organization to sue an AI maker, and given that Apple has already leaned on OpenAI to cover some of its product gaps, it may be prudent for Apple to steer clear of scraping any contentious or lawsuit-entangled data. Consider it the coal mine’s billion-dollar canary.