Baidu blocks Google and Bing from scraping its content to train artificial intelligence
Chinese internet search company Baidu has updated its Wikipedia-like Baike service to stop Google and Microsoft's Bing from scraping its content.
The change was made in the most recent update to the Baidu Baike robots.txt file, which now prevents the Googlebot and Bingbot crawlers from accessing the site's content.
According to records on the Wayback Machine, the change happened on August 8. Before that, Google and Bing could access Baidu Baike's central repository of about 30 million entries, although some subdomains on the website were already blocked.
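For readers unfamiliar with robots.txt, the sketch below shows how rules of this kind can be read with Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT`, the `is_allowed` helper, and the example.com URL are illustrative assumptions, not the contents of the actual Baidu Baike file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT the real Baidu Baike robots.txt: they only show
# how a site can shut out two named crawlers while leaving others allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/item/1"  # placeholder URL
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, is_allowed(agent, page))
```

Note that robots.txt is a request that well-behaved crawlers choose to honor; it does not technically prevent a crawler from fetching pages.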
This action by Baidu comes amid increasing demand for large datasets used in training artificial intelligence models and applications. It follows similar moves by other companies to protect their online content. In July, Reddit blocked various search engines, except Google, from indexing its posts and discussions.
Google has a financial agreement with Reddit that gives it access to data for training its AI services.
According to sources, Microsoft considered restricting access to internet-search data for rival search engine operators over the past year, particularly those that used the data for chatbots and generative AI services.
These moves come as developers of generative AI around the world increasingly work with content publishers to secure high-quality content for their projects. For instance, OpenAI recently signed an agreement with Time magazine to access its entire archive, dating back to the very first day of the magazine’s publication over a century ago. A similar partnership was signed with the Financial Times in April.
Online platforms are changing how they manage access to their material, with many opting to restrict or charge for access to their data.
More businesses will probably review their data-sharing practices as the AI sector develops, which could lead to further changes in how data is indexed and accessed online.