Postbox Live

Baidu Forbids Google And Bing From Using Information Scraping To Train Artificial Intelligence

In an effort to stop Google and Microsoft Bing from scraping its content, Chinese internet search company Baidu has updated the robots.txt file of Baidu Baike, its Wikipedia-like online encyclopedia.
The update blocks the Googlebot and Bingbot crawlers from indexing content on the platform.
According to records on the Wayback Machine, the change took place on August 8. Before that, Google and Bing were allowed to index the central repository of Baidu Baike, which holds about 30 million entries, although some subdomains on the website were already blocked.
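
To illustrate how a robots.txt rule of this kind affects individual crawlers, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the `example.com` URL, the `ExampleBot` agent, and the helper name `crawler_permissions` are assumptions made for illustration; they are not the actual contents of the Baidu Baike robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real Baidu Baike file.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, url, agents):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawler_permissions(
    SAMPLE_ROBOTS_TXT,
    "https://example.com/item/1",  # placeholder URL, not a real Baike page
    ["Googlebot", "Bingbot", "ExampleBot"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these assumed rules, the two named crawlers are blocked from every path while any other agent falls through to the wildcard entry. Note that robots.txt is a voluntary convention: it tells well-behaved crawlers what not to fetch, but it does not technically prevent access.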

Baidu's action comes amid increasing demand for the large datasets used to train artificial intelligence models and applications. It follows similar moves by other companies to protect their online content. In July, Reddit blocked various search engines, except Google, from indexing its posts and discussions. Google, unlike the others, has a financial agreement with Reddit for data access to train its AI services.
According to sources, Microsoft considered restricting access to internet-search data for rival search engine operators over the past year, particularly those that used the data for chatbots and generative AI services.

Meanwhile, the Chinese Wikipedia, with its 1.43 million entries, remains available to search engine crawlers. A survey conducted by the South China Morning Post found that entries from Baidu Baike still appear in both Bing and Google search results, possibly because the search engines continue to use older cached content.
The move comes as developers of generative AI around the world increasingly work with content publishers to secure high-quality content for their projects. For instance, OpenAI recently signed an agreement with Time magazine to access its entire archive, dating back to the first day of the magazine’s publication more than a century ago.
A similar partnership was signed with the Financial Times in April.

Baidu’s move to limit major search engines’ access to its Baidu Baike content demonstrates the growing significance of data in the AI era. Large, carefully curated datasets have become much more valuable as businesses invest heavily in AI development. Online platforms are now managing access to their material differently, with many opting to restrict or charge for access to their data.
More businesses will probably review their data-sharing practices as the AI sector develops, which could lead to further changes in how data is indexed and accessed online.