It Seems Like AI Is Slowly Killing Itself
“The web is becoming increasingly a dangerous place to look for your data.”
The internet is overflowing with text and images created by AI, which may pose a serious challenge to generative AI models.
A growing body of research demonstrates that training generative AI models on AI-generated content degrades the models, as reported by Aatish Bhatia in The New York Times. In short, training on AI output produces a flattening cycle akin to inbreeding; last year, AI researcher Jathan Sadowski dubbed the phenomenon “Habsburg AI,” a reference to the famously inbred European royal dynasty.
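To see why this recursive loop flattens a model, consider a minimal toy sketch; this is a hypothetical illustration, not code from any of the studies cited here, and the sample size and generation count are arbitrary. A simple Gaussian “model” is fit to data, fresh training data is sampled from that fit, and the process repeats:

```python
# Toy illustration of recursive self-training ("model collapse").
# Hypothetical sketch: fit a Gaussian to data, sample new "training
# data" from the fit, and repeat. The fitted spread tends to drift
# and shrink as the distribution's tails are lost over generations.
import numpy as np

rng = np.random.default_rng(0)

n_samples = 500       # data points per generation (arbitrary choice)
n_generations = 20    # how many times the model trains on its own output

# Generation 0: "human" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)

for gen in range(n_generations):
    # "Train" the model: maximum-likelihood fit of mean and std.
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation trains only on the previous model's outputs.
    data = rng.normal(loc=mu, scale=sigma, size=n_samples)
```

Under these assumptions, the tails of the distribution vanish first, which is the flattening the research describes: rare, unusual examples disappear before common ones do.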
The growing share of AI content on the web may also make this flattening effect harder to prevent, according to the NYT.
Because AI models are so data-hungry, AI companies have had to rely on enormous amounts of data scraped from the internet to feed them. And because neither AI companies nor their users are currently required to add watermarks or disclosures to the AI material they create, it's all the harder for model builders to filter synthetic content out of their training sets.
Sina Alemohammad, a doctoral student at Rice University, told the New York Times that “the web is becoming increasingly a dangerous place to look for your data.” Alemohammad is one of the authors of a 2023 paper that coined the acronym “MAD,” or “Model Autophagy Disorder,” to describe the consequences of AI self-consumption.
It’s been interesting to watch the topic gain prominence since we spoke with Alemohammad last year, when AI-generated data contaminating AI datasets was drawing little attention.
The New York Times highlighted a somewhat humorous illustration of AI inbreeding's effects, drawn from a study published last month in the journal Nature. The researchers, an international team of scientists based in the UK and Canada, asked AI models to complete sentences beginning with the phrase, “To cook a turkey for Thanksgiving, you…”
The initial output was unremarkable. But after only four iterations, the model devolved into incoherence, saying things like, “To cook a turkey for Thanksgiving, you need to know what you are going to do with your life if you don’t know what you are going to do with your life…”
Gibberish isn't the only unwelcome consequence of AI cannibalism, though. The MAD study, which focused on image models, showed that feeding an AI its own outputs of fake human headshots quickly caused a bizarre convergence of facial features: though the researchers started with a diverse set of AI-generated faces, by the fourth generation cycle (perhaps four is a magic number in AI?) nearly everyone had the same appearance. Given how problematic algorithmic bias already is, there's a real chance that unintentionally consuming too much AI content could lead to less diversity in model outputs.
Acquiring large amounts of high-quality human-generated data has been essential to the latest advances in generative AI. But AI companies may eventually hit a precarious wall as AI-generated material muddies the digital waters, leaving no trustworthy means of telling real from fake.