While writing an article, I dug a little deeper into the claim in the title. Here's the excerpt:

Last year, a headline claimed that 57% of the text on the internet is AI-generated. That’s true, but not that simple. I read the research paper so you don’t have to. The real story is a different kind of alarming.

Since before the 2000s, machine translation tools have been able to translate written text into a plethora of languages. When the same content exists in many languages at once, it is called a multi-way parallel translation (think of the ancient Rosetta Stone, on which the same message was inscribed in three scripts). “High-resource languages”, the languages with the most data and effort devoted to them (English, Spanish, Mandarin), get the highest-quality translations, while “lower-resource languages”, on which little effort has been spent, get poorer-quality ones. As it turns out, 57% of the sentences in the web corpus the study examined belong to multi-way parallel translations, which were most likely machine generated and are often of poor quality. Of this content, 40% was categorised by the study as “Conversation and Opinion”, a bunch of noise that largely exists for ad revenue. The problem is that AI is being trained on this low-quality, poorly translated data. This creates the risk of a vicious cycle: generative AI models create websites, those websites get “scraped” by new models, the new models train on that data, and the internet fills up with worse and worse writing.

Read the original research paper: 

 https://web.archive.org/web/20240607063435/https://arxiv.org/abs/2401.05749
