Synthetic data is a dangerous teacher


In April 2022, when DALL-E, a vision-language text-to-image model, was released, it reportedly attracted more than a million users within the first three months. This was followed by ChatGPT in January 2023, which apparently reached 100 million monthly active users just two months after launch. Both mark notable moments in the development of generative AI, which in turn has brought about an explosion of AI-generated content on the web. The bad news is that, in 2024, this means we will see an escalation of fabricated, nonsensical, and misleading content, intensifying the social stereotypes encoded in these AI models.

The generative AI revolution has not been driven by any recent theoretical breakthrough (in fact, most of the foundational work on artificial neural networks dates back decades) but by the availability of massive datasets. Ideally, an AI model captures a given phenomenon, whether human language, cognition, or the visual world, as closely as possible.

For example, for a large language model (LLM) to produce humanlike text, it is important to feed the model huge volumes of data that somehow represent human language, interaction, and communication. The prevailing belief is that the larger the dataset, the better it captures human affairs in all their inherent beauty, ugliness, and even cruelty. We are in an era marked by an obsession with scaling models, datasets, and GPUs. Current LLMs, for example, have now entered an era of trillion-parameter machine-learning models, which means they require billion-sized datasets. Where can we find such data? On the web.

This web-sourced data is assumed to capture the "ground truth" of human communication and interaction, a proxy from which language can be modeled. Although various researchers have now shown that online datasets are often of poor quality, tend to exacerbate negative stereotypes, and contain problematic content such as racial slurs and hateful speech, often directed at marginalized groups, this has not prevented large AI corporations from using such data in the race to scale.

With generative AI, this problem gets much worse. Rather than representing the social world from input data in an objective way, these models encode and amplify social stereotypes. Indeed, recent work shows that generative models encode and reproduce racist and discriminatory attitudes toward historically marginalized identities, cultures, and languages.

It is difficult, if not impossible, even with state-of-the-art detection tools, to know with certainty whether text, image, audio, or video data currently on the web is human- or machine-generated. Researchers at Stanford University, Hans Hanley and Zakir Durumeric, estimate a 68 percent increase in the number of synthetic articles posted to Reddit and a 131 percent increase in misinformation news articles between January 1, 2022, and March 31, 2023. Boomy, an online music generator, claims to have generated 14.5 million songs (or 14 percent of recorded music) so far. In 2021, Nvidia predicted that by 2030 there would be more synthetic data than real data in AI models. One thing is certain: the web is being flooded with synthetically generated data.

The worrying thing is that these vast quantities of generative AI outputs will, in turn, be used as training material for future generative AI models. As a result, in 2024, a very significant part of the training material for generative models will be synthetic data produced by generative models. Soon we will be trapped in a recursive loop where we train AI models using synthetic data produced by AI models. Most of this will be contaminated with stereotypes that will continue to amplify historical and societal inequities. Unfortunately, this will also be the data we use to train generative models applied to high-stakes sectors, including medicine, therapy, education, and law. We have yet to confront the catastrophic consequences of this. By 2024, the generative AI explosion that we now find so fascinating will instead become a massive toxic dump that comes back to bite us.
