Ethics and AI: 2023 heralds “a Wild West era in generative AI”

Monday 3rd of April 2023 - Updated on Wednesday 4th of October 2023

● Giada Pistilli, a researcher working on the ethics of artificial intelligence for the Franco-American company Hugging Face, warns against the risks posed by generative AI models such as ChatGPT and Stable Diffusion.
● She is concerned that these tools are being trained with the emphasis on the quantity of data rather than its quality, which is leading to legal disputes over copyright, notably in the field of AI-generated images.
● She also highlights the risk of algorithmic bias in content generation tools that have been trained on undisclosed data.

What is Hugging Face and can you tell us about your role as the company’s principal ethicist?

Giada Pistilli. We are a bit like GitHub for artificial intelligence: we provide a platform of tools that developers can use to build, train and deploy machine-learning models based on open-source technologies. Our platform therefore brings together a large community of data scientists and researchers. Within the company, my work sits at the intersection of several aspects of artificial intelligence: research on ethics, the application of ethics in AI, and AI regulation and public policy. In this role, I have to ask questions about the social impact of artificial intelligence and also to answer certain questions that no one has asked yet.

To what extent do open-source artificial intelligence models represent a danger?

Our role is to assess the integrity of certain models and the risks inherent in them, because some language models can be used to create spam, conduct scams, and generate fake emails, fake ratings and even fake content. For example, someone who wants to improve the moderation of posts on his or her website with a toxic-language detection tool will need to build a model that is trained on toxic data (insults). The risk is that a tool of this kind could also be used for malicious purposes that subvert its intended goal, for example to create a bot specifically designed to generate malicious content. In cases of this kind, we ask the developers working on the model to ensure that it is kept private.

Is there a structured market for generative AI?

You might say that we have entered a Wild West era in generative AI. Huge models are being trained on vast amounts of data, and there is a risk that quantity will take precedence over quality. At Hugging Face, our scientific team focuses instead on understanding how to find the right data: properly authorized, high-quality data that can be legally used and shared. Earlier this year, Getty Images sued Stability AI, the company behind the image-generation model Stable Diffusion, for copyright infringement. The outcome of this case will set a major legal precedent.

What questions should companies ask before building an artificial intelligence model?

It is important to know which data will be used to train models and the type of content they will generate. If they are to produce images, you need to ask questions about copyright and the consent of the people in the photos. In the case of Stable Diffusion, research has shown that the data used to train the model included pornographic material. The ethical questions that need to be asked are therefore: where does this data come from, to what extent can you make use of it, and for what purpose? Even users with the best of intentions can generate soft-porn content when interacting with these tools, so their use by minors is very problematic. The same applies to tools that can create deepfakes, which can take images of well-known people out of context to deceive buyers, create fake products, and so on. So it is vitally important to state clearly that images of this kind have been created by artificial intelligence, and to seek to anticipate and mitigate these risks.

What about written content?

This question is often raised with regard to images, because it is easier to identify an artist’s style, but it is also an issue with language models: certain models may have been trained on books and newspaper articles that are protected by copyright. It is also important that these models are trained on diverse data. Otherwise, because these models are statistical, the result will be biased: the AI will respond with the same arguments on a given topic, or talk about only one person, which risks discriminating against everyone else.

For me, the idea of plugging a language model into a search engine is highly problematic, because nothing could be more inaccurate, not least because data sources are not ranked in any hierarchy. It is not acceptable to put heterogeneous sources on the same level, for example, to rank a scientific article alongside a blog post by a processed-food brand. The solution may be to design more closed chatbot systems, because, to date, this is the only way to keep control over content.

Sources:

Artificial intelligence: a tool for domination or emancipation?

AI chatbots are coming to search engines — can you trust the results?

Lexicon

deepfakes

A deepfake is a piece of video, audio or image content generated through the use of an artificial intelligence model, usually with the intent to deceive listeners or viewers.

Giada Pistilli

Giada Pistilli is principal ethicist at Hugging Face, a Franco-American company that develops tools to build applications using machine learning. She is also preparing a doctoral thesis in philosophy at Sorbonne University and the CNRS (French National Centre for Scientific Research) on the ethical issues raised by conversational artificial intelligence.

Read also on Hello Future

Omnimodal AI

AI therapy: marketing hype and the hidden risks for users

A lexicon of artificial intelligence: understanding different AIs and their uses

AI Agents

Deepfakes: detection methods struggle to make limited progress

Generative AI: a growing threat to information systems

AI agents could further automate certain jobs

Devoxx France: “AI has ushered in a second revolution in the world of testing”