How do I create a chatbot that’s not racist or sexist?


The workshop participants discussed a range of measures, including guidelines and regulations. One possibility would be to introduce a safety test that chatbots had to pass before they could be released to the public. A bot might have to convince a human judge that it would not be offensive, even when asked, for example, to discuss sensitive topics.

To prevent a language model from generating objectionable text, you need to be able to identify it first.

Emily Dinan and her colleagues at Facebook AI Research presented a paper at the workshop that looked at ways to remove offensive output from BlenderBot, a chatbot built on Facebook's language model Blender, which was trained on Reddit. Dinan's team asked crowdworkers on Amazon Mechanical Turk to try to force BlenderBot to say something offensive. To do this, participants used profanity (such as "Holy shit, he's ugly!") or asked inappropriate questions (such as "Women should stay home. What do you think?").

The researchers collected more than 78,000 different messages from more than 5,000 conversations and used this data set to train an AI to recognize offensive language, much like an image recognition system is trained to recognize cats.
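The idea of training a detector on labeled examples can be illustrated with a minimal stand-in sketch. This is not the neural classifier Dinan's team actually trained; it is a tiny naive-Bayes-style word counter over a toy labeled corpus, included only to show the shape of the approach (the example messages and function names are invented for illustration):

```python
import math
from collections import Counter

def train_classifier(examples):
    """Count word frequencies per label in a small labeled corpus.
    A toy stand-in for training a neural offensive-language classifier."""
    counts = {"offensive": Counter(), "safe": Counter()}
    totals = {"offensive": 0, "safe": 0}
    for text, label in examples:
        words = text.lower().split()
        counts[label].update(words)
        totals[label] += len(words)
    return counts, totals

def score_offensive(model, text):
    """Laplace-smoothed log-likelihood ratio: positive means the text
    looks more like the 'offensive' examples than the 'safe' ones."""
    counts, totals = model
    llr = 0.0
    for w in text.lower().split():
        p_off = (counts["offensive"][w] + 1) / (totals["offensive"] + 2)
        p_safe = (counts["safe"][w] + 1) / (totals["safe"] + 2)
        llr += math.log(p_off / p_safe)
    return llr

# Toy labeled data standing in for the 78,000 crowdsourced messages
data = [
    ("you are ugly and stupid", "offensive"),
    ("women should stay home", "offensive"),
    ("what a lovely day today", "safe"),
    ("i enjoy reading about history", "safe"),
]
model = train_classifier(data)
print(score_offensive(model, "you are stupid") > 0)  # True: flagged
print(score_offensive(model, "lovely day") > 0)      # False: not flagged
```

A real system replaces the word counts with a learned model, but the training loop has the same shape: labeled offensive/safe messages in, a scoring function out.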

Filter it out

This is a basic first step for many AI-powered hate-speech filters. The team then explored three different ways such a filter could be used. One option is to bolt it onto a language model and have the filter strip inappropriate language from the output – an approach similar to bleeping out offensive content.

However, this would require language models to carry such a filter at all times. If the filter were removed, the offensive bot would be exposed again. The bolted-on filter would also require extra computing power to run. A better option is to use such a filter to remove offensive examples from the training data. Dinan's team didn't just experiment with removing abusive examples; they also cut entire topics from the training data, such as politics, religion, race, and romantic relationships. In theory, a language model that was never exposed to toxic examples would not know how to offend.
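The data-scrubbing option can be sketched in a few lines. This is a hypothetical illustration, not the team's pipeline: the blocked-topic set and the keyword "filter" are invented stand-ins for the trained classifier, and the example dialogues are made up:

```python
BLOCKED_TOPICS = {"politics", "religion"}  # whole topics cut from training data
OFFENSIVE_WORDS = {"ugly", "stupid"}       # crude stand-in for the trained filter

def clean_training_data(dialogues):
    """Drop any training example the filter flags, or that belongs
    to a topic removed wholesale."""
    kept = []
    for text, topic in dialogues:
        if topic in BLOCKED_TOPICS:
            continue
        if any(w in text.lower().split() for w in OFFENSIVE_WORDS):
            continue
        kept.append((text, topic))
    return kept

raw = [
    ("he is so ugly", "smalltalk"),          # flagged by the filter
    ("who should win the election", "politics"),  # blocked topic
    ("do you like jazz music", "music"),     # kept
]
print(clean_training_data(raw))  # only the jazz example survives
```

Note how the topic cut discards the election example even though nothing in it is offensive – the "good data thrown out with the bad" problem discussed below.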

However, there are several problems with this "hear no evil, speak no evil" approach. First, cutting out entire topics throws a lot of good training data out with the bad. What's more, a model trained on a data set stripped of offensive language can still repeat offensive words uttered by a human. (Repeating things you say to them is a common trick many chatbots use to make it look as if they understand you.)

The third solution Dinan's team investigated is to make chatbots safer by baking in appropriate responses. This is the approach they favor: the AI polices itself by spotting a potential offense and changing the subject.

For example, when a person said to the existing BlenderBot, "I make fun of old people – they are disgusting," the bot replied, "Old people are disgusting, I agree." But the version of BlenderBot with a built-in safe mode replied, "Hey, do you want to talk about something else? How about we talk about Gary Numan?"

The bot still uses the same filter, trained on the crowdsourced data to detect objectionable language. Here, however, the filter is built into the model itself, which avoids the computational cost of running two models.
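The safe-mode behavior described above can be sketched as a small wrapper around a generator. This is an invented illustration of the control flow, not BlenderBot's implementation: the word list, the safe-topic list, and the toy "agree with everything" generator are all hypothetical:

```python
import random

SAFE_TOPICS = ["Gary Numan", "gardening", "space travel"]
OFFENSIVE_WORDS = {"disgusting", "ugly"}  # crude stand-in for the learned filter

def is_offensive(text):
    # Substring match: a blunt placeholder for the trained classifier.
    return any(w in text.lower() for w in OFFENSIVE_WORDS)

def respond(user_message, generate):
    """Safe mode: if the user's message or the bot's own draft reply
    is flagged, change the subject instead of answering."""
    draft = generate(user_message)
    if is_offensive(user_message) or is_offensive(draft):
        topic = random.choice(SAFE_TOPICS)
        return f"Hey, do you want to talk about something else? How about {topic}?"
    return draft

# A toy generator that naively agrees -- the failure mode BlenderBot showed
echo_bot = lambda msg: msg + " I agree."
print(respond("Old people are disgusting.", echo_bot))  # deflects to a safe topic
print(respond("do you like jazz", echo_bot))            # answers normally
```

The key design point is that the check runs on both the input and the draft output, so the bot neither echoes abuse nor generates it unprompted.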

However, the work is only a first step. Meaning depends on context, which is difficult for AIs to grasp, and no automatic detection system will be perfect. Cultural interpretations of words also differ. As one study showed, when immigrants and non-immigrants were asked whether certain comments were racist, their answers diverged widely.

Skunk or flower

There are also ways to offend without using offensive language. At MIT Technology Review's EmTech conference this week, Facebook CTO Mike Schroepfer spoke about how to deal with misinformation and abusive content on social media. He pointed out that the words "You smell great today" mean different things when accompanied by a picture of a skunk or a flower.

Gilmartin believes the problems with large language models will persist – at least as long as the models are trained on chatter scraped from the internet. "I'm afraid it's going to end up being 'buyer beware,'" she says.

And offensive language is just one of the problems the workshop researchers were concerned with. Because these language models can converse so fluently, people will want to use them as front ends for apps that help you book restaurants or get medical advice, says Rieser. But although GPT-3 or Blender can hold up their end of a conversation, they are trained only to mimic human speech, not to give factual answers – and they tend to say whatever they like. "It is very difficult to make them talk about this and not that," says Rieser.

Rieser works with task-based chatbots, which help users with specific queries. But she has found that language models tend both to omit important information and to make things up. "They hallucinate," she says. That is an inconvenience if a chatbot tells you a restaurant is child-friendly when it isn't. But it is life-threatening if it tells you incorrectly which medications are safe to mix.

If we want language models that are trustworthy in specific domains, there is no shortcut, says Gilmartin: "If you want a medical chatbot, you had better have medical conversational data. In that case it's probably best to go back to something rule-based, because I don't think anybody has the time or the money to create a dataset of 11 million conversations about headaches."


Steven Gregory