AI systems meant to spot abusive online content are far more likely to label tweets “offensive” if they were posted by people who identify as African-American.
The news: Researchers built two AI systems and tested them on a pair of data sets of more than 100,000 tweets that had been annotated by humans with labels like “offensive,” “none,” or “hate speech.” One of the algorithms incorrectly flagged 46% of inoffensive tweets by African-American authors as offensive. Tests on bigger data sets, including one composed of 5.4 million tweets, found that posts by African-American authors were 1.5 times more likely to be labeled as offensive. When the researchers then tested Google’s Perspective, an AI tool that the company lets anyone use to moderate online discussions, they found similar racial biases.
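How a figure like that is calculated: The 46% number is a false-positive rate, meaning the share of tweets that human annotators marked as inoffensive but the algorithm flagged anyway. The sketch below shows the arithmetic on a handful of made-up records. It is not the researchers' code; the function name, the group names "A" and "B", and the sample data are all hypothetical, and only the label names come from the annotation scheme described above.

```python
from collections import defaultdict

def false_positive_rate_by_group(records):
    """Return, per group, the share of human-labeled inoffensive tweets
    that the classifier flagged as something other than "none".

    records: iterable of (group, human_label, predicted_label) tuples.
    """
    inoffensive = defaultdict(int)  # tweets annotators labeled "none"
    flagged = defaultdict(int)      # of those, how many the model flagged
    for group, human_label, predicted_label in records:
        if human_label != "none":
            continue  # only inoffensive tweets count toward false positives
        inoffensive[group] += 1
        if predicted_label != "none":
            flagged[group] += 1
    return {group: flagged[group] / total for group, total in inoffensive.items()}

# Hypothetical toy data, invented purely for illustration.
toy_records = [
    ("A", "none", "offensive"),
    ("A", "none", "offensive"),
    ("A", "none", "none"),
    ("A", "none", "none"),
    ("A", "offensive", "offensive"),  # ignored: not an inoffensive tweet
    ("B", "none", "offensive"),
    ("B", "none", "none"),
    ("B", "none", "none"),
    ("B", "none", "none"),
    ("B", "hate speech", "none"),     # ignored: not an inoffensive tweet
]

rates = false_positive_rate_by_group(toy_records)
disparity = rates["A"] / rates["B"]  # how many times more often group A is wrongly flagged
print(rates, disparity)
```

Dividing one group's rate by another's gives a "times more likely" comparison of the same general form as the 1.5 figure, although that figure compares how often posts were labeled offensive on larger data sets rather than false-positive rates.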
A hard balance to strike: Mass shootings perpetrated by white supremacists in the US and New Zealand have led to growing calls from politicians for social-media platforms to do more to weed out hate speech. These findings underline just how complicated a task that is. Whether language is offensive can depend on who’s saying it and who’s hearing it. For example, a black person using the “N word” is very different from a white person using it. But AI systems do not, and currently cannot, understand that nuance.
The risk: By rushing to use software to automatically weed out offensive language, we risk silencing minority voices. Moderating online content is a traumatizing, difficult job, so tech companies are keen to rely on AI systems, which are also much cheaper, instead of human beings. These findings show the huge risks inherent in that approach.