New Natural Language Processing Technique Helps Detect Microaggressions

CMU Researchers Develop Method for Surfacing Subtle Toxic Language

Virginia Alvino Young | Tuesday, January 28, 2020

New work by CMU researchers, including the LTI's Yulia Tsvetkov, helps identify microaggressions in order to build future intervention techniques.

Existing processes for identifying toxic language primarily focus on overt hate speech and can automatically filter content like racist slurs on social media. But no such tools exist for detecting microaggressions, which are much more subtle and can be just as harmful. New work by Carnegie Mellon University researchers helps identify these microaggressions in order to build intervention techniques in the future.

"You're pretty for a black girl" is an example of a microaggression that can easily be intuited as a negative sentence, according to Yulia Tsvetkov, an assistant professor in CMU's Language Technologies Institute. But she said natural language processing (NLP) tools that determine sentiment may classify it as positive because of the words it uses.

"Right now there's no way for NLP to detect these manifestations of social bias, which are often unconscious," Tsvetkov said. "While you can build a vocabulary of hate language, you can't do that for veiled negativity or discrimination."

Researchers convened a user group to help develop a methodology for surfacing these microaggressions, which can be difficult to recognize objectively because the phenomenon itself is subjective: what a person perceives as a microaggression changes with who they are and what biases they hold. So instead of asking annotators to label microaggressions directly, the researchers showed the group sample sentences and asked, "Is this offensive?" They did not try to model the degree of offensiveness, but instead noted the level of disagreement between people. If a sentence was overtly positive or negative, annotators tended to agree. But the researchers found that the higher the level of disagreement, the more likely the sentence was to contain a microaggression.

"We leveraged human biases to annotate for biases," Tsvetkov said.

The researchers built a machine learning model to automatically predict disagreement between people. The model also drew on prior work that collected weak signals of bias, such as particular ways verbs are used, or mentions of social groups and topics that are more likely to carry bias.
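As a rough illustration of what predicting disagreement could look like (with hypothetical toy data and simple n-gram features, not the published model), one could train an ordinary text classifier whose label is simply whether annotators split on a sentence, then use its probability output to rank new posts for review:

    # Hypothetical sketch: predict whether annotators will disagree about a sentence.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training data: 1 = annotators disagreed (candidate microaggression), 0 = they agreed.
    sentences = [
        "You're pretty for a black girl",
        "Have a great day!",
        "You speak English really well for an immigrant",
        "This restaurant has terrible service",
    ]
    disagreed = [1, 0, 1, 0]

    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(sentences, disagreed)

    # Surface new sentences predicted to provoke disagreement for human review.
    candidates = ["She's smart for a woman", "The weather is nice today"]
    for s, p in zip(candidates, model.predict_proba(candidates)[:, 1]):
        print(f"{p:.2f}  {s}")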

The resulting model does not detect microaggressions, but helps to bring possible microaggressions to the surface in a sea of social media content. A random sampling of social media posts included 3% microaggressions. The sample the model pulled included 10%. The most frequently encountered types of microaggressions identified on social media assigned stereotypical properties to social groups, such as "men are doctors and women are nurses."

Tsvetkov said that while this first phase of detection helps increase the likelihood of finding microaggressions in a big pool of data, techniques still must be developed to classify sentences as microaggressions or not.

"Eventually I hope this tool can be used to monitor civility on online forums," Tsvetkov said. "A future web plug-in could alert people in a friendly way, 'Do you want to rephrase that?'"

Other researchers who contributed to this work include CMU's Luke Breitfeller and Emily Ahn, and the University of Michigan's David Jurgens.

For More Information

Byron Spice | 412-268-9068 | bspice@cs.cmu.edu
Virginia Alvino Young | 412-268-8356 | vay@cmu.edu