TL;DR: Training on only error-free Standard English corpora predisposes pretrained neural networks to discriminate against speakers from minority linguistic communities (e.g., African American Vernacular English, Colloquial Singapore English). We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular NLP models like BERT and Transformer, and show that adversarially fine-tuning them for a single epoch significantly improves robustness without sacrificing performance on clean data.
In recent years, Natural Language Processing (NLP) systems have become increasingly good at modeling complex patterns in language; pretraining large language models and fine-tuning them on task-specific data to achieve state-of-the-art results is now the norm. However, these English NLP models are often trained only on error-free, Standard (often U.S.) English, even though much of the global English-speaking population speaks English as a second language or speaks a “non-standard” dialect like African American Vernacular English. This predisposes the pretrained models to discriminate against minority linguistic groups, a phenomenon known as linguistic discrimination. In this work, we examine the potential allocative harms of linguistic discrimination by English NLP models through the use of adversarial examples.
At best, minority users will be frustrated and give up trying to interact with these systems. At worst, marginalized populations could be denied access to essential services if these systems are used as interfaces to government portals, or even wrongfully arrested (and potentially killed in the process), as in the case of a Palestinian man whose “good morning” Facebook post was automatically mistranslated as “attack them”.
In English, inflections are generally suffixes that mark grammatical properties (e.g., tense, number) on nouns, verbs, and adjectives. Inflectional variation, the use of “non-standard” inflections (e.g., I gone there vs. I went there), is a common feature of English dialects like African American Vernacular English and Singlish. Therefore, it is fair to expect an NLP system that does not discriminate between linguistic communities to be unaffected by inflectional variation.
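To make this concrete, here is a toy sketch of how a sentence can be perturbed purely through inflections. The hand-written inflection table below is purely illustrative; a real system would derive candidate forms from a morphological resource.

```python
# Toy inflection table for illustration only; a real implementation
# would look up candidate forms with a morphological analyzer.
INFLECTIONS = {
    "went": ["go", "goes", "gone", "going"],
}

def candidate_perturbations(tokens, table):
    """Yield copies of the sentence with exactly one word replaced by
    another inflected form of the same base word. Word order and base
    forms are untouched, so the meaning is preserved."""
    for i, tok in enumerate(tokens):
        for form in table.get(tok, []):
            yield tokens[:i] + [form] + tokens[i + 1:]

sentence = "I went there".split()
variants = list(candidate_perturbations(sentence, INFLECTIONS))
# The dialectal variant ['I', 'gone', 'there'] is among the candidates.
```

Each candidate differs from the original in a single inflection, which is exactly the kind of variation a dialect-robust model should tolerate.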
To generate adversarial examples that expose the linguistic biases of current state-of-the-art models, we propose Morpheus, which finds the set of inflections in a sentence that maximally hurts the target model’s performance. One interpretation of adversarial examples is that they represent worst-case scenarios of inflectional variation for a particular model.
Since the word order and base forms remain unchanged, the resulting adversarial examples are natural and preserve the semantics of the original examples (Fig. 2). In addition, Morpheus operates in a black-box fashion: it does not require access to the target model’s gradients, which makes for a more realistic threat model.
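A minimal sketch of this greedy, black-box search idea (the inflection table and scoring function below are toy placeholders for illustration, not the paper’s actual implementation):

```python
def morpheus_sketch(tokens, inflections, score):
    """Greedy black-box search in the spirit of Morpheus: at each
    position, keep the inflected form that most reduces the target
    model's score. `score` can be any sentence -> float metric
    (e.g., F1, BLEU); no gradients are needed."""
    adv = list(tokens)
    for i, tok in enumerate(tokens):
        best_form, best_score = adv[i], score(adv)
        for form in inflections.get(tok, []):
            adv[i] = form
            s = score(adv)
            if s < best_score:
                best_form, best_score = form, s
        adv[i] = best_form  # commit the worst-case form at position i
    return adv

def toy_score(tokens):
    # Stand-in for a real model metric: this toy "model" is brittle
    # to the non-standard inflection "gone".
    return 0.2 if "gone" in tokens else 1.0

adv = morpheus_sketch("I went there".split(),
                      {"went": ["go", "gone", "going"]},
                      toy_score)
print(adv)  # ['I', 'gone', 'there']
```

Because the search only queries the model for scores, it works against any deployed system that exposes predictions, not just models whose internals are available.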
Running Morpheus on popular models for classification (question answering) and generation (machine translation) revealed that they were all affected by inflectional variation to different extents (Fig. 3). In particular, we would like to draw attention to the difference in robustness between the models trained on only questions with guaranteed answers (SQuAD 1.1) and the same architectures trained on questions without guaranteed answers (SQuAD 2.0). The latter setting includes unanswerable questions in addition to the answerable questions from SQuAD 1.1. Models trained for this task are expected to not only predict the answer span for the answerable questions, but also whether the question itself is answerable. Hence, SQuAD 2.0 represents the more realistic scenario compared to SQuAD 1.1.
This makes it especially concerning that the SQuAD 2.0-trained models are much less robust to inflectional variation than the SQuAD 1.1-trained models because it indicates that they are more likely to discriminate against minority linguistic communities. In the question answering setting, this can manifest as giving them the wrong answers or predicting that their questions are unanswerable, resulting in frustrating “Sorry, I didn’t get the question.” situations.
We found that widely used machine translation models like the Transformer were so brittle that even randomly perturbing the inflections had a large impact on their performance. Perturbing the inflections in a targeted manner (adversarial examples) resulted in an even greater drop (>50%) in performance on the WMT’14 test set.
Although training on Standard English datasets predisposes neural models to perpetuate linguistic discrimination, obtaining sufficient labelled non-standard English data for different NLP tasks is expensive and no easy feat. To mitigate the negative effects of training only on Standard English, we propose adversarial fine-tuning, in which the already trained model is fine-tuned further on a suitable adversarial training set. This contrasts with standard adversarial training, in which the model is retrained from scratch on the adversarial examples, a computationally expensive process.
In addition, instead of running Morpheus on the training set, which would be computationally expensive, we randomly perturb the inflections in the training examples. We weight this random sampling by the adversarial distribution rather than a uniform one, which ensures that inflections that tend to cause the model to fail appear more frequently in the dataset used for adversarial fine-tuning. The adversarial distribution can also be thought of as a sort of adversarial "language model" and can easily be swapped out for a dialectal one when customizing the task model for a specific dialect community.
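The weighted sampling step can be sketched as follows. The counts below are a hypothetical adversarial distribution invented for illustration; in practice it would be estimated from adversarial examples found on a sample of the data.

```python
import random

# Hypothetical adversarial distribution: how often each inflected form
# of a word appeared in adversarial examples (counts are made up here
# purely for illustration).
ADV_COUNTS = {
    "went": {"went": 5, "gone": 80, "going": 15},
}

def sample_inflection(token, rng=random):
    """Sample a replacement for `token`, weighted by the adversarial
    distribution instead of uniformly, so failure-inducing inflections
    appear more often in the fine-tuning data."""
    dist = ADV_COUNTS.get(token)
    if not dist:
        return token  # no known inflectional variants: keep as-is
    forms = list(dist)
    weights = [dist[f] for f in forms]
    return rng.choices(forms, weights=weights, k=1)[0]

random.seed(0)
perturbed = [sample_inflection(t) for t in "I went there".split()]
```

Swapping `ADV_COUNTS` for frequencies estimated from a particular dialect would bias the fine-tuning data toward that dialect instead, as described above.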
Our adversarial fine-tuning method proved to be effective at improving the models’ robustness to inflectional variation while also preserving performance on the Standard English examples (Fig. 4). This is an improvement over standard adversarial training methods that tend to affect the models’ performance on clean examples.
Ensuring that NLP technologies are inclusive and do not cause allocative harms is especially important since they are becoming increasingly ubiquitous in the products we use. Our work exposes the predisposition of existing neural models to amplify linguistic discrimination, a form of bias unique to NLP. Minority linguistic communities are also often ethnic minorities, which means that linguistic discrimination is often a form of racism and should be treated with the same urgency. Since it is naturally a challenge to obtain data with complete dialectal coverage, we believe that the solution lies in building architectures that are robust to sociolinguistic variation.
It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations.
Samson Tan, Shafiq Joty, Min-Yen Kan, and Richard Socher. ACL 2020.