As pretrained language models grow more common in #NLProc, it is crucial to evaluate their societal biases. We launch a new task, evaluation metrics, and a large dataset to measure stereotypical biases in LMs:
Paper: https://arxiv.org/abs/2004.09456 
Site: http://stereoset.mit.edu/ 
Thread👇
[2/] The task and metrics are based on an ideal language model (LM). An ideal LM should perform well at language modeling, but not have a preference for stereotypes or anti-stereotypes. We create the Context Association Test (CAT), which measures LM ability and stereotype ability
[3/] We measure LM ability based on the model's preferences for meaningful contexts over meaningless contexts. The LM score of an ideal model is 100. Similarly, we measure stereotype bias based on how often the model prefers stereotypical contexts vs anti-stereotypical contexts.
[4/] The stereotypical bias score of an idealistic model is 50. The combination of these two scores gives the Idealized CAT score, which measures the unbiased LM ability.
[5/] We choose to measure bias in four domains: gender, profession, race, and religion, and collect 16,995 sentences that characterize the human stereotypical biases for these domains.
[6/] We find that as a model size (# parameters) increases, so does it’s LM ability and stereotypical behavior! However, we find that this isn’t necessarily correlated with idealistic LM ability.
[7/] We find that GPT2 is relatively more idealistic than BERT, XLNet and RoBERTa. We conjecture this is due to nature of pretraining data (Reddit data is likely to see more stereotypes and anti-stereotypes. c.f. Section 8). However, GPT is still 27 ICAT points behind an ideal LM
[8/] We also study an ensemble of BERT-large, GPT2-large, and GPT2-medium, and conjecture that the most biased terms are the ones that have well-established stereotypes in society (but with some surprising exceptions).
You can follow @moinnadeem.
Tip: mention @twtextapp on a Twitter thread with the keyword “unroll” to get a link to it.

Latest Threads Unrolled: