#acl2020nlp #acl2020en Saunders & Byrne: Reducing Gender Bias in NMT as a Domain Adaptation Problem. Video: https://virtual.acl2020.org/paper_main.690.html, Paper: https://www.aclweb.org/anthology/2020.acl-main.690
Saunders: gender bias shows up even when there's an explicitly gendered pronoun in the source text. [i did not know this!] eg. 'The doctor did *her* work' can still come out as 'his' work in the target lang
Saunders: building on WinoMT from Stanovsky et al 2019: improve accuracy of translating entities with the correct gender, but also want to keep overall translation quality, measured by BLEU.
Saunders: pretty large data bias toward masc pronouns. Could try debiased embeddings, but practically difficult (collecting all the data, computationally inefficient, etc)
Saunders: we treat this bias as a domain adaptation problem. Fast in neural models, and can get good results on small domains (which this is).
Saunders: baseline models are Transformers. We fine-tune on new data. 2 datasets: counterfactual data (gender-swapped), and handcrafted data.
Saunders: take English source sentences w/gendered terms + a copy with the genders swapped. Forward-translate the swapped copy into the target lang --> a synthetic, approximately gender-swapped dataset.
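The English-side swap step above can be sketched roughly like this (a minimal toy, not the authors' pipeline; the swap list and `gender_swap` helper are illustrative, and a real system would handle case, morphology, and ambiguous words like object-case "her"):

```python
# Illustrative word-level counterfactual swap: each gendered English token
# is replaced by its counterpart; the swapped copy would then be
# forward-translated into the target language to build the synthetic set.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "him": "her", "man": "woman", "woman": "man"}

def gender_swap(sentence: str) -> str:
    """Swap each gendered token for its counterpart; leave other tokens alone."""
    tokens = sentence.split()
    return " ".join(SWAP.get(t.lower(), t) for t in tokens)

print(gender_swap("The doctor did her work"))  # -> "The doctor did his work"
```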
Saunders: after fine-tuning on this synthetic dataset, WinoMT accuracy improves (except for Hebrew), and BLEU is maintained. But acc is still not much above 50%, so guessing gender at random would do about as well. [me: :-/ ]
Saunders: next, tiny handcrafted dataset for fine-tuning. 194 professions with Eng template sentence: "The [profession] finished [his/her] work." Easily translated into target langs, and results manually checked.
Saunders: with tiny dataset, fine-tuning adaptation is incredibly fast: few minutes on CPU.
Saunders: results w/handcrafted: catastrophic forgetting during adaptation. WinoMT acc improves but BLEU degrades.
Saunders: so, experiment with how we are adapting to this dataset. Elastic Weight Consolidation (EWC): during adaptation, regularize params & regularization is stronger if param was important to previous task (ie. general transl)
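The EWC objective is just the new-task loss plus a quadratic penalty weighted by each parameter's importance (Fisher information) on the original task; a hedged numeric sketch with illustrative names (a real implementation works over model tensors):

```python
# EWC sketch: loss = task_loss + (lam/2) * sum_i F_i * (theta_i - theta*_i)^2
# fisher[i] is the importance of parameter i to the original (general
# translation) task; high-importance params are pulled back harder.
def ewc_loss(task_loss, params, old_params, fisher, lam):
    penalty = sum(f * (p - p0) ** 2
                  for f, p, p0 in zip(fisher, params, old_params))
    return task_loss + (lam / 2.0) * penalty
```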
Saunders: results: a trade-off between general accuracy and WinoMT accuracy. Want to avoid that trade-off.
Saunders: consider a two-step process. Translate with one model (that may be biased), and then correct with another model.
Saunders: find alternative possible gender inflections for each word, then use lattice rescoring to create a new translation constrained to those gender inflections.
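The lattice idea can be sketched as: each output position offers the original word plus its possible gender inflections, and every path through the lattice is rescored, keeping the best one. A toy sketch (the exhaustive `product` enumeration and the scoring function stand in for real lattice rescoring with the adapted model):

```python
from itertools import product

# Toy gendered-inflection lattice: enumerate every combination of per-word
# alternatives and keep the candidate the scoring model likes best.
def rescore_lattice(words, inflections, score):
    options = [inflections.get(w, [w]) for w in words]  # lattice columns
    candidates = [" ".join(path) for path in product(*options)]
    return max(candidates, key=score)
```

With a toy scorer that rewards the feminine article, `rescore_lattice(["el", "doctora"], {"el": ["el", "la"]}, lambda s: s.count("la"))` picks "la doctora" over the mismatched "el doctora".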
Saunders: works quite well! outperforms EWC in most cases: good WinoMT scores, and less catastrophic forgetting of general translation.
Saunders: additional advantage: the lattice approach doesn't need access to the original NMT model or its training data. You can use this lattice rescoring to fix translations from commercial NMT systems.
Neat! Data and walkthrough available at https://github.com/DCSaunders/gender-debias