#acl2020nlp #acl2020en Saunders & Byrne: Reducing Gender Bias in NMT as a Domain Adaptation Problem. Video: https://virtual.acl2020.org/paper_main.690.html, Paper: https://www.aclweb.org/anthology/2020.acl-main.690
Saunders: gender bias shows up even when there's an explicitly gendered pronoun in the source text. [i did not know this!] eg. 'The doctor did *her* work' can still come out as 'his' work in the target lang
Saunders: building on WinoMT from Stanovsky et al 2019: improve accuracy of translating entities with the correct gender, but also want to keep overall translation quality, measured by BLEU.
Saunders: pretty large data bias toward masc pronouns. Could try debiased embeddings, but practically difficult (collecting all the data, computationally inefficient, etc)
Saunders: we treat this bias as a domain adaptation problem. Fast in neural models, and can get good results on small domains (which this is).
Saunders: baseline models are Transformers. We fine-tune on new data. 2 datasets: counterfactual data (gender-swapped), and handcrafted data.
Saunders: take English source sentences w/gendered terms + a copy with the genders swapped. Forward-translate the swapped copy into the target lang --> a synthetic, approximately gender-swapped dataset.
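The English-side swap step above can be sketched roughly like this (a minimal toy, not the authors' pipeline; the swap list and `gender_swap` helper are illustrative, and a real system would handle case, morphology, and ambiguous words like object-case "her"):

```python
# Illustrative word-level counterfactual swap: each gendered English token
# is replaced by its counterpart; the swapped copy would then be
# forward-translated into the target language to build the synthetic set.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "him": "her", "man": "woman", "woman": "man"}

def gender_swap(sentence: str) -> str:
    """Swap each gendered token for its counterpart; leave other tokens alone."""
    tokens = sentence.split()
    return " ".join(SWAP.get(t.lower(), t) for t in tokens)

print(gender_swap("The doctor did her work"))  # -> "The doctor did his work"
```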
Saunders: after fine-tuning on this synthetic dataset, WinoMT accuracy improves (except for Hebrew), and BLEU is maintained. But acc is still not much above 50%, so guessing gender at random would do about as well. [me: :-/ ]
Saunders: next, tiny handcrafted dataset for fine-tuning. 194 professions with Eng template sentence: "The [profession] finished [his/her] work." Easily translated into target langs, and results manually checked.
Saunders: with tiny dataset, fine-tuning adaptation is incredibly fast: few minutes on CPU.
Saunders: results w/handcrafted: catastrophic forgetting during adaptation. WinoMT acc improves but BLEU degrades.
Saunders: so, experiment with how we are adapting to this dataset. Elastic Weight Consolidation (EWC): during adaptation, regularize params & regularization is stronger if param was important to previous task (ie. general transl)
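The EWC objective is just the new-task loss plus a quadratic penalty weighted by each parameter's importance (Fisher information) on the original task; a hedged numeric sketch with illustrative names (a real implementation works over model tensors):

```python
# EWC sketch: loss = task_loss + (lam/2) * sum_i F_i * (theta_i - theta*_i)^2
# fisher[i] is the importance of parameter i to the original (general
# translation) task; high-importance params are pulled back harder.
def ewc_loss(task_loss, params, old_params, fisher, lam):
    penalty = sum(f * (p - p0) ** 2
                  for f, p, p0 in zip(fisher, params, old_params))
    return task_loss + (lam / 2.0) * penalty
```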
Saunders: results: a trade-off between general accuracy and WinoMT accuracy. Want to avoid that trade-off.
Saunders: consider a two-step process. Translate with one model (that may be biased), and then correct with another model.
Saunders: find alternative possible gender inflections for each word, then use lattice rescoring to create a new translation constrained to those gender inflections.
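The lattice idea can be sketched as: each output position offers the original word plus its possible gender inflections, and every path through the lattice is rescored, keeping the best one. A toy sketch (the exhaustive `product` enumeration and the scoring function stand in for real lattice rescoring with the adapted model):

```python
from itertools import product

# Toy gendered-inflection lattice: enumerate every combination of per-word
# alternatives and keep the candidate the scoring model likes best.
def rescore_lattice(words, inflections, score):
    options = [inflections.get(w, [w]) for w in words]  # lattice columns
    candidates = [" ".join(path) for path in product(*options)]
    return max(candidates, key=score)
```

With a toy scorer that rewards the feminine article, `rescore_lattice(["el", "doctora"], {"el": ["el", "la"]}, lambda s: s.count("la"))` picks "la doctora" over the mismatched "el doctora".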
Saunders: works quite well! outperforms EWC in most cases: good WinoMT scores, and less catastrophic forgetting of general translation.
Saunders: additional advantage: the lattice approach doesn't need access to the original NMT model or its training data. You can use this lattice rescoring to fix translations from commercial NMT systems.
Neat! Data and walkthrough available at https://github.com/DCSaunders/gender-debias