Saunders: gender bias shows up even when there's an explicitly gendered pronoun in the source text. [i did not know this!] e.g. 'The doctor did *her* work' comes out as 'his work' in the target lang
Saunders: building on the WinoMT challenge set from Stanovsky et al. 2019: want to improve accuracy on getting entity gender correct, but also keep general translation quality, so BLEU is measured too.
Saunders: pretty large data bias toward masc pronouns. Can try debiased embeddings, but practically difficult (re-collecting all the training data, computationally expensive, etc.)
Saunders: we treat this bias as a domain adaptation problem. Adaptation is fast in neural models and can get good results on small domains (which this is).
Saunders: baseline models are Transformers. We fine-tune on new data. 2 datasets: counterfactual data (gender-swapped), and handcrafted data.
Saunders: take English source sentences w/gendered terms, make a copy with the genders swapped, then forward-translate the swapped copy into the target lang --> a synthetic, approximate gender-swapped dataset. (sketch below)
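[me: roughly how I picture that counterfactual step, as a minimal Python sketch. The SWAPS list and forward_translate() are my toy stand-ins, not the paper's actual procedure, which handles case, ambiguity, and morphology far more carefully]

```python
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def gender_swap(sentence: str) -> str:
    """Toy word-for-word swap; ignores capitalization and ambiguity
    ('her' can map to 'his' or 'him'), which a real swap must handle."""
    return " ".join(SWAPS.get(tok.lower(), tok) for tok in sentence.split())

def make_counterfactual_pairs(english_sentences, forward_translate):
    """Pair each gender-swapped English source with a forward-translated
    target, giving the synthetic approximate gender-swapped dataset."""
    return [(gender_swap(s), forward_translate(gender_swap(s)))
            for s in english_sentences]
```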
Saunders: after fine-tuning on this synthetic dataset, WinoMT accuracy improves (but not for Hebrew), and BLEU holds steady. Accuracy is still not much above 50%, though, so you could get similar results by guessing gender at random. [me: :-/ ]
Saunders: next, a tiny handcrafted dataset for fine-tuning: 194 professions in an Eng template sentence, "The [profession] finished [his/her] work." Easily translated into target langs, and results manually checked. (sketch below)
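[me: the English side of that set is basically a template loop, something like this sketch; the profession list here is illustrative, not the paper's actual 194]

```python
PROFESSIONS = ["doctor", "engineer", "nurse", "teacher"]  # ...194 in the paper

def handcrafted_source_sentences():
    """Yield the English side of the tiny fine-tuning set; each sentence
    is then translated into the target lang and manually checked."""
    for profession in PROFESSIONS:
        for pronoun in ("his", "her"):
            yield f"The {profession} finished {pronoun} work."
```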
Saunders: with tiny dataset, fine-tuning adaptation is incredibly fast: few minutes on CPU.
Saunders: results w/handcrafted: catastrophic forgetting during adaptation. WinoMT acc improves but BLEU degrades.
Saunders: so, experiment with *how* we adapt to this dataset. Elastic Weight Consolidation (EWC): during adaptation, regularize params, with stronger regularization on params that were important to the previous task (i.e. general translation). (sketch below)
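[me: the standard EWC objective, as a minimal PyTorch-style sketch; `fisher` (per-param Fisher estimates from the general-translation task), `old_params` (a snapshot of the baseline weights), and `lam` are my assumed inputs]

```python
import torch

def ewc_loss(task_loss, model, fisher, old_params, lam):
    """EWC: task_loss + (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.
    Large Fisher values mark params important to general translation,
    so those params are pulled back harder toward their old values."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return task_loss + 0.5 * lam * penalty
```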
Saunders: results: EWC gives a trade-off between general accuracy and WinoMT accuracy. Want to avoid the trade-off entirely.
Saunders: consider a two-step process. Translate with one model (that may be biased), and then correct with another model.
Saunders: find the alternative possible gender inflections for each word in the translation, then use lattice rescoring to create a new translation constrained to those gender inflections. (sketch below)
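[me: my mental model of the rescoring idea, as a toy sketch. I'm assuming inflections(token) returns that token's gendered variants (incl. itself, e.g. Spanish "enfermero"/"enfermera") and score() is the adapted model's log-probability; real lattice rescoring searches the lattice with a beam rather than enumerating every path like this]

```python
from itertools import product

def rescore(baseline_translation, inflections, score):
    """Build per-token gender-inflection options from a baseline
    translation, then let the adapted model pick the best combination."""
    options = [inflections(tok) for tok in baseline_translation.split()]
    return max((" ".join(path) for path in product(*options)), key=score)
```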
Saunders: works quite well! outperforms EWC in most cases: good WinoMT scores, and less catastrophic forgetting of general translation.
Saunders: additional advantage: the lattice approach doesn't need access to the original NMT model or its training data, so you can use it to fix translations from commercial/black-box NMT systems.