Open-Science NLP Bounty: ($100 + $100 to charity)
Task: A notebook demonstrating experiments within 30(!) PPL (<84) of this widely cited LM baseline on PTB / WikiText-2 using any non-pretrained, word-only Transformer variant.
Context: https://twitter.com/Tim_Dettmers/status/1245805495895511042
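For concreteness, here is a minimal sketch (not a submission) of what a "non-pretrained, word-only Transformer variant" plus a perplexity check looks like in practice: a small PyTorch Transformer encoder with a causal mask, trained from scratch over word indices, and the standard corpus-level perplexity computation. All hyperparameters and the `WordTransformerLM` / `perplexity` names below are illustrative assumptions, not part of the bounty.

```python
import math
import torch
import torch.nn as nn

class WordTransformerLM(nn.Module):
    """Word-level Transformer LM trained from scratch (no pretrained weights)."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=6,
                 d_ff=1024, dropout=0.3, max_len=512):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)      # word embeddings only
        self.pos = nn.Embedding(max_len, d_model)           # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff, dropout,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)
        self.drop = nn.Dropout(dropout)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.drop(self.embed(tokens) * math.sqrt(self.d_model) + self.pos(positions))
        # Causal mask: each position attends only to earlier words.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=tokens.device), diagonal=1)
        return self.out(self.encoder(x, mask=mask))          # (batch, seq_len, vocab)

def perplexity(model, token_stream, seq_len=128, batch_size=16, device="cpu"):
    """Corpus-level PPL = exp(mean token NLL) over a flat 1-D tensor of word ids."""
    model.eval()
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for start in range(0, len(token_stream) - seq_len - 1, seq_len * batch_size):
            chunk = token_stream[start : start + seq_len * batch_size + 1]
            n = (len(chunk) - 1) // seq_len
            if n == 0:
                break
            x = chunk[: n * seq_len].view(n, seq_len).to(device)
            y = chunk[1 : n * seq_len + 1].view(n, seq_len).to(device)
            logits = model(x)
            total_nll += loss_fn(logits.view(-1, logits.size(-1)), y.view(-1)).item()
            total_tokens += y.numel()
    return math.exp(total_nll / total_tokens)
```

A real submission would train something like this on the standard word-level PTB or WikiText-2 split (PTB has a roughly 10k-word vocabulary) and report `perplexity(model, valid_ids)` against the <84 target; the exact regularization and training schedule are where the difficulty lies.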
The state of benchmarking in NLP right now is so strange. These goofy websites keep precisely-curated leaderboards (https://paperswithcode.com/sota/language-modelling-on-penn-treebank-word), and hardworking grad students cannot get within 2x (!) of these reported results.
Winning response. Props to the author for responding.
I will accept other submissions if others are motivated to find a different solution.
https://twitter.com/ZihangDai/status/1245905407350112256
https://twitter.com/ZihangDai">
Very interesting explanation for why this is so difficult, and why it should arguably not be used in the future.
@JesseDodge @ssgrn @nlpnoah Just out of curiosity, what's the over-under on how many GPU hours this would have taken to replicate manually?
https://arxiv.org/abs/1909.03004