Open-Science NLP Bounty: ($100 + $100 to charity)

Task: A notebook demonstrating experiments that get within 30(!) PPL (i.e., below 84) of this widely cited LM baseline on PTB / WikiText-2, using any non-pretrained, word-only Transformer variant.
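
For concreteness, here is a minimal sketch of how word-level perplexity could be measured for a submission. This is not the bounty's reference code: it assumes a PyTorch next-token language model and a 1-D tensor of word ids from the evaluation split, and the names model, eval_ids, and bptt are illustrative placeholders.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def word_level_perplexity(model, eval_ids, bptt=128, device="cpu"):
    """Corpus perplexity = exp(total NLL / total predicted tokens)."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    n = eval_ids.size(0)
    for start in range(0, n - 1, bptt):
        end = min(start + bptt, n - 1)
        inputs = eval_ids[start:end].unsqueeze(0).to(device)           # (1, T)
        targets = eval_ids[start + 1:end + 1].unsqueeze(0).to(device)  # (1, T)
        logits = model(inputs)  # assumed output shape: (1, T, vocab_size)
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="sum",
        )
        total_nll += nll.item()
        total_tokens += targets.numel()
    return math.exp(total_nll / total_tokens)

# Bounty target: perplexity below 84 on PTB / WikiText-2, i.e. within
# 30 points of the roughly 54 PPL reported for the cited baseline.
```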

Context: https://twitter.com/Tim_Dettmers/status/1245805495895511042
The state of benchmarking in NLP right now is so strange. These goofy websites keep precisely curated leaderboards (https://paperswithcode.com/sota/language-modelling-on-penn-treebank-word), and hardworking grad students cannot get within 2x of these reported results.
Winning response. Props to the author for responding.

I will accept other submissions if others are motivated to find a different solution.

https://twitter.com/ZihangDai/status/1245905407350112256

Very">https://twitter.com/ZihangDai... interesting explanation for why this is so difficult, and why it should arguably not be used in the future.
@JesseDodge @ssgrn @nlpnoah Just out of curiosity, what's the over-under on how many GPU hours this would have taken to replicate manually?

https://arxiv.org/abs/1909.03004