Open-Science NLP Bounty: ($100 + $100 to charity)
Task: A notebook demonstrating experiments within 30(!) PPL (<84) of this widely cited LM baseline on PTB / WikiText-2 using any non-pretrained, word-only Transformer variant.
Context: https://twitter.com/Tim_Dettmers/status/1245805495895511042
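For concreteness, here is a minimal sketch (not a submission) of what a "non-pretrained, word-only Transformer variant" plus a perplexity check looks like in practice: a small PyTorch Transformer encoder with a causal mask, trained from scratch over word indices, and the standard corpus-level perplexity computation. All hyperparameters and the `WordTransformerLM` / `perplexity` names below are illustrative assumptions, not part of the bounty.

```python
import math
import torch
import torch.nn as nn

class WordTransformerLM(nn.Module):
    """Word-level Transformer LM trained from scratch (no pretrained weights)."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=6,
                 d_ff=1024, dropout=0.3, max_len=512):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)      # word embeddings only
        self.pos = nn.Embedding(max_len, d_model)           # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff, dropout,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)
        self.drop = nn.Dropout(dropout)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.drop(self.embed(tokens) * math.sqrt(self.d_model) + self.pos(positions))
        # Causal mask: each position attends only to earlier words.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=tokens.device), diagonal=1)
        return self.out(self.encoder(x, mask=mask))          # (batch, seq_len, vocab)

def perplexity(model, token_stream, seq_len=128, batch_size=16, device="cpu"):
    """Corpus-level PPL = exp(mean token NLL) over a flat 1-D tensor of word ids."""
    model.eval()
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for start in range(0, len(token_stream) - seq_len - 1, seq_len * batch_size):
            chunk = token_stream[start : start + seq_len * batch_size + 1]
            n = (len(chunk) - 1) // seq_len
            if n == 0:
                break
            x = chunk[: n * seq_len].view(n, seq_len).to(device)
            y = chunk[1 : n * seq_len + 1].view(n, seq_len).to(device)
            logits = model(x)
            total_nll += loss_fn(logits.view(-1, logits.size(-1)), y.view(-1)).item()
            total_tokens += y.numel()
    return math.exp(total_nll / total_tokens)
```

A real submission would train something like this on the standard word-level PTB or WikiText-2 split (PTB has a roughly 10k-word vocabulary) and report `perplexity(model, valid_ids)` against the <84 target; the exact regularization and training schedule are where the difficulty lies.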
The state of benchmarking in NLP right now is so strange. These goofy websites keep precisely-curated leaderboards (https://paperswithcode.com/sota/language-modelling-on-penn-treebank-word), and hardworking grad students cannot get within 2x (!) of these reported results.
Winning response. Props to the author for responding.
I will accept other submissions if others are motivated to find a different solution.
https://twitter.com/ZihangDai/status/1245905407350112256
https://twitter.com/ZihangDai">
Very interesting explanation for why this is so difficult, and why it should arguably not be used in the future.
@JesseDodge @ssgrn @nlpnoah Just out of curiosity, what's the over-under on how many GPU hours this would have taken to replicate manually?
https://arxiv.org/abs/1909.03004