TWText.com
Tim Dettmers
Tim_Dettmers
How can you successfully train transformers on small datasets like PTB and WikiText-2? Are LSTMs better on small datasets? I ran 339 experiments worth 568 GPU hours and came up