1/4 WTF guys I think I broke ML: loss & acc 🡅 together! Reproduced here: https://github.com/thegregyang/LossUpAccUp. Somehow good accuracy is achieved *in spite of* classic generalization theory (w.r.t. the loss) - what's going on? @roydanroy @prfsanjeevarora @ShamKakade6 @BachFrancis @SebastienBubeck
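A toy numeric sketch of the mechanism (mine, not code from the LossUpAccUp repo): mean xent can climb even while accuracy climbs, because a handful of confidently-wrong examples dominate the average loss while most argmaxes flip to the right class.

```python
# Toy numeric sketch (not from the LossUpAccUp repo): average cross-entropy
# can rise even as accuracy rises, because a few confidently-wrong examples
# dominate the mean loss while most argmaxes move to the correct class.
import numpy as np

def xent_and_acc(p_true_class, correct_mask):
    """Mean cross-entropy and accuracy, given the prob. assigned to the true class."""
    xent = -np.log(p_true_class).mean()
    acc = correct_mask.mean()
    return xent, acc

# "Early" checkpoint: hedged predictions, 6/10 correct.
p_early = np.array([0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.4, 0.4, 0.4, 0.4])
c_early = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

# "Late" checkpoint: confident predictions, 9/10 correct, but one example
# is confidently wrong (p_true = 1e-4) and blows up the average loss.
p_late = np.array([0.99] * 9 + [1e-4])
c_late = np.array([1] * 9 + [0], dtype=float)

print(xent_and_acc(p_early, c_early))  # ~ (0.67, 0.6)
print(xent_and_acc(p_late, c_late))    # ~ (0.93, 0.9)  <- loss AND acc both up
```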
2/4 More precisely, classic theory goes like this: "when we train with xent loss, we get good population loss by early stopping before the val loss 🡅. Because xent is a good proxy for 0-1 loss, we expect good population accuracy from this procedure." But here we got good accuracy without getting good population loss.
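One way to make "xent is a good proxy for 0-1 loss" concrete (my formalization, not wording from the thread): per example, 0-1 loss ≤ xent / log 2, so small population xent forces small 0-1 error. The converse does not hold, which is exactly the gap here. A small numeric check, with all data randomly generated for illustration:

```python
# Illustrative check (my own sketch): if argmax != label then the prob on the
# true class is <= 1/2, so -log p_true >= log 2, giving the pointwise bound
#     0-1 loss <= cross-entropy / log(2).
# Low population xent therefore implies high accuracy; high accuracy does NOT
# imply low xent (see the sketch under tweet 1).
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 10                       # examples, classes (arbitrary choices)
logits = rng.normal(size=(n, k))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(k, size=n)

p_true = probs[np.arange(n), labels]
zero_one = (probs.argmax(axis=1) != labels).astype(float)
xent = -np.log(p_true)

assert np.all(zero_one <= xent / np.log(2) + 1e-12)
print(zero_one.mean(), "<=", xent.mean() / np.log(2))
```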
3/4 Practically, this is no biggie if we can track some quality metric like accuracy. But what about e.g. language modeling, which only tracks loss/ppl? How do we know the NN doesn't learn great language long after val loss blows up? @srush_nlp @ilyasut @nlpnoah @colinraffel @kchonyc
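A hedged sketch of one workaround (mine, not something proposed in the thread): log next-token top-1 accuracy next to val loss/ppl, so a "quality keeps improving after val loss blows up" situation would at least be visible. The `model` and `val_loader` here are hypothetical placeholders.

```python
# Sketch: evaluate val loss, perplexity, and next-token accuracy together.
# `model` and `val_loader` are assumed to exist; shapes noted in comments.
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_lm(model, val_loader, device="cpu"):
    tot_loss, tot_correct, tot_tokens = 0.0, 0, 0
    for input_ids, targets in val_loader:          # assumed (B, T) long tensors
        input_ids, targets = input_ids.to(device), targets.to(device)
        logits = model(input_ids)                  # assumed (B, T, vocab)
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten(),
                               reduction="sum")
        tot_loss += loss.item()
        tot_correct += (logits.argmax(-1) == targets).sum().item()
        tot_tokens += targets.numel()
    val_loss = tot_loss / tot_tokens
    return {"val_loss": val_loss,
            "ppl": float(torch.exp(torch.tensor(val_loss))),
            "next_token_acc": tot_correct / tot_tokens}
```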