Exciting @GoogleAI work, so-called "Big Transfer": revisit transfer learning for computer vision, simplify various tricks and use huge datasets to get a big perf boost across many tasks. Kind of BERT-inspired.
Also yet another case of "no way a small player could have the resources to do this".

I'm more at ease with this than most (same in so much of traditional engineering!) but it's worth pointing out.
Emphasises there's still room to get big performance boosts by increasing the amount of data in pretraining. Makes sense in light of various @ICCV19 papers on similar topics.
Large-scale training benefits from GroupNorm + Weight Standardisation; maybe that combo can just become the standard solution in many cases? I need to read more about WS actually...
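For my own notes, a minimal PyTorch-style sketch of the idea behind Weight Standardization (not the authors' code): standardise each conv filter over its input channels and spatial extent before applying it, and pair the conv with GroupNorm instead of BatchNorm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: the kernel is standardised
    (zero mean, unit variance per output filter) at every forward pass."""
    def forward(self, x):
        w = self.weight
        # statistics over (in_channels, kH, kW) for each output filter
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w_hat = (w - mean) / torch.sqrt(var + 1e-5)
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# GroupNorm + WS in place of the usual BatchNorm conv block:
block = nn.Sequential(
    WSConv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(inplace=True),
)
```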
Benefits of large dataset only show up in larger models. Makes sense, but is a valuable corrective to my usual assumption of "oh, anything beyond ResNet50 is just aimed at marginal leaderboard gains rather than practical improvement".
Large models + large datasets = more compute budget needed.

Sounds obvious, but the scale is worth emphasising. Compare 1.3M vs 14M train images (ILSVRC-2012 vs ImageNet-21k): keeping the same compute budget on the larger dataset -> signif perf drop.
"Only when we train longer (3x and 10x) do we see the benefits of training on the larger dataset."
Also lol, "The learning progress [large model+dataset] seems to be flat even after 8
GPU-weeks, but after 8 GPU-months progress is clear."
Nice that they use simple heuristics to set hyperparameters when finetuning. Lots of back and forth over this lately (rough sketch of the idea below).

Heuristics working well is def the world I want to live in...
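Something in the spirit of their rule, as I understand it: pick the finetuning schedule from a couple of coarse dataset properties rather than sweeping. The thresholds and values below are illustrative assumptions, not the paper's exact BiT-HyperRule.

```python
def finetune_schedule(num_train_examples):
    """Pick a finetuning recipe from dataset size alone.
    Thresholds/values are illustrative, not the paper's exact rule."""
    if num_train_examples < 20_000:      # small task: short schedule, no MixUp
        steps, use_mixup = 500, False
    elif num_train_examples < 500_000:   # medium task
        steps, use_mixup = 10_000, True
    else:                                # large task
        steps, use_mixup = 20_000, True
    return {"steps": steps, "base_lr": 3e-3, "mixup": use_mixup}

print(finetune_schedule(50_000))  # e.g. a CIFAR-sized dataset
```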
ObjectNet results (challenging "real world" images) are really exciting. ~25% improvement on state of the art.
"we found that around half
of the model’s mistakes [...] are due to ambiguity or label noise, and in only 19.21% of the ILSVRC-2012 mistakes do human raters clearly agree with the label over the prediction."
" [...] performance on the standard vision benchmarks seems to approach a saturation point." (!)
Few-shot and more diverse tasks are, of course, "much further from saturation".
Anyway, exciting paper, exciting time! https://arxiv.org/abs/1912.11370