Exciting @GoogleAI work, so-called "Big Transfer": revisit transfer learning for computer vision, simplify various tricks and use huge datasets to get a big perf boost across many tasks. Kind of BERT-inspired.
Also yet another case of "no way a small player could have the resources to do this".

I'm more at ease with this than most (same in so much of traditional engineering!) but it's worth pointing out.
Emphasises there's still room to get big performance boosts by increasing the amount of data in pretraining. Makes sense in light of various @ICCV19 papers on similar topics.
Large-scale training benefits from GroupNorm + Weight Standardisation; maybe that combo can just become the standard solution in many cases? I need to read more about WS actually...
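For my own notes, a minimal PyTorch-style sketch of the idea behind Weight Standardization (not the authors' code): standardise each conv filter over its input channels and spatial extent before applying it, and pair the conv with GroupNorm instead of BatchNorm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: the kernel is standardised
    (zero mean, unit variance per output filter) at every forward pass."""
    def forward(self, x):
        w = self.weight
        # statistics over (in_channels, kH, kW) for each output filter
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w_hat = (w - mean) / torch.sqrt(var + 1e-5)
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# GroupNorm + WS in place of the usual BatchNorm conv block:
block = nn.Sequential(
    WSConv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(inplace=True),
)
```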
Benefits of large dataset only show up in larger models. Makes sense, but is a valuable corrective to my usual assumption of "oh, anything beyond ResNet50 is just aimed at marginal leaderboard gains rather than practical improvement".
Large models + large datasets = more compute budget needed.

Sounds obvious, but the scale is worth emphasising. Compare 1.3M vs 14M train images (ILSVRC-2012 vs ImageNet-21k): keeping the same compute budget on the larger dataset -> signif perf drop.
"Only when we train longer (3x and 10x) do we see the benefits of training on the larger dataset."
Also lol, "The learning progress [large model+dataset] seems to be flat even after 8
GPU-weeks, but after 8 GPU-months progress is clear."
Nice that they use simple heuristics to set hyperparameters when finetuning. Lots of back and forth over this lately (rough sketch of the idea below).

Heuristics working well is def the world I want to live in...
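Something in the spirit of their rule, as I understand it: pick the finetuning schedule from a couple of coarse dataset properties rather than sweeping. The thresholds and values below are illustrative assumptions, not the paper's exact BiT-HyperRule.

```python
def finetune_schedule(num_train_examples):
    """Pick a finetuning recipe from dataset size alone.
    Thresholds/values are illustrative, not the paper's exact rule."""
    if num_train_examples < 20_000:      # small task: short schedule, no MixUp
        steps, use_mixup = 500, False
    elif num_train_examples < 500_000:   # medium task
        steps, use_mixup = 10_000, True
    else:                                # large task
        steps, use_mixup = 20_000, True
    return {"steps": steps, "base_lr": 3e-3, "mixup": use_mixup}

print(finetune_schedule(50_000))  # e.g. a CIFAR-sized dataset
```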
ObjectNet results (challenging "real world" images) are really exciting. ~25% improvement on state of the art.
"we found that around half
of the model’s mistakes [...] are due to ambiguity or label noise, and in only 19.21% of the ILSVRC-2012 mistakes do human raters clearly agree with the label over the prediction."
" [...] performance on the standard vision benchmarks seems to approach a saturation point." (!)
Few-shot and more diverse tasks are, of course, "much further from saturation".
Anyway, exciting paper, exciting time! https://arxiv.org/abs/1912.11370