The ImageNet dataset is built from internet images.

Here's an example image search result for "tench" (a kind of fish). Does it mess things up when a category's images consistently feature a human as the dominant element? Yes. Yes, it does.
https://aiweirdness.com/post/622648824384602112/when-data-is-messy
If you train an image recognition algorithm on ImageNet, then ask it which part of the image it found most useful for recognizing a tench, it highlights the hands holding the fish.

It has no idea the human fingers aren't part of the fish.
https://medium.com/bethgelab/neural-networks-seem-to-follow-a-puzzlingly-simple-strategy-to-classify-images-f4229317261f
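(If you want to poke at this yourself, here's a minimal Grad-CAM-style sketch of that "which pixels mattered" question. It is not the exact analysis from the post above; it assumes PyTorch/torchvision, a stock pretrained ResNet-50 standing in for "an image recognition algorithm trained on ImageNet", and a hypothetical photo named fish.jpg. Class 0 in the standard ImageNet ordering is "tench".)

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

# Stash the last conv stage's activations and gradients as they flow through.
acts, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(v=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = preprocess(Image.open("fish.jpg").convert("RGB")).unsqueeze(0)

TENCH = 0  # "tench, Tinca tinca" is class 0 in the standard ImageNet ordering
model(x)[0, TENCH].backward()

# Grad-CAM: weight each activation map by its mean gradient, ReLU, upsample.
w = grads["v"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
# cam[0, 0] is now a 224x224 heatmap of what the model looked at;
# the thread's point is that on tench photos it's the hands, not the fish.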
If you train an image-generating algorithm on ImageNet, pretty much every generated image of a tench has the same weird fascination with human fingers.

(generated with BigGAN via http://artbreeder.com)
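(Artbreeder is a friendly front end, but you can sample the same model directly. A minimal sketch, assuming Hugging Face's pytorch-pretrained-biggan package, which looks up ImageNet classes by name:)

import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names,
                                       truncated_noise_sample,
                                       convert_to_images)

model = BigGAN.from_pretrained("biggan-deep-256")

truncation = 0.4  # lower truncation = more "typical" (and more fingery) samples
class_vec = torch.from_numpy(one_hot_from_names(["tench"], batch_size=4))
noise_vec = torch.from_numpy(truncated_noise_sample(truncation=truncation,
                                                    batch_size=4))

with torch.no_grad():
    out = model(noise_vec, class_vec, truncation)

for i, img in enumerate(convert_to_images(out)):
    img.save(f"tench_{i}.png")  # expect fish-ish texture plus mystery fingers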
The ImageNet category "microphone" must contain lots of concert pictures where the microphone is only a small portion of the image. When BigGAN generates microphone images, it often leaves out the actual microphone entirely.
The ImageNet category "football helmet" probably has pictures where no one is actually wearing a helmet - when BigGAN does football helmets, some of the humans are very clearly helmetless. One appears to be wearing a baseball helmet? Which would be in line with: https://medium.com/bethgelab/neural-networks-seem-to-follow-a-puzzlingly-simple-strategy-to-classify-images-f4229317261f
ImageNet grabbed messy internet data. Some of that mess is alarming. @katecrawford @trevorpaglen and others have documented ImageNet images that are pornographic and/or likely included without the subject's consent. https://www.excavating.ai/ 
@Abebab and @vinayprabhu documented problematic images in ImageNet, as well as some HIGHLY terrible categories in the 80 Million Tiny Images dataset, which was automatically harvested via internet searches and ended up with categories labeled with racist slurs. https://www.theregister.com/2020/07/01/mit_dataset_removed/
That brings us back to machine learning's diversity problems (and the bias within the field that perpetuates them).

Here are a few tweets on where that comes from & things we can do about it, including following all the people I list at the end of the thread: https://twitter.com/JanelleCShane/status/1276637149723422720?s=20