Twitter's crop algorithm is probably a pretty simple saliency approach. We will see…
Guess: upper right corner… 🤔
(Mastodon's cropping algorithm defaults to the middle of an image but supports custom, user-selectable focus points. So no advanced-algorithm bias there.)
Didn't quite hit the threshold, so it is probably a bit more "advanced".
So, how does this thing work?

Basically with a saliency map: https://en.wikipedia.org/wiki/Saliency_map

The">https://en.wikipedia.org/wiki/Sali... goal is to find unique parts of an image. Sounds kinda neutral, but… most implementations work with colors or grey levels.
Behind the scenes: what does a saliency map detector generate?

A visual representation of the areas of an image which might be highly different or interesting:
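To make that concrete, here is a minimal sketch of generating such a map with OpenCV's classic spectral-residual detector (an assumption for illustration, from opencv-contrib-python, with a made-up file name; this is not Twitter's actual model):

```python
import cv2

# Hypothetical input file name.
image = cv2.imread("test_image.png")

# Spectral-residual static saliency detector from opencv-contrib-python.
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
success, saliency_map = saliency.computeSaliency(image)

# The map comes back as floats in [0, 1]; scale it for viewing.
saliency_map = (saliency_map * 255).astype("uint8")
cv2.imwrite("saliency_map.png", saliency_map)
```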
Now we can reverse this process.

Let's put a colorful and unique kitten at the bottom of the image:
And here we have it: our lovely kitten, perfectly cropped 😻
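One plausible way to turn a saliency map into a crop decision for a tall image is to slide a window over the rows and keep the band with the highest total saliency. A hypothetical sketch of that reversed process (not Twitter's actual code):

```python
import numpy as np

def crop_by_saliency(image, saliency_map, crop_height):
    """Return the horizontal band of `image` with the largest summed saliency."""
    row_saliency = saliency_map.sum(axis=1)          # total saliency per row
    # Sliding-window sum over every possible vertical crop position.
    window_sums = np.convolve(row_saliency, np.ones(crop_height), mode="valid")
    top = int(window_sums.argmax())                  # best starting row
    return image[top:top + crop_height]
```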
So - what did we learn:

Twitter's cropping is based on saliency.
Twitter's cropping is quite sensitive to slight variations (gradient image 1).
Twitter's cropping decision uses the last of the most interesting parts of an image (the aliasing pattern with several similarly interesting areas).
None of these examples used real people or skin colors. What happens when we use skin colors?

Let's prepare something:
So, what would we expect from the saliency theory? That the upper image wins by a bit.

The result for this image is a bit unexpected to me:

It could be either that this is "something like saliency with machine learning stuff" or that we just hit a use case with a weird threshold.
Let's make the second picture more uninteresting…
Result as expected: the couple at the top wins.
Because the difference is higher this time.
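Under the saliency theory, the "which half wins" decision could be as simple as comparing the average saliency of the two halves. A hedged sketch of that intuition (the real threshold or tie-breaking rule is unknown):

```python
def predict_winning_half(saliency_map):
    """Guess which half of a tall image a saliency-based cropper would keep."""
    h = saliency_map.shape[0] // 2
    top_score = saliency_map[:h].mean()        # average saliency of the top half
    bottom_score = saliency_map[h:].mean()     # average saliency of the bottom half
    return "top" if top_score >= bottom_score else "bottom"
```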
Next question: what happens in black and white?

The saliency map is basically the same.
Black and white doesn't change the result. So basically it seems to be something like saliency only (plus maybe some tweaks).
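If you want to repeat the black-and-white test yourself, this sketch (again assuming OpenCV's spectral-residual detector and a made-up file name) compares the map of the color image with the map of its grayscale version:

```python
import cv2
import numpy as np

color = cv2.imread("test_image.png")                   # hypothetical file name
gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
gray_as_bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # back to 3 channels

saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
_, map_color = saliency.computeSaliency(color)
_, map_gray = saliency.computeSaliency(gray_as_bgr)

# A small mean absolute difference means color barely changes the map here.
print("mean abs difference:", float(np.abs(map_color - map_gray).mean()))
```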

Another test image:
Result: As expected.

The second half is uninteresting in the saliency map, so the top is prioritized.
But here's the final test:

Real people vs. a really interesting Mandelbrot image.
Tada 👩‍🔬
So what did we learn:

Twitter's crop decision is probably saliency-map based.

And you can probably beat it with a nice-looking Mandelbrot image (see the sketch below).
(That doesn't mean that Twitter's algorithm cannot be biased etc.

But this just explains how this thing works internally.)
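In case you want to try the Mandelbrot trick yourself, here is a minimal escape-time renderer; numpy and matplotlib are just my assumed tooling here, any sufficiently colorful rendering should do:

```python
import numpy as np
import matplotlib.pyplot as plt

width, height, max_iter = 800, 600, 80
x = np.linspace(-2.0, 1.0, width)
y = np.linspace(-1.2, 1.2, height)
c = x[np.newaxis, :] + 1j * y[:, np.newaxis]   # complex plane grid

z = np.zeros_like(c)
escape = np.zeros(c.shape, dtype=int)
for i in range(max_iter):
    mask = np.abs(z) <= 2                      # points that have not escaped yet
    z[mask] = z[mask] ** 2 + c[mask]
    escape[mask] = i                           # remember the last "inside" step

# A colorful colormap keeps the image highly salient.
plt.imsave("mandelbrot.png", escape, cmap="twilight")
```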
Proof by Twitter that I was right 😬

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/Smart-Auto-Cropping-of-Images.html

(and">https://blog.twitter.com/engineeri... that smart neural fancy shit https://abs.twimg.com/emoji/v2/... draggable="false" alt="™️" title="Registered-Trade-Mark-Symbol" aria-label="Emoji: Registered-Trade-Mark-Symbol"> isn’t always that advanced as you would think)
In case you want to reverse engineer further:

The paper with the machine learning stuff and some example saliency maps
https://arxiv.org/pdf/1801.05787.pdf