Twitter's crop algorithm is probably a pretty simple saliency approach. We will see…
Guess: upper right corner… 🤔
(Mastodon's cropping algorithm defaults to the middle of an image but supports custom, user-selectable focus points. So no advanced-algorithm bias there.)
Didn't quite hit the threshold, so it is probably a bit more "advanced".
So, how does this thing work?

Basically with a saliency map: https://en.wikipedia.org/wiki/Saliency_map

The">https://en.wikipedia.org/wiki/Sali... goal is to find unique parts of an image. Sounds kinda neutral, but… most implementations work with colors or grey levels.
Behind the scenes: what does a saliency map detector generate?

A visual representation of the areas of an image which might be highly different or interesting:
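To make that concrete, here is a minimal sketch of generating such a map with OpenCV's classic spectral-residual detector (an assumption for illustration, from opencv-contrib-python, with a made-up file name; this is not Twitter's actual model):

```python
import cv2

# Hypothetical input file name.
image = cv2.imread("test_image.png")

# Spectral-residual static saliency detector from opencv-contrib-python.
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
success, saliency_map = saliency.computeSaliency(image)

# The map comes back as floats in [0, 1]; scale it for viewing.
saliency_map = (saliency_map * 255).astype("uint8")
cv2.imwrite("saliency_map.png", saliency_map)
```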
Now we can reverse this process.

Let's put a colorful and unique kitten at the bottom of the image:
And here we have it: our lovely kitten, perfectly cropped 😻
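One plausible way to turn a saliency map into a crop decision for a tall image is to slide a window over the rows and keep the band with the highest total saliency. A hypothetical sketch of that reversed process (not Twitter's actual code):

```python
import numpy as np

def crop_by_saliency(image, saliency_map, crop_height):
    """Return the horizontal band of `image` with the largest summed saliency."""
    row_saliency = saliency_map.sum(axis=1)          # total saliency per row
    # Sliding-window sum over every possible vertical crop position.
    window_sums = np.convolve(row_saliency, np.ones(crop_height), mode="valid")
    top = int(window_sums.argmax())                  # best starting row
    return image[top:top + crop_height]
```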
So - what did we learn:

Twitter's cropping is based on saliency.
Twitter's cropping is quite sensitive to slight variations (gradient image 1).
Twitter's cropping decision uses the last of the most interesting parts of an image (the aliasing pattern with several similarly interesting areas).
None of these examples used real people or skin colors. What happens when we use skin colors?

Let's prepare something:
So, what would we expect from the saliency theory? That the upper image wins by a bit.

The result for this image is a bit unexpected to me:

It could be either that this is "something like saliency with machine learning stuff" or that we just hit a use case with a weird threshold.
Let's make the second picture more uninteresting…
Result as expected: the couple at the top wins.
Because the difference is higher this time.
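Under the saliency theory, the "which half wins" decision could be as simple as comparing the average saliency of the two halves. A hedged sketch of that intuition (the real threshold or tie-breaking rule is unknown):

```python
def predict_winning_half(saliency_map):
    """Guess which half of a tall image a saliency-based cropper would keep."""
    h = saliency_map.shape[0] // 2
    top_score = saliency_map[:h].mean()        # average saliency of the top half
    bottom_score = saliency_map[h:].mean()     # average saliency of the bottom half
    return "top" if top_score >= bottom_score else "bottom"
```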
Next question: what happens in black and white?

The saliency map is basically the same.
Black and white doesn't change the result. So basically it seems to be something like saliency only (plus maybe some tweaks).
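If you want to repeat the black-and-white test yourself, this sketch (again assuming OpenCV's spectral-residual detector and a made-up file name) compares the map of the color image with the map of its grayscale version:

```python
import cv2
import numpy as np

color = cv2.imread("test_image.png")                   # hypothetical file name
gray = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
gray_as_bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # back to 3 channels

saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
_, map_color = saliency.computeSaliency(color)
_, map_gray = saliency.computeSaliency(gray_as_bgr)

# A small mean absolute difference means color barely changes the map here.
print("mean abs difference:", float(np.abs(map_color - map_gray).mean()))
```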

Another test image:
Result: As expected.

The second half is uninteresting in the saliency map, so the top is prioritized.
But here's the final test:

Real people vs. a really interesting Mandelbrot image.
Tada 👩‍🔬
So what did we learn:

Twitter's crop decision is probably saliency-map based.

And you can probably beat it with a nice-looking Mandelbrot image (see the sketch below).
(That doesn't mean that Twitter's algorithm cannot be biased etc.

But this just explains how this thing works internally.)
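In case you want to try the Mandelbrot trick yourself, here is a minimal escape-time renderer; numpy and matplotlib are just my assumed tooling here, any sufficiently colorful rendering should do:

```python
import numpy as np
import matplotlib.pyplot as plt

width, height, max_iter = 800, 600, 80
x = np.linspace(-2.0, 1.0, width)
y = np.linspace(-1.2, 1.2, height)
c = x[np.newaxis, :] + 1j * y[:, np.newaxis]   # complex plane grid

z = np.zeros_like(c)
escape = np.zeros(c.shape, dtype=int)
for i in range(max_iter):
    mask = np.abs(z) <= 2                      # points that have not escaped yet
    z[mask] = z[mask] ** 2 + c[mask]
    escape[mask] = i                           # remember the last "inside" step

# A colorful colormap keeps the image highly salient.
plt.imsave("mandelbrot.png", escape, cmap="twilight")
```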
Proof by Twitter that I was right 😬

https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/Smart-Auto-Cropping-of-Images.html

(and">https://blog.twitter.com/engineeri... that smart neural fancy shit https://abs.twimg.com/emoji/v2/... draggable="false" alt="™️" title="Registered-Trade-Mark-Symbol" aria-label="Emoji: Registered-Trade-Mark-Symbol"> isn’t always that advanced as you would think)
In case you want to reverse engineer further:

The paper with the machine learning stuff and some example saliency maps
https://arxiv.org/pdf/1801.05787.pdf