Over the weekend, reports of racial/gender bias in Twitter's AI-based image cropping started blowing up. I wanted to add some context from my perspective as an ex-employee and as a contributor to the research the product is based on.
First: the community is right to worry and talk about these issues, and to hold Twitter accountable for ensuring its products work for everyone. If indeed its product fails in a racially or gender-biased way, Twitter should learn from that and fix it, and I'm confident they will.
However, I have seen people jump to conclusions about negligence or oversight by the team that worked on this, or generalise on the basis of a handful of examples, neither of which I think is constructive. So I wanted to add some context.
As some Twitter employees said, the team did in fact investigate racial (and gender) bias BEFORE deploying the method for smart cropping. Details of this are not public, and as I no longer work there I am not in a position to talk about them, but I hope Twitter will release them soon.
I'm sure there's no such thing as a perfect bias analysis, and it can always be improved. I'm also aware of a lot of great work (like Gender Shades) the community has developed to make these analyses more robust/easier. I'm sure there are things to learn and room to improve.
But my colleagues did look at this, anticipated the problem, and cared enough to run tests before deployment. I think the last thing our community wants is to discourage people from working on similar problems in the future. Equally, Twitter shouldn't throw them under the bus.
Some of the outrage was based on a small number of reported failure cases. While these failures look very bad, there's work to be done to determine the degree to which they are associated with race or gender. Sampling bias and confirmation bias can lead to premature conclusions.
Some members of the community have started more systematic investigations. I hope there's an opportunity to continue this work transparently and in constructive collaboration between Twitter and the community. https://twitter.com/vinayprabhu/status/1307497736191635458
Tech companies and ML researchers appear to have a poor track record of anticipating and mitigating failures like this that affect certain groups disproportionately; this much seems obvious in hindsight. So I can see how these examples fit a narrative that people just didn't care.
This also follows a history of similar episodes involving CV/ML methods, most recently the super-resolution one. I remember when I first saw those, I was outraged and tweeted before even verifying the details. Your outrage is understandable; I did not want to minimise that.