My teammates and I have spent a couple of years building defenses against lookalike domains into Chrome. Here’s a thread of things we’ve learned; h/t @jdeblasio, @meacer, Behnood Momenzadeh, and others for many of these insights 🧵 https://twitter.com/owencm/status/1314333287087632384
1/ 📌 Lookalike domains are rampant on the web, but they’re not all phishing. My guess is that they’re used more for spearphishing than mass phishing campaigns, but I don’t have the best visibility into spearphishing tactics.
2/ A *lot* of them are ad spam, referral scams, pirating/torrents, or other brand abuse that doesn’t necessarily directly harm users.
3/ 📌 No matter how confident you are that some pattern is only used for evil (e.g. banning all URLs with “.com-” in them), there will be false positives, and we have to handle them somehow.
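A minimal sketch of the kind of blunt pattern rule tweet 3 describes. The function name `has_com_dash` is hypothetical and this is not Chrome's logic; it only illustrates that such a rule fires on the hostname string regardless of who registered the domain or why, which is exactly where false positives come from.

```python
from urllib.parse import urlsplit


def has_com_dash(url: str) -> bool:
    """Hypothetical blunt rule: flag any URL whose hostname contains ".com-".

    Only the hostname is checked, so ".com-" appearing in the path does not match.
    The rule says nothing about intent: every match still needs some way to
    handle the cases that turn out to be legitimate.
    """
    host = urlsplit(url).hostname or ""
    return ".com-" in host
```

For example, `has_com_dash("https://paypal.com-login.example/")` is `True`, while `has_com_dash("https://example.com/a.com-b")` is `False` because the match is in the path, not the hostname.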
4/ The quality of the false positives matters as well as the quantity; showing warnings on 10 legitimate businesses could be just as bad as showing warnings on 1000 low-quality ad farms even if they are all technically false positives.
5/ Speaking of false positives, it’s hard to even define and classify true and false positives in this space. We often measure true positives, false positives, and a variety of other categories (errors, parked domains, low-quality/spam, etc.).
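One way to record the extra categories from tweet 5 is sketched below. The category names mirror the tweet, but the `Label` enum, the `summarize` helper, and the choice to compute precision over only the clear-cut true/false positives (setting aside errors, parked, and low-quality pages) are all assumptions for illustration, not how the Chrome team actually scores results.

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    # Hypothetical outcome categories for a manually reviewed warning.
    TRUE_POSITIVE = "true_positive"
    FALSE_POSITIVE = "false_positive"
    ERROR = "error"
    PARKED = "parked"
    LOW_QUALITY = "low_quality"


def summarize(labels):
    """Count each category and compute precision over clear-cut cases only.

    Ambiguous categories are still counted, but excluded from the precision
    denominator; this is one possible convention, not a standard.
    """
    counts = Counter(labels)
    decided = counts[Label.TRUE_POSITIVE] + counts[Label.FALSE_POSITIVE]
    precision = counts[Label.TRUE_POSITIVE] / decided if decided else None
    return counts, precision
```

Reporting the full `counts` alongside `precision` keeps the parked and low-quality pages visible instead of silently folding them into either bucket.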
6/ 📌 And also on the subject of false positives, the warning UI doesn’t need to be a big full-page scary warning that unequivocally labels the site as malicious.
7/ Subtler warning UIs can be effective too, often in less obvious ways, for example by changing the way the user interacts with the page.
8/ 📌 The browser is in a unique position to make local decisions about what constitutes a lookalike domain: it knows which sites the person visits often (and which are therefore especially likely to be spoofing targets), and it can use information like the redirect chain.
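To make tweet 8 concrete, here is a minimal sketch of a local check that compares the visited hostname only against sites this user visits often. The `engaged_sites` set, the `lookalike_target` helper, and the use of plain Levenshtein edit distance with a threshold of 1 are all hypothetical choices for illustration; Chrome's real heuristics are not described in this thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def lookalike_target(host: str, engaged_sites: set[str], max_distance: int = 1):
    """Return the frequently visited site that `host` resembles, if any.

    `engaged_sites` is a hypothetical per-user set of often-visited hostnames.
    Sites are checked in sorted order so the result is deterministic.
    """
    for site in sorted(engaged_sites):
        if host != site and edit_distance(host, site) <= max_distance:
            return site
    return None
```

For example, `lookalike_target("examp1e.com", {"example.com"})` returns `"example.com"`, while visiting `example.com` itself returns `None`. Restricting the comparison to the user's own frequent sites is what makes this a local decision: the same hostname might be flagged for one user and ignored for another.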
9/ Of course, making local decisions in the browser has downsides too; we don't have all the information one would have when making an offline, out-of-band decision.
10/ 📌 A final challenge: it’s not clear how to measure recall. How do we measure the whole space of lookalike domains and determine how much of it we’re covering? Still working on this one 🤔
You can follow @estark37.