Nice piece from @NathanBenaich

"Covid-19 is so new and complex that the data needed to train AI to combat it does not exist"

^ a common issue in all of drug discovery, not just Covid-19... https://twitter.com/NathanBenaich/status/1307638896403132416
We need more few-shot learners in bio + chem to make these problems tractable even when large datasets don’t exist or can’t be readily generated (a big focus for us @invivo_ai)
Groups like @RecursionPharma are leading the charge on data generation and showing compelling evidence of these approaches working in practice: https://twitter.com/recursionchris/status/1307729228582936576?s=21
But what if we can’t scale the biology or chemistry to the constraints of existing deep learning algos?

We’ve been conditioned to focus on the data piece & to assume that bigger data = better prediction. A useful heuristic, but not the full story
In drug discovery, we need new ML approaches built *specifically* for small / sparse datasets, closely integrated with strategies for data augmentation + active learning (smart data vs big data)
Seeing strong evidence in current pharma collabs that this combo can quickly unlock previously intractable problems for ML in drug discovery
You can follow @dcohen_mtl.