One of the things scientists do when facing an outbreak of a new disease is build 'phylogenies'. These estimate how pathogen isolates are related by comparing their genomes, and using them to make a 'family tree'. You've probably seen these at @nextstrain 1/n
But this runs into challenges when all your genomes are really closely related. A lot can rest on a single difference in a sequence of ~30,000 As, Cs, Gs and Ts (OK, 'U's strictly speaking for an RNA virus) 2/n
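To make "a single difference" concrete, here is a rough sketch of counting the differing positions between aligned genomes, the raw material a tree is built from. The isolate names and 10-letter sequences are made up for illustration (real genomes are ~30,000 letters), and real pipelines do a great deal more than this:

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a gap or ambiguous base (anything
    other than A, C, G, T or U) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGTU")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in valid and b in valid
    )


def pairwise_distances(genomes):
    """Return {(name1, name2): distance} for every pair of isolates."""
    return {
        (n1, n2): snp_distance(genomes[n1], genomes[n2])
        for n1, n2 in combinations(sorted(genomes), 2)
    }


# Hypothetical toy data: three isolates, only one of which differs at all.
toy_genomes = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAC",
    "isolate_C": "ACGTACGAAC",
}

distances = pairwise_distances(toy_genomes)
```

With identical isolates the distance is zero, so nothing in the sequences says which infected which. That is why one changed letter can carry so much weight.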
These aren't new problems. Years ago, during the big European E. coli outbreak, we had to look at single changes in a genome more than 5 *million* letters long (kudos to @yhgrad for leading that work, which you can read about here if you are interested) https://www.pnas.org/content/109/8/3065 3/n
What does this mean for the pandemic? Well, the global diversity of the virus is growing, as we would expect as the virus population grows. However, there's still not a lot of it, which makes life difficult 4/n
When working with these trees we need to think carefully about, among other things, sampling. There are a lot of places in the world which are very poorly sampled, and if you've not looked there, you have no idea what is circulating 5/n
I'm looking forward to analyzing more sequence, and things should get easier as SNP (and other) variation builds up. But all that variation is the result of the growing pandemic. I wish it had never had the chance to accumulate 8/end
You can follow @BillHanage.