There are a lot of phylogenetic trees 🌳 getting bounced about in the midst of #COVID19 and they’re getting really really big. Phylogenetic trees are a valuable resource mid #pandemic but they're not always easy to interpret.

Here's a brief explainer on just one aspect. 1/11
A 🌳 reconstructs the evolutionary relationships between viruses: which are more closely related to each other and which are more distant. We build these trees by comparing 🧬s.

For #SARSCoV2 there are ~30,000 positions to look at but only a small fraction vary. 2/11
Let's take a toy example of five #SARSCoV2 genomes (A to E).

This 🌳 for example places A & B as more closely related to each other than to C, D or E (outgroup).

Each internal node of the tree therefore represents the theoretical Most Recent Common Ancestor (MRCA) of the samples descending from it, relative to the samples in the dataset. 3/11
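
For anyone who prefers code to pictures, here is a minimal Python sketch of the toy 🌳 and its MRCAs. The exact topology is an assumption read off the text (A & B are sisters, C & D are sisters, E is the outgroup), and `tips` and `mrca` are illustrative helper names, not part of any phylogenetics library.

```python
# Toy tree as nested tuples. The topology is an ASSUMPTION based on the thread:
# A & B are sisters, C & D are sisters, E is the outgroup.
TREE = ((("A", "B"), ("C", "D")), "E")


def tips(node):
    """Return the set of tip names found below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))


def mrca(node, targets):
    """Return the smallest subtree that contains every target tip."""
    targets = set(targets)
    if isinstance(node, str):
        return node
    for child in node:
        # If one child already holds all targets, the MRCA lies deeper.
        if targets <= tips(child):
            return mrca(child, targets)
    return node


print(mrca(TREE, {"A", "B"}))  # ('A', 'B')
print(mrca(TREE, {"A", "C"}))  # (('A', 'B'), ('C', 'D'))
```

The MRCA of A & B is a shallower node than the MRCA of A & C, which is what "more closely related" means on a tree.
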
The data underlying trees are the nucleotide changes (mutations - here coloured shapes) identified across the ~30,000-position sequence alignment.

Eg. here closely related A & B share the same orange diamond change (mutation) not seen in C, D or E. 4/11
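
A rough sketch of how shared changes can be read off an alignment. The 10-position sequences below are made up purely for illustration (real genomes are ~30,000 positions), and comparing everything to the outgroup E is a simplifying assumption; real trees are built with dedicated inference software rather than this kind of counting.

```python
# HYPOTHETICAL 10-position alignment, invented for illustration only.
ALIGNMENT = {
    "A": "ACTTACGTGC",
    "B": "ACTTACGTAC",
    "C": "ACGTATGTAC",
    "D": "ACGTATGTAC",
    "E": "ACGTACGTAC",  # outgroup, used here as the comparison sequence
}


def mutations_vs_outgroup(alignment, outgroup):
    """Map each change (position, outgroup base, new base) to the samples carrying it."""
    ref = alignment[outgroup]
    carriers = {}
    for name, seq in alignment.items():
        for pos, (ref_base, base) in enumerate(zip(ref, seq)):
            if base != ref_base:
                carriers.setdefault((pos, ref_base, base), []).append(name)
    return carriers


shared = mutations_vs_outgroup(ALIGNMENT, "E")
for change, samples in sorted(shared.items()):
    print(change, samples)
```

In this made-up data the change at position 2 plays the role of the orange diamond: it is carried by A & B only, which is the signal that groups them together.
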
These mutations are not necessarily a bad thing. They largely represent the gradual accumulation of mainly harmless typos which can occur during replication.

It is these mutations that allow us to log & trace the pandemic using 🧬 data.

More here: http://tinyurl.com/vs9gqxg  5/11
In addition, for each 🧬 we know where in the world it was sampled.

Say A & B are from Europe, C & D from the USA & E from China.

We can infer a likely scenario whereby the ancestor of A & B (MRCA A,B) was in Europe and the ancestor of C & D (MRCA C,D) was in the USA. 6/11
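
One simple way to make this kind of inference concrete is Fitch parsimony: take the locations shared by the child nodes if there are any, otherwise keep all of them as candidates. This is a minimal sketch under assumptions: the tree topology is inferred from the text, `fitch` is an illustrative helper, and parsimony is swapped in here for simplicity; it is not necessarily the method behind any published tree, which typically rely on probabilistic models.

```python
# Topology is an ASSUMPTION based on the thread; locations are as stated in it.
TREE = ((("A", "B"), ("C", "D")), "E")
LOCATION = {"A": "Europe", "B": "Europe", "C": "USA", "D": "USA", "E": "China"}


def fitch(node, location, out):
    """Bottom-up Fitch pass: store the candidate location set of each internal node."""
    if isinstance(node, str):
        return {location[node]}
    child_sets = [fitch(child, location, out) for child in node]
    common = set.intersection(*child_sets)
    # Children agree: keep what they share. Otherwise: keep every candidate.
    states = common if common else set.union(*child_sets)
    out[node] = states
    return states


candidates = {}
fitch(TREE, LOCATION, candidates)
print(sorted(candidates[("A", "B")]))  # ['Europe']
print(sorted(candidates[("C", "D")]))  # ['USA']
print(sorted(candidates[TREE]))        # ['China', 'Europe', 'USA']
```

MRCA A,B comes out as Europe and MRCA C,D as the USA, but the deeper nodes stay ambiguous with only five samples.
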
Though note the above inference is always relative to what we have sequenced.

The available genetic data is getting large BUT it is still only a very small portion of the diversity circulating globally 🗺️

...though this may be the most densely sequenced outbreak to date. 7/11
For viruses sampled in the UK we see related clades falling in multiple locations in the 🌳 & interspersed with those from other regions.

Eg. here the location of B & D suggests >1 UK introduction.

These can then seed local transmissions leading to expanding related clades. 8/11
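
A back-of-the-envelope sketch of the ">1 introduction" logic: count the separate UK-only clades in the 🌳. Only B & D being from the UK comes from the thread; the topology and the locations given to A, C & E are hypothetical, and `uk_clades` is an illustrative helper. Treating each UK-only clade as one introduction is a simplification that depends entirely on what has been sequenced.

```python
# Topology and the non-UK labels are ASSUMPTIONS; only "B & D are UK" is from the thread.
TREE = ((("A", "B"), ("C", "D")), "E")
LOCATION = {"A": "Europe", "B": "UK", "C": "USA", "D": "UK", "E": "China"}


def uk_clades(node, location):
    """Return the largest subtrees whose tips were all sampled in the UK."""
    if isinstance(node, str):
        return [node] if location[node] == "UK" else []
    found = [uk_clades(child, location) for child in node]
    # If every child is itself an all-UK clade, the whole node is one UK clade.
    if all(f == [child] for f, child in zip(found, node)):
        return [node]
    return [clade for f in found for clade in f]


clades = uk_clades(TREE, LOCATION)
print(clades)       # ['B', 'D']
print(len(clades))  # 2
```

Two separate UK clades, each sitting next to non-UK viruses, is the toy version of the pattern described above. Had A also been from the UK, A & B would collapse into a single clade and the count would stay at two.
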
Here is the actual (WIP) #SARSCoV2 🌳 highlighting the multiple placements of UK viruses: https://twitter.com/alanmcn1/status/1249656885969661952

(Much prettier trees are available as queryable visualisations on the wonderful @nextstrain aided by huge volumes of data shared by the community on @GISAID.)

9/11
But the UK is not unusual. Similar patterns are seen for many densely sequenced countries.

The new “Regional” tab in @nextstrain provides an easy way to view these.

Eg. see 👇 for viruses sampled in the USA (enlarged light green tips) falling all across the global tree. 10/11
Phylogenetic trees provide powerful surveillance tools pointing to multiple introductions of #SARSCoV2 to countries all around the 🌍 followed by local #transmissions.

For more on inference from phylogenies see this brilliant webinar by @firefoxx66: https://tinyurl.com/v9xpucj . 11/11
You can follow @LucyvanDorp.