The SARS-CoV-2 furin cleavage site is yet again in the news - this time because of a quote by Nobel laureate David Baltimore.

The site is not a "smoking gun", nor does it "make a powerful challenge to the idea of a natural origin".

Quite the opposite, so a little science 🧵👇
The furin cleavage site (FCS) / polybasic cleavage site is present in SARS-CoV-2 at the S1/S2 junction of the spike protein where it mediates the cutting (by the host protease furin, among others) of the spike, which is required for infection of cells.
The FCS was created by an out-of-frame insertion of "CTCCTCGGCGGG", creating the "(P)RRAR" amino acid sequence. This constitutes a suboptimal polybasic cleavage site that is important for expanding SARS-CoV-2's host range, its transmission, pathogenesis, etc.
How did SARS-CoV-2 acquire the FCS? We don't know. However, we do know that four main mechanisms often lead to insertions:
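To make "out-of-frame" concrete, here is a minimal Python sketch that translates the 12-nucleotide insert in each of the three reading frames using the standard genetic code. Only the insert sequence comes from the thread; the helper names (`CODON_TABLE`, `translate`) are made up for illustration, and the flanking genome sequence is deliberately left out because it is not quoted here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons of `seq`, starting at `offset` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)
    )


INSERT = "CTCCTCGGCGGG"  # insert sequence as quoted in the thread

# Translate the insert in all three frames; leftover bases are ignored.
frames = {offset: translate(INSERT, offset) for offset in range(3)}
for offset, peptide in frames.items():
    print(offset, peptide)
# 0 LLGG
# 1 SSA
# 2 PRR
```

Read on its own (offset 0) the insert would encode "LLGG". Only when it is shifted by two nucleotides do the "P", "R", "R" codons (CCT, CGG, CGG) appear, which is what an out-of-frame insertion implies: the first two and the last nucleotide of the insert pair up with flanking genome bases, and the rest of the "(P)RRAR" motif depends on the downstream sequence.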

(1) mutation

(2) polymerase slippage

(3) template switching

(4) recombination

All of which play key roles in coronavirus (incl. SARS-CoV-2) evolution.
🚨 The exact same (P)RRAR FCS found in SARS-CoV-2 can be found in different viruses, including Feline coronavirus (FCoV), which is an alphacoronavirus.

Note that the site is not present in all closely related viruses, and there are plenty of indels around the site - like SARS-CoV-2 vs SARSr CoVs.
If we zoom in on the (P)RRAR site in SARS-CoV-2 and compare it to the one found in (some) FCoV sequences, we can see there's a fair bit of homology outside the FCS too - including likely O-linked glycans being conserved.
Importantly, however, in recent months we have started seeing the "P" mutating towards residues creating more optimal furin sites - P681H and, especially, P681R, which can be found in B.1.1.7 and B.1.617.x, suggesting the virus may evolve towards more efficient usage of the site.
🚨 So Baltimore's first point - that the FCS found in SARS-CoV-2 is somehow unusual - is simply incorrect. FCSs are found in a multitude of different coronaviruses, indels come and go frequently, and the exact (P)RRAR can be found in other coronaviruses.
Now, the codons. Here, Baltimore is talking about the two codons coding for the first two arginines (R) following the P - CGG. The CGG codon is rare in viruses because it's an example of an unmethylated "CpG" site that can be bound by TLR9, leading to immune cell activation.
🚨 Despite being rare, however, CGG codons *are* found in all coronaviruses, albeit at low frequency. Specifically, of all arginine codons, CGG is used at these frequencies in these viruses:

SARS: 5%
SARS2: 3%
SARSr: 2%
ccCoVs: 4%
HKU9: 7%
FCoV: 2%

Nothing unusual here.
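For anyone wanting to check numbers like these themselves, here is a rough Python sketch of the calculation: count every arginine codon in an in-frame coding sequence and report the share that is CGG. The function name and the toy sequence are hypothetical and purely illustrative - the percentages above come from real genomes, not from this example.

```python
from collections import Counter

# The six codons that encode arginine (R) in the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_codon_usage(cds: str) -> dict:
    """Return the fraction of arginine codons contributed by each R codon.

    `cds` must be an in-frame coding sequence (length divisible by 3).
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    total = sum(counts[codon] for codon in ARG_CODONS)
    if total == 0:
        return {codon: 0.0 for codon in ARG_CODONS}
    return {codon: counts[codon] / total for codon in ARG_CODONS}


# Toy sequence (NOT real viral data): 10 arginine codons, one of them CGG.
toy_cds = "ATG" + "AGA" * 5 + "CGT" * 3 + "CGG" + "CGC" + "TAA"
usage = arginine_codon_usage(toy_cds)
print(f"CGG share of arginine codons: {usage['CGG']:.0%}")
```

Running this over every annotated gene of a genome, and pooling the counts, would give the kind of per-virus frequency listed above.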
🚨Furthermore, if we go back to the FCoV sequences and compare them to SARS-CoV-2 at the nucleotide level you'll see that FCoV also uses CGG to code for R immediately following the P. The next R is CGA (non-CpG) in FCoV, while it's CGG in SARS-CoV-2 - one nucleotide difference.
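The "one nucleotide difference" can be checked with a few lines of Python. This is only a sketch: the two six-nucleotide strings are the "RR" codon pairs as described in this thread (CGG CGA for FCoV, CGG CGG for SARS-CoV-2), and the helper names are made up for illustration.

```python
# The six codons that encode arginine (R) in the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def codons(seq: str) -> list:
    """Split an in-frame sequence into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def nucleotide_differences(a: str, b: str) -> list:
    """Return (position, base_in_a, base_in_b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


fcov_rr = "CGGCGA"   # FCoV "RR" codons, as described in the thread
sars2_rr = "CGGCGG"  # SARS-CoV-2 "RR" codons, as described in the thread

diffs = nucleotide_differences(fcov_rr, sars2_rr)
both_encode_rr = all(c in ARG_CODONS for c in codons(fcov_rr) + codons(sars2_rr))

print(diffs)           # [(5, 'A', 'G')]
print(both_encode_rr)  # True
```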
We see CGG multiple times in different ways - here's an example comparing another "PR" stretch between SARS-CoV-2, RaTG13, and SARS-CoV in the N gene. Note how SARS-CoV-2 and RaTG13 both use CGG, while SARS-CoV uses CGC for the first R, and later R's are coded by CGT or AGA.
One final point about the CGG codons in the FCS - if they were somehow "unnatural", we'd see SARS-CoV-2 evolve away from "CGG" during the ongoing pandemic. We have more than a million genomes to analyze, so what do we find if we look at synonymous mutations at the "CGG_CGG" site?
🚨Remarkably stable. Specifically, CGG is 99.87% conserved in the first codon and 99.84% conserved in the second.

This is *very* strong evidence that SARS-CoV-2 'prefers' CGG in these positions.
R is coded by six different codons, yet the simple single transition "CGA" is only observed in ~0.02% of sequences. The second most 'popular' codon at these sites is "CGT" (a transversion) at 0.11% frequency.

In other words - there is nothing unusual about the codons either.
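To unpack the transition vs transversion point, here is a small Python sketch that classifies the change needed to get from CGG to each of the other five arginine codons. The function name is hypothetical; the classification itself is just the textbook definition (purine to purine or pyrimidine to pyrimidine is a transition, anything else is a transversion).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# The six codons that encode arginine (R) in the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def classify_change(ref: str, alt: str) -> str:
    """Classify the difference between two codons of equal length."""
    diffs = [(r, a) for r, a in zip(ref, alt) if r != a]
    if not diffs:
        return "identical"
    if len(diffs) > 1:
        return "multiple changes"
    pair = set(diffs[0])
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Every synonymous alternative to CGG, i.e. the other arginine codons.
changes = {alt: classify_change("CGG", alt) for alt in ARG_CODONS if alt != "CGG"}
for alt, kind in changes.items():
    print(f"CGG -> {alt}: {kind}")
# CGG -> CGT: transversion
# CGG -> CGC: transversion
# CGG -> CGA: transition
# CGG -> AGA: multiple changes
# CGG -> AGG: transversion
```

CGA is the only synonymous codon reachable from CGG by a single transition, which is why its rarity at these positions is notable.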
So Baltimore's second point is also false, invalidating his hypothesis that the "FCS [...] with its arginine codons [...] was the smoking gun for the origin of the virus".

Baltimore does not provide any evidence to support his hypothesis and the data support a natural origin.
Does this disprove a lab leak? No. However, it disproves there being a "smoking gun" in the FCS and lends further evidence to natural emergence - but it also does not *prove* that scenario.

To this day, we have yet to see any scientific evidence supporting a lab leak.
Variants of 👇 have come up - it's false. Specifically:

1. The events are not independent, hence the calculation is incorrect.

2. It's the same argument used by creationists about "irreducible complexity" - also false:

https://en.wikipedia.org/wiki/Irreducible_complexity

https://www.americanprogress.org/issues/religion/news/2006/04/10/1934/the-flaws-in-intelligent-design/
As to Richard's final point - well... #introspection
You can follow @K_G_Andersen.