Evolution experiments in the lab mimic natural evolution. The ratio of functional vs silent mutations has also been used to characterize the evolution of viruses under natural selection.
According to the blog post that we are commenting on, there is a strange ratio of silent to functional mutations. How can this be accounted for if it was a natural course of evolution?
Without looking more at the variants themselves and just going by the numbers in the post, it’s entirely plausible. It’s not likely, but plausible.
Mainly because we are talking about rare events. If the expected number of silent variants is 33/3.5 (about 9.43), then seeing 4 or fewer silent variants has a probability of ~0.042 (Poisson, X <= 4). So, a bit more than a 4% chance. But with millions of people infected, this ratio of S/NS variations would still happen at a pretty high absolute number.
Anytime you look at numbers like these, remember that with a big denominator, even rare events are expected to happen.
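To put rough numbers on the "big denominator" point, here is a minimal R sketch. The lineage count (`n_lineages`) is a made-up figure for illustration, not an estimate from the post. The calculation also assumes that each lineage carries 33 non-silent mutations and that lineages are independent, which is a strong simplification.

```r
# Expected number of silent mutations given 33 non-silent ones
# and an average NS/S ratio of 3.5 (both figures from the post).
lambda <- 33 / 3.5

# Chance that a single lineage shows 4 or fewer silent mutations.
p_rare <- sum(dpois(0:4, lambda))

# Hypothetical number of independent lineages (illustration only).
n_lineages <- 1000

# Expected number of lineages with such a skewed ratio.
expected_hits <- n_lineages * p_rare

# Probability that at least one lineage shows it.
p_at_least_one <- 1 - (1 - p_rare)^n_lineages

print(c(p_rare = p_rare,
        expected_hits = expected_hits,
        p_at_least_one = p_at_least_one))
```

Even at a modest hypothetical count like this, dozens of lineages would be expected to show the skewed ratio. Whether that many lineages with 33 non-silent mutations exist is a separate question that this sketch does not address.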
If it was the case that the absolute number mattered, we would be able to sample different "Omicron-like" variants and observe various ratios such as 33N/10S, 33N/15S, etc. mutations. But according to the data presented, we don't: We only see a single (kind of) Omicron, the one with (about) 33N/4S mutations, presumably spilling over by a single event, so I don't see how your conclusion follows.
There are two different (and independent) forces at play here: mutation and natural selection.
The question is, given a known ratio of 3.5 NS/S, what is the likelihood that there would be only 4 silent mutations alongside 33 non-silent ones in any random strain (like omicron)? What the mutations are is irrelevant to this question. (I will use mutations and variants interchangeably below, but both mean a single change in the genetic code for the virus).
Because we are dealing with integer counts, you can’t really have a fractional number of mutations. You can only have 0, 1, 2, 3, etc. So, 3.5 is just the average ratio. When dealing with count data like this, the Poisson distribution is what you use. When you have 33 non-silent mutations, you could have 0, 1, 2, 3, etc. silent mutations. On average we’d expect to see about 9.4 (33/3.5), but you can’t have 0.4 of a mutation, only integers. There is a specific probability associated with each possible value (0, 1, 2, 3…), and that can be calculated using the Poisson distribution.
You are most likely to see a value around 9, but all other values are also possible. To calculate how likely you are to see exactly 4 silent variants, you use the Poisson distribution. Again, the expected count (the Poisson rate) is 33/3.5 silent variants.
In R, you’d do:
dpois(4, 33/3.5)
[1] 0.02647255
So, if we had 33 nonsilent variants, we’d expect to see exactly 4 silent variants 2.6% of the time. To put this into context, the most likely number of silent mutations is 9, which is expected only 13% of the time.
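The same call can be run over a range of counts to see the whole distribution and confirm where the peak sits. This is a small sketch under the same Poisson assumption. The upper limit of 20 on the range is an arbitrary cutoff chosen for display.

```r
lambda <- 33 / 3.5          # expected silent count, about 9.43
k <- 0:20                   # candidate silent-mutation counts
probs <- dpois(k, lambda)   # Poisson probability of each count

# Most likely count (the mode) and its probability.
mode_k <- k[which.max(probs)]
print(mode_k)
print(round(max(probs), 3))

# Probability of exactly 4 silent mutations, for comparison.
print(round(dpois(4, lambda), 3))
```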
We normally think in terms of “how likely is it that we’d see 4 or fewer mutations”, so we re-run the calculation for 0:4 and add the results up:
sum(dpois(0:4, 33/3.5))
[1] 0.04211515
Which is how I got a probability of 0.042. And that answers the specific question: if we see 33 non-silent mutations in a strain, how likely is it that we would see 4 or fewer silent mutations in the same strain? 4.2% of the time.
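As a cross-check, the sum over 0:4 should agree with R's built-in cumulative Poisson function, `ppois`. A quick sketch:

```r
lambda <- 33 / 3.5

# Summing the individual probabilities for 0 through 4 ...
p_sum <- sum(dpois(0:4, lambda))

# ... should match the cumulative distribution function P(X <= 4).
p_cdf <- ppois(4, lambda)

print(all.equal(p_sum, p_cdf))
```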
Given these numbers, I’d say it is plausible that you’d see this ratio of NS/S mutations occur naturally. It would be rare, but still somewhat expected to occur.
Now this says nothing about what the mutations are, or why the omicron strain is so prevalent. This is where natural selection takes over: this combination of mutations is outcompeting all others.
I was not taking an issue with the calculation of the 4% chance, but rather with this phrase: "But with millions of people infected, this ratio of S/NS variations would still happen at a pretty high absolute number."
I thought you were implying that since this was a 4% chance happening over millions of infection events, it was a virtual certainty, and thus the 4% factor made this highly probable. To which I counter-argued that the 4% chance matters and should be accounted for as a factor in a "natural vs lab leak" model, because everything seems to indicate that "a variant as infectious as Omicron appears" was a single event.
It is a virtual certainty that a 4% likely event would occur given the millions of people that have been infected (and the subsequent mutations occurring within each of them). But that doesn't have anything to do with omicron specifically... and I certainly don't intend to suggest a correlation between NS/S and infectiousness. Mutations have a certain rate, but the events themselves are random.
I guess a different way to say this is -- It makes no sense to use the NS/S ratio as a rationale to claim that omicron was a lab-leak. The 33/4 ratio should be rare, but not unexpected. If you were going to claim that, the absolute increase in mutations would be a more compelling argument, but again, there are plausible reasons for how that could occur in the wild too.
We have very incomplete sampling data, so looking at a single strain and determining risk, or likelihood, is very difficult.
The sequence of mutations leading to Omicron is a one-off. Presumably, if there were a thousand variants with the same N mutations as Omicron, you'd see ratios like the ones you describe, but there's only one Omicron. The previous poster is pointing out it's plausible (~4% chance) that the path Omicron took has only 4 S mutations.
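The "thousand variants" thought experiment can be sketched as a small simulation in R. It treats each variant's silent count as an independent Poisson draw with rate 33/3.5 and ignores selection and shared ancestry entirely, so it only illustrates the spread of ratios you'd expect by chance. The seed is arbitrary.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
lambda <- 33 / 3.5               # expected silent count per variant
n_variants <- 1000               # the hypothetical "thousand variants"

# Simulated silent-mutation counts, one per variant.
silent <- rpois(n_variants, lambda)

# Spread of silent counts across the simulated variants.
print(table(silent))

# Share of variants with 4 or fewer silent mutations;
# this should land near the theoretical 4.2%.
frac_low <- mean(silent <= 4)
print(frac_low)
```

Most simulated variants land between roughly 6 and 13 silent mutations, with a small minority at 4 or fewer. That is the one-off outcome being discussed for Omicron.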