
There are two different (and independent) forces at play here: mutation and natural selection.

The question is, given a known NS/S ratio of 3.5, what is the likelihood that there would be only 4 silent mutations alongside 33 non-silent ones in any random strain (like omicron)? What the mutations are is irrelevant to this question. (I will use mutations and variants interchangeably below; both mean a single change in the genetic code of the virus.)

Because we are dealing with integer counts, you can’t really have 3.5 mutations. You can only have 0, 1, 2, 3, etc. So 3.5 is just the average ratio. For count data like this, the Poisson distribution is the usual model. When you have 33 non-silent mutations, you could have 0, 1, 2, 3, etc. silent mutations. We’d expect to see about 9.4 (33/3.5) on average, but you can’t have 0.4 of a mutation, only integers. There is a specific probability associated with each possible value (0, 1, 2, 3, …), and that can be calculated using the Poisson distribution.
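
A rough sketch of that distribution in R, assuming the silent count really is Poisson with mean 33/3.5. The names `lambda`, `counts`, and `probs` are only for illustration, and the cutoff at 30 is arbitrary:

```r
# Assumed model: silent-mutation count ~ Poisson(lambda), with lambda = 33 / 3.5
lambda <- 33 / 3.5

# Probability of each possible count from 0 to 30 (30 is an arbitrary cutoff)
counts <- 0:30
probs <- dpois(counts, lambda)

# Most likely count, and how much probability mass the 0..30 range covers
most_likely <- counts[which.max(probs)]
covered <- sum(probs)

print(round(probs[1:15], 4))  # probabilities for counts 0 through 14
print(most_likely)
print(covered)
```

Every count from 0 upward gets a nonzero probability; the mass just concentrates around the mean.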

You are most likely to see 9 or 10, but all other values are also possible. To calculate how likely you are to see exactly 4 silent variants, you use the Poisson distribution. Again, the expected rate is 33/3.5 silent variants.

In R, you’d do:

    dpois(4, 33/3.5)
    [1] 0.02647255

So, if we had 33 non-silent variants, we’d expect to see exactly 4 silent variants 2.6% of the time. To put this into context, the most likely number of silent mutations is 9, and even that is expected only 13% of the time.

We normally think in terms of “how likely is it that we’d see 4 or fewer silent mutations”, so we compute the probability for each count from 0 to 4 and add them up:

    sum(dpois(0:4, 33/3.5))
    [1] 0.04211515

Which is how I got a probability of 0.042. And that answers the specific question: if we see 33 non-silent mutations in a strain, how likely is it that we would see 4 or fewer silent mutations in the same strain? 4.2% of the time.
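
As a cross-check, the same tail probability can be read directly off the Poisson cumulative distribution function. A minimal sketch, using the same assumed mean of 33/3.5 (the variable names are illustrative):

```r
# Same assumed Poisson mean as above
lambda <- 33 / 3.5

# P(X <= 4) computed two ways: summing the mass function, and via the CDF
tail_by_sum <- sum(dpois(0:4, lambda))
tail_by_cdf <- ppois(4, lambda)

print(c(tail_by_sum, tail_by_cdf))
```

Both should agree up to floating-point error; `ppois` is just the more compact way to write it.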

Given these numbers, I’d say it is plausible that you’d see this ratio of NS/S mutations occur naturally. It would be rare, but still somewhat expected to occur.

Now, this says nothing about what the mutations are, or why the omicron strain is so prevalent. This is where natural selection takes over: this combination of mutations is outcompeting all others.



I was not taking issue with the calculation of the 4% chance, but rather with this phrase: "But with millions of people infected, this ratio of S/NS variations would still happen at a pretty high absolute number."

I thought you were implying that since this was a 4% chance happening over millions of infection events, it was a virtual certainty, and thus the 4% factor made this highly probable. To which I counterargued that the 4% chance matters and should be accounted for as a factor in a "natural vs lab leak" model, because everything seems to indicate that "a variant as infectious as Omicron appears" was a single event.


It is a virtual certainty that a 4% likely event would occur given the millions of people that have been infected (and the subsequent mutations occurring within each of them). But that doesn't have anything to do with omicron specifically... and I certainly don't intend to suggest a correlation between NS/S and infectiousness. Mutations have a certain rate, but the events themselves are random.
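
To make the "virtual certainty" part concrete, here is a back-of-the-envelope sketch. It assumes the events are independent and that each one has the same 4.2% chance, which is a big simplification; the values in `n` are hypothetical and are not estimates of how many comparable lineages actually exist:

```r
# Per-event probability of 4 or fewer silent mutations (assumed Poisson model)
p <- ppois(4, 33 / 3.5)

# Hypothetical numbers of independent events; illustrative only
n <- c(10, 100, 1000)

# Probability that the 4%-type outcome shows up at least once in n events
p_at_least_one <- 1 - (1 - p)^n

print(data.frame(n = n, p_at_least_one = p_at_least_one))
```

Even at modest `n`, the chance of seeing at least one such strain gets close to 1, which is all I meant by the absolute-number comment.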

I guess a different way to say this is: it makes no sense to use the NS/S ratio as a rationale to claim that omicron was a lab leak. The 33/4 ratio should be rare, but not unexpected. If you were going to claim that, the absolute increase in mutations would be a more compelling argument, but again, there are plausible reasons for how that could occur in the wild too.

We have very incomplete sampling data, so looking at a single strain and determining risk, or likelihood, is very difficult.



