> If the sample is non-random then the conclusions are unreliable, regardless of the population size

That's not true. You may still get meaningful data, but it's less meaningful (i.e. the required interval to achieve a given level of confidence becomes wider, perhaps significantly so).

But either way, I mentioned "non-random" to pile onto the ridiculously low sample size. Even a random sample of such a small size would have given a low-confidence result.
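To make "low-confidence" concrete, here is a minimal sketch of the usual normal-approximation margin of error for a proportion. It assumes a *simple random sample*, the worst case p = 0.5, and z = 1.96 (about 95% confidence). The sample sizes in the loop are hypothetical; the thread doesn't state the actual one.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample of size n. p=0.5 gives the widest
    (worst-case) interval; z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only for illustration.
for n in (10, 30, 100, 1000):
    print(f"n={n:>5}: +/- {margin_of_error(n):.1%}")
```

The margin shrinks only with the square root of n, so quadrupling the sample size merely halves it. A tiny sample stays wide even before you account for non-randomness.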

> Because it seems obvious that if you want to measure the salinity of the ocean, you can scoop up a cup of water from it and analyse that. You don't have to use a different size cup for different size oceans, or even know how big your ocean is.

You're describing a "population" parameter that doesn't actually exist. There's no such thing as "salinity of the ocean"; it changes depending on where you are (and at what depth!). Sampling the salinity of the water in the cup tells you, at best, about the water where you are.

Now, you could probably talk about things like "mean salinity of the oceans", but to put good bounds on that you would have to sample. And to figure out how much you must sample, you do indeed need some idea of the total population size, even if only to establish that it is so much larger than the sample size that you can ignore it and simply use the standard error formula.
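A rough sketch of where the population size N enters a sample-size calculation, using Cochran's textbook formulas for estimating a mean to within ±margin. It assumes you already have a guess for the standard deviation (e.g. from a pilot sample); the values `sigma=2.0` and `margin=0.1` below are made up for illustration.

```python
import math

def required_sample_size(sigma, margin, population=None, z=1.96):
    """Sample size needed to estimate a mean within +/- margin.

    sigma is an assumed (pilot) estimate of the standard deviation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    # Infinite-population size (Cochran): n0 = (z * sigma / margin)^2
    n0 = (z * sigma / margin) ** 2
    if population is None:
        return math.ceil(n0)
    # Finite-population adjustment: n = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(required_sample_size(2.0, 0.1))                    # 1537 (N ignored)
print(required_sample_size(2.0, 0.1, population=10**9))  # 1537 (huge N: no change)
print(required_sample_size(2.0, 0.1, population=2000))   # 870 (small N matters)
```

When N dwarfs n0, the adjustment rounds away to nothing, which is exactly the "you can ignore it" case. You still had to know N was that large to make the call.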

If the population size is not much greater than the sample size, then there is an adjustment you should make (the finite population correction).
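For reference, a minimal sketch of the standard error of a sample mean with the finite population correction, assuming simple random sampling without replacement. With the sample standard deviation s, the factor is sqrt(1 - n/N). The other common form, sqrt((N - n)/(N - 1)), goes with the population standard deviation σ. The function name and data are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample, population_size=None):
    """Standard error of the sample mean, optionally with the FPC.

    Assumes a simple random sample drawn without replacement.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    se = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    if population_size is None:
        return se  # population treated as effectively infinite
    if n > population_size:
        raise ValueError("sample cannot exceed population")
    # Finite population correction paired with the sample stdev s.
    return se * math.sqrt(1 - n / population_size)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(standard_error_of_mean(data))                      # no correction
print(standard_error_of_mean(data, population_size=16))  # half the population sampled
print(standard_error_of_mean(data, population_size=8))   # census: 0.0
```

Sampling the whole population (n = N) drives the standard error to zero, as it should. For N in the billions, the factor is indistinguishable from 1.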