🧮 Standard error and confidence interval calculations
I’ve always found it difficult to remember exactly how to calculate standard error and produce confidence intervals for a measure. This post is a quick reference for me and for others who are trying to produce confidence intervals for data metrics.
The following is an adaptation of Ch. 7.2 from Devore’s Probability & Statistics for Engineering and the Sciences, 8th edition.
Confidence interval for population proportion
Let p denote the proportion of the population with a given trait. Then 1−p is the proportion without the given trait. Suppose a random sample of size n is obtained with X being the number of objects/individuals with the desired trait in the sample.
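The estimator of $p$ for the population is $\hat{p} = X/n$, the sample fraction of successes. For large $n$, $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Standardizing $\hat{p}$ by subtracting the unknown $p$ and dividing by $\sigma_{\hat{p}}$ gives the probability statement behind the confidence interval:

$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$

Solving the inequalities for $p$ gives the final form:

$$p = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$

where $z_{\alpha/2}$ is the critical value of the standard normal distribution for a two-sided interval: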
- 95% CI: $z_{\alpha/2} = 1.96$
- 99% CI: $z_{\alpha/2} = 2.58$
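For quick reference, here is a minimal Python sketch of the interval formula above. The function name `proportion_confidence_interval` and its parameters are my own choices for illustration, not something from Devore. It assumes Python 3.8+ so that `statistics.NormalDist` is available to look up the critical value instead of hard-coding 1.96 or 2.58.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) bounds for a population proportion.

    successes  -- X, the number of sampled items with the trait
    n          -- the sample size
    confidence -- the confidence level 1 - alpha, e.g. 0.95
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")

    alpha = 1 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)

    p_hat = successes / n          # sample fraction of successes
    denom = 1 + z**2 / n           # shared denominator of both terms

    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Made-up example: 40 of 100 sampled items have the trait
    lower, upper = proportion_confidence_interval(40, 100, confidence=0.95)
    print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With the made-up sample of 40 successes out of 100, this gives an interval of roughly (0.309, 0.498). Note that the interval is centered slightly closer to 0.5 than $\hat{p} = 0.4$, which is what the $z_{\alpha/2}^2/(2n)$ term in the numerator does.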