I’ve always found it difficult to remember exactly how to calculate standard error and produce confidence intervals for a measure. This post is a quick reference for me and for others who are trying to produce confidence intervals for data metrics.

The following is adapted from Section 7.2 of Devore’s Probability & Statistics for Engineering and the Sciences, 8th edition.

Confidence interval for population proportion

Let p denote the proportion of the population with a given trait, so 1−p is the proportion without it. Suppose a random sample of size n is obtained, and let X be the number of objects/individuals in the sample that have the trait.

The natural estimator of p is $\hat{p} = X/n$, the sample fraction of successes. Its standard deviation is

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
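As a quick numeric sketch, the snippet below computes $\hat{p}$ and its standard error for a made-up sample (42 of 100, purely illustrative). Because p is unknown in practice, the sketch plugs $\hat{p}$ in for p, which gives the usual *estimated* standard error; that substitution is an assumption layered on top of the formula above.

```python
import math

def sample_proportion(x: int, n: int) -> float:
    """Sample fraction of successes, p_hat = X / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    return x / n

def standard_error(p: float, n: int) -> float:
    """Standard deviation of p_hat: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical sample: 42 of 100 individuals have the trait.
x, n = 42, 100
p_hat = sample_proportion(x, n)
# Plug-in estimate: p_hat stands in for the unknown population p.
se_hat = standard_error(p_hat, n)
print(f"p_hat = {p_hat:.3f}, estimated SE = {se_hat:.4f}")
```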

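When n is large, $\hat{p}$ is approximately normally distributed. Standardizing it with its mean p and standard deviation $\sigma_{\hat{p}}$ therefore gives an approximately standard normal variable, so

$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$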
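The "≈" can be checked empirically. The simulation below is a small sketch with an arbitrarily chosen true p, n, and trial count (all assumptions for illustration): it draws many binomial samples, standardizes each $\hat{p}$ using the true p, and counts how often the result lands strictly between $-z_{\alpha/2}$ and $z_{\alpha/2}$. For large n the fraction should come out close to $1 - \alpha$.

```python
import math
import random

def coverage_of_standardized_p_hat(p: float, n: int, z: float,
                                   trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose standardized p_hat lies in (-z, z)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    sd = math.sqrt(p * (1 - p) / n)
    inside = 0
    for _ in range(trials):
        # Draw X ~ Binomial(n, p) as the number of successes in n Bernoulli(p) trials.
        x = sum(rng.random() < p for _ in range(n))
        z_stat = (x / n - p) / sd
        if -z < z_stat < z:
            inside += 1
    return inside / trials

# Hypothetical settings: true p = 0.3, n = 200, 95% critical value.
print(coverage_of_standardized_p_hat(p=0.3, n=200, z=1.96, trials=5000))
```

Because X is discrete, the simulated fraction will not hit 0.95 exactly; it should simply hover near it.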

Replacing each < with = and solving the resulting quadratic in p gives the two endpoints of the interval:

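$$p = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n} \pm \frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}$$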
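For reference, here is the algebra behind the solving step, worked out independently rather than copied from Devore. Setting the standardized quantity equal to $\pm z_{\alpha/2}$ and squaring gives a single quadratic in p whose two roots are the interval endpoints:

```latex
(\hat{p} - p)^2 = z_{\alpha/2}^2 \, \frac{p(1-p)}{n}

\left(1 + \frac{z_{\alpha/2}^2}{n}\right) p^2 - \left(2\hat{p} + \frac{z_{\alpha/2}^2}{n}\right) p + \hat{p}^2 = 0

p = \frac{\left(2\hat{p} + z_{\alpha/2}^2/n\right) \pm \sqrt{\left(2\hat{p} + z_{\alpha/2}^2/n\right)^2 - 4\left(1 + z_{\alpha/2}^2/n\right)\hat{p}^2}}{2\left(1 + z_{\alpha/2}^2/n\right)}

\left(2\hat{p} + \frac{z_{\alpha/2}^2}{n}\right)^2 - 4\left(1 + \frac{z_{\alpha/2}^2}{n}\right)\hat{p}^2
  = 4 z_{\alpha/2}^2 \left(\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}\right)
```

The square root of the discriminant is therefore $2z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/(4n^2)}$, and dividing numerator and denominator by 2 yields the final form.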

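Here $z_{\alpha/2}$ is the critical value of the standard normal distribution that cuts off an upper-tail area of $\alpha/2$, so the two tails together hold $\alpha$. Common values: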

  • 95% CI: $z_{\alpha/2} = 1.96$
  • 99% CI: $z_{\alpha/2} = 2.58$
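Putting the pieces together, here is a minimal Python sketch of the final formula, which is commonly known as the Wilson score interval. The function names `z_critical` and `proportion_ci` and the example counts are hypothetical; the critical value comes from the standard library's `statistics.NormalDist().inv_cdf` (Python 3.8+) instead of a lookup table.

```python
import math
from statistics import NormalDist
from typing import Tuple

def z_critical(confidence: float) -> float:
    """Two-tailed critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Value with upper-tail area alpha/2 under the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

def proportion_ci(x: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Score interval for a population proportion, following the final form above."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    z = z_critical(confidence)
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical sample: 42 of 100 have the trait.
low, high = proportion_ci(42, 100, confidence=0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z for 95%: {z_critical(0.95):.2f}, z for 99%: {z_critical(0.99):.2f}")
```

`z_critical` returns unrounded values (about 1.960 for 95% and 2.576 for 99%), which round to the figures in the list above. For 42 successes out of 100 the sketch gives roughly (0.328, 0.518). Note that the interval is centered slightly toward 0.5 rather than exactly at $\hat{p}$, which is what the extra $z_{\alpha/2}^2$ terms do.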