🤏 Calculating Error and Parameter Estimation
This page consolidates notes from CEE308 Environmental Engineering Lab at Princeton University, Spring 2022, taught by Professor Peter Jaffe.
Mean, variance, and standard deviation from $n$ direct measurements
The mean $\bar x$ is calculated as:
\[\bar x = \sum_i \frac{x_i}{n}\]Where $\bar x$ is the mean, $x_i$ is the $i$-th measurement, and $n$ is the number of measurements.
If the true standard error $\sigma$ or variance $\sigma^2$ for the population is known, and if the mean $\bar x$ was estimated from $n$ samples, then variance of the mean $\sigma_{\bar x}^2$ can be calculated as
\[\sigma_{\bar x}^2 = \frac{\sigma^2}{n}\]If the variance $S^2$ of the population is not known and must be estimated (as in most cases), then use the following equation:
\[S^2 = \frac{1}{n-1}\sum_i \left(x_i-\bar x\right)^2\]Where in all these equations:
- $\sigma$ refers to the true error
- $S$ refers to the estimated error
Least-squares Linear Regression
Suppose you want to fit the function
\[y = a + bx\]to a set of 2D data of $n$ samples. Then parameters $a$ and $b$ can be estimated using the following equations:
\[a = \frac{\sum_i x_i^2\sum_i y_i - \sum_i x_i \sum_i x_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\] \[b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\]Where, for clarity, $a$ is the intercept and $b$ is the slope of the regression.
Parameter error estimation
The set of residuals $E$ can be defined as the set of differences between the original $y$-values and the $y$-values estimated by the regression using the original $x$-values as arguments:
\[E = \{\epsilon_i : \epsilon_i = y_i - \left(a + bx_i\right), \forall (x_i, y_i)\}\]Then the variance of the regression model can be calculated as follows:
\[S_{y/x}^2 = \frac{1}{n-2} \sum_i \epsilon_i^2,\;\forall \epsilon_i\in E\]And the variance of the slope $b$:
\[S^2_b = \frac{S_{y/x}^2}{\sum_i\left(x_i-\bar x\right)^2}\]And the variance of the intercept $a$:
\[S^2_a = S_{x/y}^2\left(\frac{1}{n} + \frac{\bar x^2}{\sum_i\left(x_i - \bar x\right)^2}\right)\]Note that the error $S$ (usually denoted $\sigma$, even when estimated) of either parameter is simply the square root of the respective variance.
Therefore when writing a technical paper where you produce “error bars” or $\pm$ standard deviation, calculate $\sigma = \sqrt{S^2}$ for each determined parameter.
Error propagation
Given the analytical model
\[y = f\left(x_1, x_2, ..., x_3\right)\]Then the variance of the model is
\[\sigma^2_y = \sum_i \frac{\sigma^2_{\bar x, i}}{x_i}\left(\frac{\partial y}{\partial x_i}\right)^2\]