Statistical Inference Formulas

Since I could not find a list of formulas anywhere, I thought I would create one.


The notation used is fairly standard, but it is here for completeness.

Notation | Meaning
\(\mbox{df}\) | Degrees of freedom
\(\mbox{SE}\) | Standard error
Confidence SE | Confidence interval standard error
\(n\) | Sample size
\(s\) | Sample standard deviation
\(s^2\) | Sample variance
\(\bar{X}\) | Sample mean
\(N\) | Population size
\(\mu\) | Population mean
\(\sigma\) | Population standard deviation
\(\sigma^2\) | Population variance
\(s_p^2\) | Pooled variance
\(\alpha\) | The significance level for a hypothesis test (or the probability of a type I error)

Hypothesis Testing

  • \(H_0\) and \(H_a\) are the null and alternative hypotheses, respectively
  • The p-value: the probability of observing a value at least as extreme as the test statistic, given that the null hypothesis is true.
    • Find the probability of a value being less than or equal to the test statistic under the chosen distribution (using its cdf). Subtract that probability from 1, since we are looking for a more extreme value. For a negative test statistic under a distribution symmetric about 0, use its absolute value.
    • For a one-sided test, this is the p-value.
    • For a two-sided test, multiply by 2 (since a more extreme value can occur in either tail).
  • Rejection region: if the absolute value of the test statistic is larger than the critical value, the null hypothesis is rejected at significance level \(\alpha\). The critical value is chosen so that, when the null hypothesis is true, this happens with probability \(\alpha\). Rejecting does not prove that the null hypothesis is false.
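
The p-value steps above can be sketched in a few lines for the case where the test statistic is a \(z\) value. This is a minimal illustration that assumes the standard normal distribution; the function names are made up for this example, and the one-sided version assumes the alternative points in the direction of the observed statistic.

```python
from statistics import NormalDist


def p_value_from_z(z, two_sided=True):
    """p-value for a z test statistic under the standard normal distribution."""
    # 1 - cdf gives the probability of a value more extreme than |z| in one tail.
    tail = 1.0 - NormalDist().cdf(abs(z))
    # A two-sided test counts both tails, so the tail probability is doubled.
    return 2.0 * tail if two_sided else tail


def critical_value(alpha, two_sided=True):
    """Boundary of the rejection region for a z test at significance level alpha."""
    level = 1.0 - alpha / 2.0 if two_sided else 1.0 - alpha
    return NormalDist().inv_cdf(level)


print(p_value_from_z(1.96))                    # close to 0.05
print(p_value_from_z(1.96, two_sided=False))   # close to 0.025
print(critical_value(0.05))                    # close to 1.96
```

For a \(t\) or \(F\) statistic the same logic applies, with the cdf of that distribution in place of the normal one.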

Single Population

Test | df | Test SE | Test statistic | Confidence SE | Confidence interval
\(\mu\) when \(\sigma\) known | - | \(\mbox{SE} = \frac{\sigma}{\sqrt{n}}\) | \(z = \frac{\hat{\mu} - \mu_0}{\mbox{SE}}\) | \(\mbox{SE} = \frac{\sigma}{\sqrt{n}}\) | \(\hat{\mu} \pm z(1 - {\alpha \over 2}) \mbox{SE}\)
\(\mu\) when \(\sigma\) unknown | \(\mbox{df} = n-1\) | \(\mbox{SE} = \frac{s}{\sqrt{n}}\) | \(t = \frac{\hat{\mu} - \mu_0}{\mbox{SE}}\) | \(\mbox{SE} = \frac{s}{\sqrt{n}}\) | \(\hat{\mu} \pm t_{n-1}(1 - \frac{\alpha}{2}) \mbox{SE}\)
Population proportion \(p\) | - | \(\mbox{SE} = \sqrt{\frac{p_0 (1 - p_0)}{n}}\) | \(z = \frac{\hat{p} - p_0}{\mbox{SE}}\) | \(\mbox{SE} = \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}\) | \(\hat{p} \pm z(1 - {\alpha \over 2}) \mbox{SE}\)

Here \(\hat{\mu} = \bar{X}\) is the sample mean, and \(\mu_0\) and \(p_0\) are the values of \(\mu\) and \(p\) under \(H_0\).
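
As an illustration of the proportion row, the sketch below computes the test statistic with the null-hypothesis standard error and the confidence interval with the sample-based standard error. The function name and the example counts (56 successes out of 100, \(p_0 = 0.5\)) are assumptions chosen for the example, not data from any study.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """z statistic and (1 - alpha) confidence interval for one proportion."""
    p_hat = successes / n
    # The test uses the SE under H0, so it is built from p0.
    se_test = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_test
    # The interval uses the SE built from the sample proportion.
    se_ci = sqrt(p_hat * (1 - p_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, (p_hat - z_crit * se_ci, p_hat + z_crit * se_ci)


z, ci = proportion_z_test(56, 100, 0.5)
print(z)    # about 1.2
print(ci)   # about (0.463, 0.657)
```

Note that the two standard errors differ, which is why the table lists them in separate columns.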

Two Populations

Test | df | Test SE | Test statistic | Confidence SE | Confidence interval
Two means, equal variances | \(\mbox{df} = n_1 + n_2 - 2\) | \(\mbox{SE} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\) | \(t = \frac{\hat{\mu}_1 - \hat{\mu}_2 - d_0}{\mbox{SE}}\) | \(\mbox{SE} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\) | \((\hat{\mu}_1 - \hat{\mu}_2) \pm t_{\mbox{df}}(1 - \frac{\alpha}{2}) \mbox{SE}\)
Two means, unequal variances * | \(\mbox{df} \approx \frac{({s_{1}^2 \over n_1} + {s_2^2 \over n_2})^2}{{s_1^4 \over n_1^2 (n_1 - 1)} + {s_2^4 \over n_2^2 (n_2 - 1)}}\) | \(\mbox{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | \(t = \frac{\hat{\mu}_1 - \hat{\mu}_2 - d_0}{\mbox{SE}}\) | \(\mbox{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | \((\hat{\mu}_1 - \hat{\mu}_2) \pm t_{\mbox{df}}(1 - \frac{\alpha}{2}) \mbox{SE}\)

Here \(\hat{\mu}_1\) and \(\hat{\mu}_2\) are the two sample means, and \(d_0\) is the difference in population means under \(H_0\).


  • *: If conservative \(\mbox{df}\) is used, use the minimum of \(n_1 - 1\) and \(n_2 - 1\)
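
The standard errors and degrees of freedom for the two-sample case can be sketched as below. The function names are illustrative only; the inputs are the sample standard deviations and sample sizes of the two groups.

```python
from math import sqrt


def pooled_se(s1, n1, s2, n2):
    """SE of the difference in means when equal variances are assumed."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return sqrt(sp2) * sqrt(1 / n1 + 1 / n2)


def unequal_se(s1, n1, s2, n2):
    """SE of the difference in means when variances are not assumed equal."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


def approximate_df(s1, n1, s2, n2):
    """Approximate df for the unequal-variance case (the df formula in the table)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def conservative_df(n1, n2):
    """Conservative alternative: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# With equal sample sizes and equal standard deviations the two approaches agree.
print(pooled_se(2, 10, 2, 10), unequal_se(2, 10, 2, 10))
print(approximate_df(2, 10, 2, 10))   # about 18, the same as n1 + n2 - 2
print(approximate_df(1, 5, 3, 20))    # lies between conservative_df and n1 + n2 - 2
```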


Analysis of Variance (ANOVA)

Source of variance | \(\mbox{df}\) | Sum of squares | Mean square | F statistic | p-value
Treatment | \(I-1\) | \(\mbox{sst} = \sum_{i=1}^I n_i (\bar{X}_i - \bar{X})^2\) | \(\mbox{MST} = \frac{\mbox{sst}}{I-1}\) | \(f = \frac{\mbox{MST}}{\mbox{mse}}\) | *
Error | \(n-I\) | \(\mbox{sse} = \sum_{i=1}^I s^2_i (n_i - 1)\) ** | \(\mbox{mse} = \frac{\mbox{sse}}{n-I}\) | - | -
Total | \(n-1\) | \(\mbox{sst} + \mbox{sse}\) | - | - | -


  • *: \(f\) follows an F distribution with degrees of freedom equal to \(I-1\) and \(n-I\). \(I-1\) corresponds to the numerator and \(n-I\) the denominator.
  • **: Dividing \(\mbox{sse}\) by \(n-I\) gives \(\mbox{mse}\), which is the pooled variance for all \(I\) groups
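
The table entries can be computed directly from the raw observations. The sketch below is one way to do it with the standard library; the function name and the three small groups are invented for illustration.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Sums of squares, mean squares and f for a list of groups of observations."""
    I = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Treatment sum of squares: group means around the grand mean.
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error sum of squares: sample variances weighted by (n_i - 1).
    sse = sum(variance(g) * (len(g) - 1) for g in groups)
    mst = sst / (I - 1)
    mse = sse / (n - I)
    return {"sst": sst, "sse": sse, "mst": mst, "mse": mse,
            "f": mst / mse, "df": (I - 1, n - I)}


table = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(table)   # sst = 42, sse = 6, mst = 21, mse = 1, f = 21, df = (2, 6)
```

The p-value would then come from the upper tail of an F distribution with the returned degrees of freedom.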

Contrasts and Linear Combinations of Group Means

  • \(\gamma = \sum_{i=1}^I c_i \mu_i\)
  • Estimated with \(g = \sum_{i=1}^I c_i \bar{X}_i\)
  • If the coefficients sum to 0 (\(\sum_{i=1}^I c_i = 0\)), this linear combination is a contrast
  • \(\mbox{SE}(g) = s_p \sqrt{\sum_{i=1}^I \frac{c_i^2}{n_i}}\)
  • Use \(s_p = \sqrt{\mbox{mse}}\) from the ANOVA table with the same degrees of freedom (\(n-I\)). This is because even if you’re comparing fewer than \(I\) groups in a contrast, the remaining groups are still represented (just with coefficients of 0)
  • Confidence Interval: \(g \pm t_{n-I}(1 - {\alpha \over 2}) \times \mbox{SE}(g)\)
  • Test Statistic: \(t = \frac{g - \gamma}{\mbox{SE}\left(g\right)}\)
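
A small sketch of the contrast calculations above. The function name is hypothetical, and the group means, sizes, and \(\mbox{mse}\) in the usage lines are made-up values (three groups of size 3 with means 2, 3, 7 and \(\mbox{mse} = 1\)).

```python
from math import sqrt


def linear_combination(coeffs, means, sizes, mse, gamma0=0.0):
    """Estimate g, SE(g) and the t statistic for testing gamma = gamma0."""
    g = sum(c * m for c, m in zip(coeffs, means))
    # s_p is the square root of the mse from the ANOVA table.
    se = sqrt(mse) * sqrt(sum(c**2 / n for c, n in zip(coeffs, sizes)))
    t = (g - gamma0) / se
    # The combination is a contrast only if the coefficients sum to 0.
    is_contrast = abs(sum(coeffs)) < 1e-12
    return g, se, t, is_contrast


# Compare group 1 with group 2; group 3 is included with a coefficient of 0.
g, se, t, is_contrast = linear_combination([1, -1, 0], [2, 3, 7], [3, 3, 3], mse=1.0)
print(g, se, t, is_contrast)   # g = -1, SE about 0.816, t about -1.225, True
```

The \(t\) statistic is compared against a \(t\) distribution with \(n-I\) degrees of freedom (6 in this example).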

Post-ANOVA Comparison Methods

  • Confidence intervals are constructed with \(\mbox{Estimate} \pm \mbox{Multiplier} \times \mbox{SE}\left(\mbox{Estimate}\right)\)
  • Since a pairwise comparison is really a contrast with coefficients (1, -1) and 0 for all other groups, the \(\mbox{mse}\) from the ANOVA table is used for \(s_p^2\).

Method | Multiplier | Notes
Least Significant Difference (LSD) | \(t_{n-I}(1 - {\alpha \over 2})\) | No attempt to control the family-wise error rate. If an ANOVA F test is run first, this is the F-protected LSD method
Tukey-Kramer | \(\frac{q_{I,n-I,1-\alpha}}{\sqrt{2}}\) | Used for all pairwise comparisons, more conservative than LSD, uses the Studentized range distribution for the multiplier
Bonferroni | \(t_{n-I}(1 - {\alpha \over 2k})\) | Used for \(k\) pairwise comparisons
Scheffe | \(\sqrt{(I-1)F_{I-1,n-I}(1 - \alpha)}\) | Used for all contrasts, most conservative method
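
To see how the Bonferroni multiplier differs from LSD, it helps to look at the probability level passed to the \(t\) quantile. The sketch below only computes those levels and the number of pairwise comparisons; the function names are illustrative, and looking up the actual \(t\) quantile is left to a statistics table or library.

```python
from math import comb


def lsd_level(alpha):
    """Quantile level used by LSD: 1 - alpha / 2."""
    return 1 - alpha / 2


def bonferroni_level(alpha, k):
    """Quantile level used by Bonferroni for k comparisons: 1 - alpha / (2k)."""
    return 1 - alpha / (2 * k)


def number_of_pairs(I):
    """Number of pairwise comparisons among I groups."""
    return comb(I, 2)


k = number_of_pairs(4)               # 6 pairs among 4 groups
print(lsd_level(0.05))               # 0.975
print(bonferroni_level(0.05, k))     # about 0.99583
```

A level closer to 1 gives a larger quantile, so the Bonferroni intervals are wider than the LSD intervals whenever \(k > 1\).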

Statistical Models

  • Degrees of freedom for the model are the number of parameters in the model that vary.
  • A reduced model is a model with fewer parameters.
  • To determine lack of fit between two models:
    • \(H_0\): The reduced model adequately explains the data. \(H_a\): The full model is required to adequately explain the data.
    • Test statistic: \(F = \frac{\frac{\text{ss}(error)_{red} - \text{ss}(error)_{full}}{\text{df}(error)_{red} - \text{df}(error)_{full}}}{\frac{\text{ss}(error)_{full}}{\text{df}(error)_{full}}}\)
    • F follows an F distribution with numerator degrees of freedom \(\text{df}(error)_{red} - \text{df}(error)_{full}\) and denominator degrees of freedom \(\text{df}(error)_{full}\)
    • If the p-value for F is less than the significance level \(\alpha\), reject \(H_0\): the full model is required to adequately explain the data.
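
The test statistic for comparing a reduced and a full model is simple arithmetic once the error sums of squares and error degrees of freedom are known. This sketch uses a hypothetical function name and made-up numbers.

```python
def model_comparison_f(ss_red, df_red, ss_full, df_full):
    """F statistic and its degrees of freedom for reduced vs. full model."""
    num_df = df_red - df_full                    # extra parameters in the full model
    numerator = (ss_red - ss_full) / num_df      # drop in error SS per extra parameter
    denominator = ss_full / df_full              # error mean square of the full model
    return numerator / denominator, num_df, df_full


f, num_df, den_df = model_comparison_f(ss_red=120, df_red=18, ss_full=80, df_full=16)
print(f, num_df, den_df)   # 4.0 2 16
```

The p-value is the upper-tail probability of the returned F value under an F distribution with the returned numerator and denominator degrees of freedom.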


Simple Linear Regression and Least Squares Regression

  • \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\)
  • Estimated with \(\hat{Y}_i = b_0 + b_1 X_i\)
  • Sums:
    • \(\mbox{ss}_{XY} = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})\)
    • \(ss_Y = \sum_{i=1}^n (Y_i - \bar{Y})^2\)
    • \(ss_X = \sum_{i=1}^n (X_i - \bar{X})^2\)
  • The line of best fit must pass through the point \((\bar{X}, \bar{Y})\)
  • The slope estimate (\(b_1\), which estimates \(\beta_1\)): \(b_1 = \frac{\mbox{ss}_{XY}}{\mbox{ss}_X}\)
  • The intercept estimate (\(b_0\), which estimates \(\beta_0\)): \(b_0 = \bar{Y} - b_1 \bar{X}\)
  • \(ss_e = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2\)
  • \(ss_r = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\) This is the sum of squares explained by the model.
  • \(ss_t = \sum_{i=1}^n (Y_i - \bar{Y})^2\) same as \(ss_Y\) above.
  • The coefficient of determination \(R^2\)
    • Is the proportion of the total sum of squares that is explained by the model
    • \(R^2 = \frac{ss_r}{ss_t}\)
    • It can also be rearranged as \(R^2 = 1 - \frac{ss_e}{ss_t}\)
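
The least squares formulas above translate almost line for line into code. This is a minimal sketch with an invented function name and a tiny made-up data set, intended only to show how the sums fit together.

```python
def simple_linear_regression(xs, ys):
    """Least squares estimates b0, b1 and the sums of squares for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_t = sum((y - y_bar) ** 2 for y in ys)      # same as ss_Y
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar                       # forces the line through (x_bar, y_bar)
    fitted = [b0 + b1 * x for x in xs]
    ss_e = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_r = sum((f - y_bar) ** 2 for f in fitted)
    return {"b0": b0, "b1": b1, "ss_e": ss_e, "ss_r": ss_r,
            "ss_t": ss_t, "r2": ss_r / ss_t}


fit = simple_linear_regression([1, 2, 3, 4], [2, 4, 5, 8])
print(fit["b1"], fit["b0"])                    # slope about 1.9, intercept about 0
print(fit["r2"], 1 - fit["ss_e"] / fit["ss_t"])  # both forms of R^2 agree, about 0.963
```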

Multiple Regression

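  • \(Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_I X_I + \varepsilon_i\)
  • Predicted with \(\hat{Y}_i = b_0 + b_1 X_1 + b_2 X_2 + ... + b_I X_I\), using \(b\) in place of \(\beta\) as in simple linear regression. Note the lack of \(\varepsilon\) in the prediction equation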