### About

*(This page is based primarily on material from the Wikipedia page, Statistical hypothesis testing.)*


### The testing process

There are three stages, each made up of three steps:

- Frame
  - State a **null hypothesis**
  - State an **alternative hypothesis**
  - Consider statistical **assumptions** (e.g. i.i.d. data)
- Stats
  - State a **test statistic** *T*
  - Derive the **distribution** of *T*, given the null hypothesis and assumptions
  - State a **significance level** α
- Test
  - Compute the **observed value** *t* of *T*
  - Calculate the **p-value**
  - **Reject** the null hypothesis in favour of the alternative if the p-value < α


where the **p-value** is: "the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed".
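As a concrete sketch of the final stage, assuming a test statistic that is standard normal under the null hypothesis (the observed value 2.1 here is purely illustrative):

```python
from statistics import NormalDist

# Suppose the null hypothesis implies the test statistic ~ N(0, 1),
# and we observe t = 2.1. The two-sided p-value is the probability of
# sampling a value at least as extreme as 2.1, in either tail.
observed = 2.1
p_value = 2 * (1 - NormalDist().cdf(abs(observed)))

alpha = 0.05  # significance level, chosen in advance
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```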


Note also that we talk about rejecting the null hypothesis *in favour of* the alternative hypothesis. If the test fails, it is not correct to say that we *accept the null hypothesis*. This is a logical fallacy: just because we haven't yet proved it false doesn't mean we have proved it true.

### Selecting a test

What properties of the distribution are we testing?

- Mean (location test)

- Variance

- Independence

- Goodness of fit

What kind of test do we want to do for it?

**One-sample:** sample vs population

**Two-sample:** sample vs sample (independent of each other)

**Paired tests:** two samples with pairs across them (e.g. measurements of a patient before and after treatment); typically we test the difference between the paired values

What assumptions are we making?

- Population distribution parameters: e.g. mean, variance

### Z-test

#### When to use a z-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** normal distribution (or approximately normal, for large *n*)

#### Statistic

Standard score (one sample):

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where σ is either known, or estimated from the data.

#### Distribution

Normal: $z \sim N(0, 1)$ (the standard normal distribution)

#### Notes

- Estimating σ is only sensible if *n* is large (rule of thumb: *n* > 30). Otherwise we need to account for the uncertainty in the sample variance, typically using a
*t-test*.

- Can be used for a two-sample test, using the difference of means and the standard error of the difference (the variances add)
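A minimal one-sample z-test sketch using only the standard library; the sample, null mean, and known σ below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data: is the population mean 100? Assume sigma = 15 is known.
sample = [112, 104, 99, 118, 107, 95, 110, 103, 108, 101]
mu0, sigma = 100, 15

# Standard score: z = (x_bar - mu0) / (sigma / sqrt(n))
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```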

### T-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** the sample *means* are normally distributed (typically true, by the CLT)

#### Statistic

t-statistic (one sample):

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Same as the standard score, except that σ is always replaced by its estimate, the sample standard deviation *s*.

#### Distribution

Student's t-distribution, with *n* − 1 degrees of freedom.

#### Notes

- The above formula is for the one-sample test. Adjustments must be made for the two-sample and paired tests, but the same principles hold.

- With the two-sample approach, if the variances are (potentially) unequal, the test is sometimes called
*Welch's t-test*.
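A sketch of the one-sample test and Welch's two-sample variant, assuming scipy is available; the data is simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.5, scale=1.0, size=30)   # sample with true mean 0.5
b = rng.normal(loc=0.0, scale=2.0, size=40)   # sample with a different variance

# One-sample: is the mean of `a` zero?
t1, p1 = stats.ttest_1samp(a, popmean=0.0)

# Two-sample with (potentially) unequal variances: Welch's t-test
t2, p2 = stats.ttest_ind(a, b, equal_var=False)

print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}")
print(f"Welch:      t = {t2:.3f}, p = {p2:.4f}")
```

Setting `equal_var=True` instead gives the classic pooled-variance two-sample t-test.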

### Chi-squared test

**Properties:** variance, goodness of fit

**Kind of test:** one-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Goodness of fit:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts. Given a (contingency) table outlining the number of occurrences of each combination of two categorical variables (e.g. x = colour, y = shape), we can use this to test if they are independent.

#### Distribution

Chi-squared, with *n* − 1 (variance case) or (*r* − 1)(*c* − 1) (contingency table with *r* rows and *c* columns) degrees of freedom.
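A sketch of the independence test on a small contingency table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: rows = colour, columns = shape
table = [[10, 20, 30],
         [ 6, 18, 42]]

# chi2_contingency computes the statistic, p-value, degrees of freedom,
# and the expected counts under the independence hypothesis
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

Note that the degrees of freedom come out as (2 − 1)(3 − 1) = 2.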

### F-test

**Properties:** variance; whether the means of a *set* of distributions are all equal (ANOVA)

**Kind of test:** two-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$F = \frac{s_1^2}{s_2^2}$$

ANOVA:

$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$

#### Distribution

F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom (variance case)

#### Notes

- In the ANOVA case:
  - it doesn't tell us *which* alternative is best
  - it can be adjusted for use in multiple linear regression, to test for the independence of variables
  - with two groups we have F = t², where t is the t-statistic
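The two-group correspondence between ANOVA and the t-test can be checked directly; a minimal sketch with simulated data, assuming scipy is available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, 25)
g2 = rng.normal(0.6, 1.0, 25)

# One-way ANOVA on two groups...
F, p_f = stats.f_oneway(g1, g2)

# ...is equivalent to the pooled-variance two-sample t-test: F = t^2,
# and the p-values coincide
t, p_t = stats.ttest_ind(g1, g2, equal_var=True)

assert np.isclose(F, t**2)
assert np.isclose(p_f, p_t)
print(f"F = {F:.3f} = t^2 = {t**2:.3f}")
```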

### Binomial test

**Properties:** mean

**Kind of test:** one-sample

**Assumptions:** binomial distribution

#### Statistic

Number of successes *k*.

#### Distribution

Binomial - i.e. the (one-sided) p-value is $P(X \geq k)$ under the null distribution $B(n, p_0)$, which is then compared to α.

#### Notes

- With >2 categories the multinomial test can be used in the same way

- Generally speaking, any probability distribution can be used to form these simple tests
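The exact p-value can be summed straight from the binomial pmf; a stdlib-only sketch with hypothetical counts:

```python
from math import comb

def binom_p_upper(k: int, n: int, p0: float) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 16 successes in 20 trials; H0: p = 0.5
p_value = binom_p_upper(16, 20, 0.5)
print(f"p = {p_value:.4f}")  # compare to the chosen alpha
```

scipy offers the same computation (including two-sided variants) as `scipy.stats.binomtest`.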

### Fisher's exact test

Similar to the chi-squared test; used for small contingency tables, where we can calculate exact p-values.
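A sketch on a small 2×2 table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with small counts, where the chi-squared
# approximation would be unreliable and the exact test is preferable
table = [[8, 2],
         [1, 5]]

# Returns the sample odds ratio (8*5)/(2*1) and the exact p-value
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```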

### A/B testing

This typically involves two-sample testing (e.g. comparing conversion rates between two variants of a page).
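For example, a two-proportion z-test on conversion counts can be sketched with the standard library alone (the visitor and conversion numbers below are made up):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test: conversions out of visitors for two page variants
conv_a, n_a = 200, 4000   # variant A: 5.0% conversion
conv_b, n_b = 260, 4100   # variant B: ~6.3% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided
print(f"z = {z:.3f}, p = {p_value:.4f}")
```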
