About
(This page is based primarily on material from the Wikipedia page, Statistical Hypothesis Testing)
The testing process
There are three stages, each made up of three steps:
- Frame
    - State a null hypothesis
    - State an alternative hypothesis
    - Consider statistical assumptions (e.g. i.i.d. data)
- Stats
    - State a test statistic $T$
    - Derive the distribution of $T$, given the null hypothesis and assumptions
    - State a significance level $\alpha$
- Test
    - Compute the observed value $t$ of $T$
    - Calculate the p-value for $t$
    - Reject the null hypothesis in favour of the alternative if the p-value $< \alpha$
where the p-value is:
"the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed"
Note also that we talk about rejecting the null hypothesis in favour of the alternative hypothesis. If the test fails to reject, it is not correct to say that we accept the null hypothesis. That would be a logical fallacy: just because we haven't yet shown it to be false doesn't mean we have shown it to be true.
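As a minimal sketch of the whole process in Python (assuming NumPy/SciPy and made-up data: a one-sample z-test of the mean with known $\sigma$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Frame: H0: mu = mu0, H1: mu != mu0; assume i.i.d. normal data with known sigma
mu0, sigma, alpha = 0.0, 1.0, 0.05
x = rng.normal(loc=0.3, scale=sigma, size=50)  # simulated sample

# Stats: the test statistic is the standard score; under H0 it is standard normal
z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))

# Test: two-sided p-value from the standard normal, then compare to alpha
p_value = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```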
Selecting a test
What properties of the distribution are we testing?
- Mean (location test)
- Variance
- Independence
- Goodness of fit
What kind of test do we want to do for it?
- One-sample: sample vs population
- Two-sample: sample vs sample (independent of each other)
- Paired tests: two samples whose observations are paired with each other (e.g. measurements of each patient before and after treatment); typically we test the pairwise differences
What assumptions are we making?
- Population distribution parameters: e.g. mean, variance
Z-test
When to use a z-test
Properties: mean
Kind of test: one-sample, two-sample, paired
Assumptions: normal distribution (or approximately normal: large $n$)
Statistic
Standard score (one sample): $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\sigma$ is either known, or estimated from the data.
Distribution
Normal: $z \sim \mathcal{N}(0, 1)$ (the standard normal)
Notes
- Estimating $\sigma$ is only sensible if $n$ is large (rule of thumb: $n > 30$). Otherwise we need to account for the uncertainty in the sample variance, typically using a t-test.
- Can be used for a two-sample test, using the difference of means and the combined standard error (the square root of the sum of the squared standard errors), as sketched below
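A minimal sketch of that two-sample z-test (assuming NumPy/SciPy and simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two large independent samples (simulated); H0: equal population means
a = rng.normal(loc=5.0, scale=2.0, size=200)
b = rng.normal(loc=5.5, scale=2.0, size=180)

# Difference of means divided by the combined standard error
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
z = (a.mean() - b.mean()) / se

p_value = 2 * stats.norm.sf(abs(z))  # two-sided
print(f"z = {z:.3f}, p = {p_value:.4f}")
```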
T-test
Properties: mean
Kind of test: one-sample, two-sample, paired
Assumptions: means normally distributed (typically true based on CLT)
Statistic
t-statistic (one sample): $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Same as the standard score, only $\sigma$ is always an estimate (the sample standard deviation $s$).
Distribution
Student's t-distribution, with $n - 1$ degrees of freedom.
Notes:
- Above formula is for one-sample test. Adjustments must be made for two-sample and paired test, but same principles hold.
- With the two-sample approach, if we have (potentially) unequal variances, then it is sometimes called Welch's t-test.
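A sketch of the one-sample, Welch two-sample, and paired variants using SciPy (simulated data; the functions are the standard `scipy.stats` ones):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(5.0, 1.0, size=30)
b = rng.normal(5.4, 1.5, size=30)

# One-sample: H0: the population mean of a is 5
print(stats.ttest_1samp(a, popmean=5.0))

# Two-sample with (potentially) unequal variances: Welch's t-test
print(stats.ttest_ind(a, b, equal_var=False))

# Paired: tests the mean of the pairwise differences (samples must be aligned)
print(stats.ttest_rel(a, b))
```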
Chi-squared test
Properties: variance, goodness of fit
Kind of test: one-sample
Assumptions: normally distributed
Statistic
Variance: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$
Goodness of fit: $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$
Given a (contingency) table outlining the number of occurrences of each combination of two categorical variables (e.g. x=color, y=shape), we want to test if they are independent.
Distribution
Chi-squared, with $n - 1$ (variance) or $(r-1)(c-1)$ (contingency table with $r$ rows and $c$ columns) degrees of freedom.
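A sketch of both uses with SciPy (made-up counts):

```python
import numpy as np
from scipy import stats

# Goodness of fit: observed counts vs expected counts under H0 (made-up numbers)
observed = np.array([18, 22, 25, 35])
expected = np.array([25, 25, 25, 25])
print(stats.chisquare(f_obs=observed, f_exp=expected))

# Independence: contingency table of counts for two categorical variables
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected_counts = stats.chi2_contingency(table)
print(chi2, p, dof)
```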
F-test
Properties: variance, means of a set of distributions are all equal (ANOVA)
Kind of test: two-sample
Assumptions: normally distributed
Statistic
Variance: $F = s_1^2 / s_2^2$ (ratio of the two sample variances)
ANOVA: $F = \dfrac{\text{between-group variability}}{\text{within-group variability}}$
Distribution
F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom (variance case)
Notes
- In the ANOVA case:
    - rejecting the null tells us the group means are not all equal, but not which groups differ
    - the same machinery can be adapted for multiple linear regression, to test whether the response is (linearly) independent of a set of explanatory variables
    - with $k = 2$ groups we have $F = t^2$, where $t$ is the t-statistic
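A sketch of a one-way ANOVA via `scipy.stats.f_oneway`, plus a hand-rolled two-sample variance F-test (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(10.0, 2.0, size=40)
g2 = rng.normal(10.5, 2.0, size=40)
g3 = rng.normal(11.0, 2.0, size=40)

# One-way ANOVA: H0: all group means are equal
print(stats.f_oneway(g1, g2, g3))

# Two-sample variance F-test by hand: ratio of sample variances, compared
# against an F distribution with (n1 - 1, n2 - 1) degrees of freedom
F = g1.var(ddof=1) / g2.var(ddof=1)
p = 2 * min(stats.f.sf(F, len(g1) - 1, len(g2) - 1),
            stats.f.cdf(F, len(g1) - 1, len(g2) - 1))  # two-sided
print(F, p)
```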
Binomial test
Properties: mean
Kind of test: one-sample
Assumptions: Binomial distribution
Statistic
Number of successes $k$ (out of $n$ trials).
Distribution
Binomial - i.e. the p-value is $P(X \geq k)$ for $X \sim \mathrm{Bin}(n, p_0)$ (one-sided version), which is then compared to $\alpha$.
Notes
- With >2 categories the multinomial test can be used in the same way
- Generally speaking, any probability distribution can be used to form these simple tests
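A minimal sketch using `scipy.stats.binomtest` (available in SciPy ≥ 1.7; made-up counts):

```python
from scipy import stats

# H0: success probability is 0.5; observed 14 successes in 20 trials (made-up)
result = stats.binomtest(k=14, n=20, p=0.5, alternative="two-sided")
print(result.pvalue)
```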
Fisher's exact test
Similar to chi-squared test, used for small contingency tables where we can calculate exact values.
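A minimal sketch with `scipy.stats.fisher_exact` on a made-up 2x2 table:

```python
from scipy import stats

# Small 2x2 contingency table of counts (made-up); exact p-value
table = [[8, 2],
         [1, 5]]
odds_ratio, p_value = stats.fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)
```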
A/B testing
This typically involves two-sample testing: for example, comparing a conversion rate or click-through rate between a control group (A) and a variant group (B).
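For example, a two-proportion z-test on conversion counts (a sketch with made-up numbers; the pooled-standard-error form is one common choice):

```python
import numpy as np
from scipy import stats

# Hypothetical conversion counts for variants A and B
conv_a, n_a = 120, 2400
conv_b, n_b = 150, 2350

# Two-proportion z-test with a pooled standard error; H0: equal conversion rates
p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

p_value = 2 * stats.norm.sf(abs(z))  # two-sided
print(f"z = {z:.3f}, p = {p_value:.4f}")
```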