### About

*(This page is based primarily on material from the Wikipedia page, Statistical hypothesis testing.)*


### The testing process

There are three stages, each made up of three steps:

- Frame
  - State a **null hypothesis**
  - State an **alternative hypothesis**
  - Consider statistical **assumptions** (e.g. i.i.d. data)
- Stats
  - State a **test statistic** *T*
  - Derive the **distribution** of *T*, given the null hypothesis and assumptions
  - State a **significance level** α
- Test
  - Compute the **observed value** *t* of *T*
  - Calculate the **p-value**
  - **Reject** the null hypothesis in favour of the alternative if the p-value < α


where the **p-value** is: "the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed".
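As a concrete sketch of the final stage, assuming a test statistic that is standard normal under the null hypothesis (the observed value 2.1 here is purely illustrative):

```python
from statistics import NormalDist

# Suppose the null hypothesis implies the test statistic ~ N(0, 1),
# and we observe t = 2.1. The two-sided p-value is the probability of
# sampling a value at least as extreme as 2.1, in either tail.
observed = 2.1
p_value = 2 * (1 - NormalDist().cdf(abs(observed)))

alpha = 0.05  # significance level, chosen in advance
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```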


Note also that we talk about rejecting the null hypothesis *in favour of* the alternative hypothesis. If the test fails, it is not correct to say that we *accept the null hypothesis*. This is a logical fallacy: just because we haven't yet proved it false doesn't mean we have proved it true.

### Selecting a test

What properties of the distribution are we testing?

- Mean (location test)

- Variance

- Independence

- Goodness of fit

What kind of test do we want to do for it?

**One-sample:** sample vs population

**Two-sample:** sample vs sample (independent of each other)

**Paired tests:** two samples with pairs across them (e.g. measurements of a patient before and after treatment); typically we test the difference between the paired values

What assumptions are we making?

- Population distribution parameters: e.g. mean, variance

### Z-test

#### When to use a z-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** normal distribution (or approximately normal, for large *n*)

#### Statistic

Standard score (one sample):

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where σ is either known, or estimated from the data.

#### Distribution

Normal: $z \sim N(0, 1)$ (the standard normal distribution)

#### Notes

- Estimating σ is only sensible if *n* is large (rule of thumb: *n* > 30). Otherwise we need to account for the uncertainty in the sample variance, typically using a
*t-test*.

- Can be used for a two-sample test, using the difference of means and the standard error of the difference (the variances add)
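A minimal one-sample z-test sketch using only the standard library; the sample, null mean, and known σ below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data: is the population mean 100? Assume sigma = 15 is known.
sample = [112, 104, 99, 118, 107, 95, 110, 103, 108, 101]
mu0, sigma = 100, 15

# Standard score: z = (x_bar - mu0) / (sigma / sqrt(n))
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```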

### T-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** the sample *means* are normally distributed (typically true, by the CLT)

#### Statistic

t-statistic (one sample):

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Same as the standard score, except that σ is always replaced by its estimate, the sample standard deviation *s*.

#### Distribution

Student's t-distribution, with *n* − 1 degrees of freedom.

#### Notes

- The above formula is for the one-sample test. Adjustments must be made for the two-sample and paired tests, but the same principles hold.

- With the two-sample approach, if the variances are (potentially) unequal, the test is sometimes called
*Welch's t-test*.
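A sketch of the one-sample test and Welch's two-sample variant, assuming scipy is available; the data is simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.5, scale=1.0, size=30)   # sample with true mean 0.5
b = rng.normal(loc=0.0, scale=2.0, size=40)   # sample with a different variance

# One-sample: is the mean of `a` zero?
t1, p1 = stats.ttest_1samp(a, popmean=0.0)

# Two-sample with (potentially) unequal variances: Welch's t-test
t2, p2 = stats.ttest_ind(a, b, equal_var=False)

print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}")
print(f"Welch:      t = {t2:.3f}, p = {p2:.4f}")
```

Setting `equal_var=True` instead gives the classic pooled-variance two-sample t-test.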

### Chi-squared test

**Properties:** variance, goodness of fit

**Kind of test:** one-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Goodness of fit:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts. Given a (contingency) table outlining the number of occurrences of each combination of two categorical variables (e.g. x = colour, y = shape), we can use this to test if they are independent.

#### Distribution

Chi-squared, with *n* − 1 (variance case) or (*r* − 1)(*c* − 1) (contingency table with *r* rows and *c* columns) degrees of freedom.
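A sketch of the independence test on a small contingency table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: rows = colour, columns = shape
table = [[10, 20, 30],
         [ 6, 18, 42]]

# chi2_contingency computes the statistic, p-value, degrees of freedom,
# and the expected counts under the independence hypothesis
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

Note that the degrees of freedom come out as (2 − 1)(3 − 1) = 2.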

### F-test

**Properties:** variance; whether the means of a *set* of distributions are all equal (ANOVA)

**Kind of test:** two-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$F = \frac{s_1^2}{s_2^2}$$

ANOVA:

$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$

#### Distribution

F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom (variance case)

#### Notes

- In the ANOVA case:
  - it doesn't tell us *which* alternative is best
  - it can be adjusted for use in multiple linear regression, to test for the independence of variables
  - with two groups we have F = t², where t is the t-statistic
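The two-group correspondence between ANOVA and the t-test can be checked directly; a minimal sketch with simulated data, assuming scipy is available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, 25)
g2 = rng.normal(0.6, 1.0, 25)

# One-way ANOVA on two groups...
F, p_f = stats.f_oneway(g1, g2)

# ...is equivalent to the pooled-variance two-sample t-test: F = t^2,
# and the p-values coincide
t, p_t = stats.ttest_ind(g1, g2, equal_var=True)

assert np.isclose(F, t**2)
assert np.isclose(p_f, p_t)
print(f"F = {F:.3f} = t^2 = {t**2:.3f}")
```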

### Binomial test

**Properties:** mean

**Kind of test:** one-sample

**Assumptions:** binomial distribution

#### Statistic

Number of successes *k*.

#### Distribution

Binomial - i.e. the (one-sided) p-value is $P(X \geq k)$ under the null distribution $B(n, p_0)$, which is then compared to α.

#### Notes

- With >2 categories the multinomial test can be used in the same way

- Generally speaking, any probability distribution can be used to form these simple tests
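The exact p-value can be summed straight from the binomial pmf; a stdlib-only sketch with hypothetical counts:

```python
from math import comb

def binom_p_upper(k: int, n: int, p0: float) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 16 successes in 20 trials; H0: p = 0.5
p_value = binom_p_upper(16, 20, 0.5)
print(f"p = {p_value:.4f}")  # compare to the chosen alpha
```

scipy offers the same computation (including two-sided variants) as `scipy.stats.binomtest`.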

### Fisher's exact test

Similar to the chi-squared test; used for small contingency tables, where we can calculate exact p-values.
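A sketch on a small 2×2 table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with small counts, where the chi-squared
# approximation would be unreliable and the exact test is preferable
table = [[8, 2],
         [1, 5]]

# Returns the sample odds ratio (8*5)/(2*1) and the exact p-value
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```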

### A/B testing

This typically involves two-sample testing (e.g. comparing conversion rates between two variants of a page).
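For example, a two-proportion z-test on conversion counts can be sketched with the standard library alone (the visitor and conversion numbers below are made up):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test: conversions out of visitors for two page variants
conv_a, n_a = 200, 4000   # variant A: 5.0% conversion
conv_b, n_b = 260, 4100   # variant B: ~6.3% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided
print(f"z = {z:.3f}, p = {p_value:.4f}")
```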
