### About

*(This page is based primarily on material from the Wikipedia page, Statistical hypothesis testing.)*


### The testing process

There are three stages, each made up of three steps:

- Frame
  - State a **null hypothesis**
  - State an **alternative hypothesis**
  - Consider statistical **assumptions** (e.g. i.i.d. data)
- Stats
  - State a **test statistic** *T*
  - Derive the **distribution** of *T*, given the null hypothesis and assumptions
  - State a **significance level** α
- Test
  - Compute the **observed value** *t* of *T*
  - Calculate the **p-value**
  - **Reject** the null hypothesis in favour of the alternative if the p-value < α


where the **p-value** is: "the probability, under the null hypothesis, of sampling a test statistic at least as extreme as that which was observed".
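As a concrete sketch of the final stage, assuming a test statistic that is standard normal under the null hypothesis (the observed value 2.1 here is purely illustrative):

```python
from statistics import NormalDist

# Suppose the null hypothesis implies the test statistic ~ N(0, 1),
# and we observe t = 2.1. The two-sided p-value is the probability of
# sampling a value at least as extreme as 2.1, in either tail.
observed = 2.1
p_value = 2 * (1 - NormalDist().cdf(abs(observed)))

alpha = 0.05  # significance level, chosen in advance
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```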


Note also that we talk about rejecting the null hypothesis *in favour of* the alternative hypothesis. If the test fails, it is not correct to say that we *accept the null hypothesis*. This is a logical fallacy: just because we haven't yet proved it false doesn't mean we have proved it true.

### Selecting a test

What properties of the distribution are we testing?

- Mean (location test)

- Variance

- Independence

- Goodness of fit

What kind of test do we want to do for it?

**One-sample:** sample vs population

**Two-sample:** sample vs sample (independent of each other)

**Paired tests:** two samples with pairs across them (e.g. measurements of a patient before and after treatment); typically we test the difference between the paired values

What assumptions are we making?

- Population distribution parameters: e.g. mean, variance

### Z-test

#### When to use a z-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** normal distribution (or approximately normal, for large *n*)

#### Statistic

Standard score (one sample):

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where σ is either known, or estimated from the data.

#### Distribution

Normal: $z \sim N(0, 1)$ (the standard normal distribution)

#### Notes

- Estimating σ is only sensible if *n* is large (rule of thumb: *n* > 30). Otherwise we need to account for the uncertainty in the sample variance, typically using a
*t-test*.

- Can be used for a two-sample test, using the difference of means and the standard error of the difference (the variances add)
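A minimal one-sample z-test sketch using only the standard library; the sample, null mean, and known σ below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical data: is the population mean 100? Assume sigma = 15 is known.
sample = [112, 104, 99, 118, 107, 95, 110, 103, 108, 101]
mu0, sigma = 100, 15

# Standard score: z = (x_bar - mu0) / (sigma / sqrt(n))
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p_value:.4f}")
```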

### T-test

**Properties:** mean

**Kind of test:** one-sample, two-sample, paired

**Assumptions:** the sample *means* are normally distributed (typically true, by the CLT)

#### Statistic

t-statistic (one sample):

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Same as the standard score, except that σ is always replaced by its estimate, the sample standard deviation *s*.

#### Distribution

Student's t-distribution, with *n* − 1 degrees of freedom.

#### Notes

- The above formula is for the one-sample test. Adjustments must be made for the two-sample and paired tests, but the same principles hold.

- With the two-sample approach, if the variances are (potentially) unequal, the test is sometimes called
*Welch's t-test*.
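A sketch of the one-sample test and Welch's two-sample variant, assuming scipy is available; the data is simulated purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.5, scale=1.0, size=30)   # sample with true mean 0.5
b = rng.normal(loc=0.0, scale=2.0, size=40)   # sample with a different variance

# One-sample: is the mean of `a` zero?
t1, p1 = stats.ttest_1samp(a, popmean=0.0)

# Two-sample with (potentially) unequal variances: Welch's t-test
t2, p2 = stats.ttest_ind(a, b, equal_var=False)

print(f"one-sample: t = {t1:.3f}, p = {p1:.4f}")
print(f"Welch:      t = {t2:.3f}, p = {p2:.4f}")
```

Setting `equal_var=True` instead gives the classic pooled-variance two-sample t-test.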

### Chi-squared test

**Properties:** variance, goodness of fit

**Kind of test:** one-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

Goodness of fit:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts. Given a (contingency) table outlining the number of occurrences of each combination of two categorical variables (e.g. x = colour, y = shape), we can use this to test if they are independent.

#### Distribution

Chi-squared, with *n* − 1 (variance case) or (*r* − 1)(*c* − 1) (contingency table with *r* rows and *c* columns) degrees of freedom.
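A sketch of the independence test on a small contingency table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table: rows = colour, columns = shape
table = [[10, 20, 30],
         [ 6, 18, 42]]

# chi2_contingency computes the statistic, p-value, degrees of freedom,
# and the expected counts under the independence hypothesis
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

Note that the degrees of freedom come out as (2 − 1)(3 − 1) = 2.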

### F-test

**Properties:** variance; whether the means of a *set* of distributions are all equal (ANOVA)

**Kind of test:** two-sample

**Assumptions:** normally distributed data

#### Statistic

Variance:

$$F = \frac{s_1^2}{s_2^2}$$

ANOVA:

$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$

#### Distribution

F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom (variance case)

#### Notes

- In the ANOVA case:
  - it doesn't tell us *which* alternative is best
  - it can be adjusted for use in multiple linear regression, to test for the independence of variables
  - with two groups we have F = t², where t is the t-statistic
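The two-group correspondence between ANOVA and the t-test can be checked directly; a minimal sketch with simulated data, assuming scipy is available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, 25)
g2 = rng.normal(0.6, 1.0, 25)

# One-way ANOVA on two groups...
F, p_f = stats.f_oneway(g1, g2)

# ...is equivalent to the pooled-variance two-sample t-test: F = t^2,
# and the p-values coincide
t, p_t = stats.ttest_ind(g1, g2, equal_var=True)

assert np.isclose(F, t**2)
assert np.isclose(p_f, p_t)
print(f"F = {F:.3f} = t^2 = {t**2:.3f}")
```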

### Binomial test

**Properties:** mean

**Kind of test:** one-sample

**Assumptions:** binomial distribution

#### Statistic

Number of successes *k*.

#### Distribution

Binomial - i.e. the (one-sided) p-value is $P(X \geq k)$ under the null distribution $B(n, p_0)$, which is then compared to α.

#### Notes

- With >2 categories the multinomial test can be used in the same way

- Generally speaking, any probability distribution can be used to form these simple tests
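The exact p-value can be summed straight from the binomial pmf; a stdlib-only sketch with hypothetical counts:

```python
from math import comb

def binom_p_upper(k: int, n: int, p0: float) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 16 successes in 20 trials; H0: p = 0.5
p_value = binom_p_upper(16, 20, 0.5)
print(f"p = {p_value:.4f}")  # compare to the chosen alpha
```

scipy offers the same computation (including two-sided variants) as `scipy.stats.binomtest`.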

### Fisher's exact test

Similar to the chi-squared test; used for small contingency tables, where we can calculate exact p-values.
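A sketch on a small 2×2 table, assuming scipy is available; the counts are hypothetical:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with small counts, where the chi-squared
# approximation would be unreliable and the exact test is preferable
table = [[8, 2],
         [1, 5]]

# Returns the sample odds ratio (8*5)/(2*1) and the exact p-value
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```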

### A/B testing

This typically involves two-sample testing (e.g. comparing conversion rates between two variants of a page).
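For example, a two-proportion z-test on conversion counts can be sketched with the standard library alone (the visitor and conversion numbers below are made up):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test: conversions out of visitors for two page variants
conv_a, n_a = 200, 4000   # variant A: 5.0% conversion
conv_b, n_b = 260, 4100   # variant B: ~6.3% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided
print(f"z = {z:.3f}, p = {p_value:.4f}")
```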
