$p$-value
probability of a result at least as extreme (in the direction given by $H_a$) as the result we actually got, assuming $H_0$ to be true.
- $p$-value measures how compatible the data are with $H_0$
- smaller $p$-values are stronger evidence against $H_0$ in favor of $H_a$
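As a minimal sketch, the definition above can be computed directly for a one-sided z-test with known population sd (the numbers $\mu_0 = 50$, $\sigma = 10$, $n = 36$, $\bar x = 53$ are hypothetical):

```python
from statistics import NormalDist
from math import sqrt

mu0, sigma, n = 50.0, 10.0, 36   # hypothetical H0 mean, known sd, sample size
xbar = 53.0                      # hypothetical observed sample mean

# Standardized statistic: how many standard errors xbar lies above mu0
z = (xbar - mu0) / (sigma / sqrt(n))

# p-value: probability, assuming H0 true, of a result at least as
# extreme as the one observed, in the direction given by Ha: mu > mu0
p_value = 1 - NormalDist().cdf(z)
```

Here a small `p_value` means data this extreme would rarely occur under $H_0$.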
Test in practice
- $H_a$: the claim that some effect or difference is present in a population
- $H_0$: “no effect” or “no difference”; we seek evidence against $H_0$
- A test statistic measures how far the data depart from what would be expected if $H_0$ were true
  - ex. $\bar X_1 - \bar X_2$, $\bar X - L$
- A good test statistic gives a small probability of making errors
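As a hedged sketch of the two-sample case, the statistic $\bar X_1 - \bar X_2$ can be standardized under $H_0:\mu_1=\mu_2$ and converted to a p-value (all numbers here are made up, and known sds are assumed):

```python
from statistics import NormalDist
from math import sqrt

xbar1, s1, n1 = 12.4, 2.0, 40    # hypothetical sample 1 summary
xbar2, s2, n2 = 11.6, 2.5, 50    # hypothetical sample 2 summary

diff = xbar1 - xbar2                   # departure from "no difference"
se = sqrt(s1**2 / n1 + s2**2 / n2)     # standard error of the difference
z = diff / se                          # standardized test statistic

# Two-sided p-value for Ha: mu1 != mu2
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

The farther `diff` is from 0 relative to its standard error, the smaller the p-value.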
Types of errors
Definition
Type 1 error: reject $H_0$ when $H_0$ is really true
Type 2 error: accept $H_0$ when $H_a$ is really true
- $\alpha = P(\text{Type 1 error}) = P(\text{reject } H_0|H_0 \text{ is true})$
- $\beta = P(\text{Type 2 error}) = P(\text{accept } H_0|H_a \text{ is true})$
The maximum of $\beta$ over $H_a$ is $1-\alpha$, attained as $\mu$ approaches the boundary of $H_0$; choosing a rule with small $\alpha$ therefore forces $\beta$ to be large for $\mu$ close to the boundary. To make both small, take a larger sample: for a given $\alpha$, larger samples reduce $\beta$ at any fixed $\mu$ in $H_a$.
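This trade-off can be sketched numerically for a one-sided z-test of $H_0:\mu\le\mu_0$ vs $H_a:\mu>\mu_0$ ($\mu_0=0$, $\sigma=1$, $\alpha=0.05$ are illustrative assumptions):

```python
from statistics import NormalDist
from math import sqrt

def beta(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(accept H0 | true mean is mu) for the test that rejects
    when Z = (Xbar - mu0) / (sigma / sqrt(n)) > z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Under the true mean mu, Xbar ~ N(mu, sigma^2 / n)
    return NormalDist().cdf(z_alpha - (mu - mu0) * sqrt(n) / sigma)

b_boundary = beta(0.0, n=25)   # at the H0 boundary, beta = 1 - alpha
b_small_n = beta(0.2, n=25)    # mu just inside Ha: beta is large
b_large_n = beta(0.2, n=100)   # same mu, larger n: beta shrinks
```

At $\mu=\mu_0$ the value is exactly $1-\alpha$, and quadrupling $n$ visibly cuts $\beta$ at the same $\mu$.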
Power of tests
Definition
Power of a test, denoted by power($\theta$), is the probability that the test rejects $H_0$ when the true parameter value is $\theta$.
- If $\theta_0 \in H_0$ (e.g. the boundary value, or the value in a simple null), power$(\theta_0)=P(\text{reject }H_0|H_0 \text{ is true})= \alpha$
- If $\theta \in H_a$, power$(\theta)=P(\text{reject }H_0|H_a \text{ is true})=1-\beta(\theta)$
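For a one-sided z-test of $H_0:\mu\le\mu_0$ vs $H_a:\mu>\mu_0$, the power function is available in closed form; a sketch with illustrative values ($\mu_0=0$, $\sigma=1$, $n=25$, $\alpha=0.05$ are assumptions):

```python
from statistics import NormalDist
from math import sqrt

def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0 | true mean is mu) for the test that rejects
    when Z = (Xbar - mu0) / (sigma / sqrt(n)) > z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - (mu - mu0) * sqrt(n) / sigma)

p0 = power(0.0)    # at the H0 boundary, power = alpha
p1 = power(0.5)    # deeper into Ha, power rises toward 1
```

This makes the bullet above concrete: at the boundary the power equals $\alpha$, and it increases monotonically as $\mu$ moves into $H_a$.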