$p$-value
probability of a result at least as extreme (in the direction given by $H_a$) as the result we actually got, assuming $H_0$ to be true.
- $p$-value measures how compatible the data are with $H_0$
- smaller $p$-values are stronger evidence against $H_0$ in favor of $H_a$
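As a minimal sketch, the definition above can be computed directly for a one-sided z-test with known population sd (the numbers $\mu_0 = 50$, $\sigma = 10$, $n = 36$, $\bar x = 53$ are hypothetical):

```python
from statistics import NormalDist
from math import sqrt

mu0, sigma, n = 50.0, 10.0, 36   # hypothetical H0 mean, known sd, sample size
xbar = 53.0                      # hypothetical observed sample mean

# Standardized statistic: how many standard errors xbar lies above mu0
z = (xbar - mu0) / (sigma / sqrt(n))

# p-value: probability, assuming H0 true, of a result at least as
# extreme as the one observed, in the direction given by Ha: mu > mu0
p_value = 1 - NormalDist().cdf(z)
```

Here a small `p_value` means data this extreme would rarely occur under $H_0$.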
Test in practice
- $H_a$: the claim that some effect or difference is present in a population
- $H_0$: “no effect” or “no difference”; we seek evidence against $H_0$
- A test statistic measures how far the data depart from what would be expected if $H_0$ were true
  - ex. $\bar X_1 - \bar X_2$, $\bar X - L$
- A good test statistic gives a small probability of making errors
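As a hedged sketch of the two-sample case, the statistic $\bar X_1 - \bar X_2$ can be standardized under $H_0:\mu_1=\mu_2$ and converted to a p-value (all numbers here are made up, and known sds are assumed):

```python
from statistics import NormalDist
from math import sqrt

xbar1, s1, n1 = 12.4, 2.0, 40    # hypothetical sample 1 summary
xbar2, s2, n2 = 11.6, 2.5, 50    # hypothetical sample 2 summary

diff = xbar1 - xbar2                   # departure from "no difference"
se = sqrt(s1**2 / n1 + s2**2 / n2)     # standard error of the difference
z = diff / se                          # standardized test statistic

# Two-sided p-value for Ha: mu1 != mu2
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

The farther `diff` is from 0 relative to its standard error, the smaller the p-value.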
Types of errors
Definition
Type 1 error: reject $H_0$ when $H_0$ is really true
Type 2 error: accept $H_0$ when $H_a$ is really true
- $\alpha = P(\text{Type 1 error}) = P(\text{reject } H_0|H_0 \text{ is true})$
- $\beta = P(\text{Type 2 error}) = P(\text{accept } H_0|H_a \text{ is true})$
The maximum of $\beta$ over $H_a$ is $1-\alpha$, attained as $\mu$ approaches the boundary of $H_0$; choosing a rule with small $\alpha$ therefore forces $\beta$ to be large for $\mu$ close to the boundary. To make both small, take a larger sample: for a given $\alpha$, larger samples reduce $\beta$ at any fixed $\mu$ in $H_a$.
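This trade-off can be sketched numerically for a one-sided z-test of $H_0:\mu\le\mu_0$ vs $H_a:\mu>\mu_0$ ($\mu_0=0$, $\sigma=1$, $\alpha=0.05$ are illustrative assumptions):

```python
from statistics import NormalDist
from math import sqrt

def beta(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(accept H0 | true mean is mu) for the test that rejects
    when Z = (Xbar - mu0) / (sigma / sqrt(n)) > z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Under the true mean mu, Xbar ~ N(mu, sigma^2 / n)
    return NormalDist().cdf(z_alpha - (mu - mu0) * sqrt(n) / sigma)

b_boundary = beta(0.0, n=25)   # at the H0 boundary, beta = 1 - alpha
b_small_n = beta(0.2, n=25)    # mu just inside Ha: beta is large
b_large_n = beta(0.2, n=100)   # same mu, larger n: beta shrinks
```

At $\mu=\mu_0$ the value is exactly $1-\alpha$, and quadrupling $n$ visibly cuts $\beta$ at the same $\mu$.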
Power of tests
Definition
Power of a test, denoted by power($\theta$), is the probability that the test rejects $H_0$ when the true parameter value is $\theta$.
- If $\theta_0 \in H_0$ (e.g. the boundary value, or the value in a simple null), power$(\theta_0)=P(\text{reject }H_0|H_0 \text{ is true})= \alpha$
- If $\theta \in H_a$, power$(\theta)=P(\text{reject }H_0|H_a \text{ is true})=1-\beta(\theta)$
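For a one-sided z-test of $H_0:\mu\le\mu_0$ vs $H_a:\mu>\mu_0$, the power function is available in closed form; a sketch with illustrative values ($\mu_0=0$, $\sigma=1$, $n=25$, $\alpha=0.05$ are assumptions):

```python
from statistics import NormalDist
from math import sqrt

def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """P(reject H0 | true mean is mu) for the test that rejects
    when Z = (Xbar - mu0) / (sigma / sqrt(n)) > z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_alpha - (mu - mu0) * sqrt(n) / sigma)

p0 = power(0.0)    # at the H0 boundary, power = alpha
p1 = power(0.5)    # deeper into Ha, power rises toward 1
```

This makes the bullet above concrete: at the boundary the power equals $\alpha$, and it increases monotonically as $\mu$ moves into $H_a$.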