Idea: A sufficient statistic compresses data without losing information about the parameter
Definition
A statistic $U=U(X_1, ... ,X_n)$ is a sufficient statistic for $\theta$ if the conditional distribution of $X_1,...,X_n$ given $U$ does not depend on $\theta$.
Any one-to-one function of a sufficient statistic is a sufficient statistic.
Any statistic from which a sufficient statistic can be calculated is also a sufficient statistic.
$\exists$ many possible SSs ⇒ seek the MSS (minimal sufficient statistic), the one that compresses the data the most.
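Example (Bernoulli, a standard check of the definition): for $X_1,...,X_n \overset{iid}{\sim} Bernoulli(p)$ and $U=\sum_{i=1}^n X_i$,
$$ P(X_1=x_1,...,X_n=x_n \mid U=u) = \frac{p^u (1-p)^{n-u}}{\binom{n}{u} p^u (1-p)^{n-u}} = \frac{1}{\binom{n}{u}}, $$
which is free of $p$, so $U$ is sufficient.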
Definition (Likelihood function)
$$ L(\theta|x_1,...,x_n) = f(x_1|\theta) \times ... \times f(x_n|\theta) $$
Likelihood = the joint probability / density function of $X_1,...,X_n$, but with a different viewpoint!
Likelihood → a function of the parameter with the data fixed / probability or density → a function of the R.V.s with the parameter fixed
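A minimal numerical sketch of this viewpoint (not from the notes; the Bernoulli model, data, and names are illustrative assumptions):

```python
import numpy as np

# Log-likelihood of iid Bernoulli(p) data: the data x is held fixed
# while the parameter p varies over a grid.
def bernoulli_loglik(p, x):
    s = np.sum(x)
    return s * np.log(p) + (len(x) - s) * np.log(1 - p)

x = np.array([1, 0, 1, 1, 0])            # observed data (fixed)
grid = np.linspace(0.01, 0.99, 99)       # candidate values of p
loglik = np.array([bernoulli_loglik(p, x) for p in grid])
print(grid[np.argmax(loglik)])           # maximizer ≈ sample mean 0.6
```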
Factorization Theorem
$U=U(X_1, ..., X_n)$ is a sufficient statistic for $\theta$ iff the likelihood can be written in the form
$$ L(\theta)=g(u(x_1,...,x_n), \theta) h(x_1,...,x_n) $$
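Example (Poisson, a standard application): for $X_1,...,X_n \overset{iid}{\sim} Poisson(\theta)$,
$$ L(\theta)=\prod_{i=1}^n \frac{e^{-\theta}\theta^{x_i}}{x_i!} = \underbrace{e^{-n\theta}\theta^{\sum x_i}}_{g(u,\ \theta)} \times \underbrace{\frac{1}{\prod_{i=1}^n x_i!}}_{h(x_1,...,x_n)}, $$
so $U=\sum_{i=1}^n X_i$ is sufficient for $\theta$.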
Definition
$U=U(X_1,...,X_n)$ is a minimal sufficient statistic (MSS) if
(a) $U$ is a sufficient statistic and
(b) $U$ compresses the data at least as much as any other SS: If $V$ is any other SS, $U$ is a function of $V$.
Theorem (Lehmann-Scheffé)
$$ \frac{L(\theta|x_1,...,x_n)}{L(\theta|y_1,...,y_n)} \text{ is free of } \theta \iff g(x_1,...,x_n)=g(y_1,...,y_n) $$
If a function $g$ can be found such that the above ratio is free of the unknown parameter $\theta$ iff $g(x_1,...,x_n)=g(y_1,...,y_n)$, then $g(X_1,...,X_n)$ is a minimal sufficient statistic for $\theta$.
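Example ($N(\mu,1)$, a standard application): the ratio
$$ \frac{L(\mu|x_1,...,x_n)}{L(\mu|y_1,...,y_n)} = \exp\left( \mu \left( \sum x_i - \sum y_i \right) - \frac{1}{2} \left( \sum x_i^2 - \sum y_i^2 \right) \right) $$
is free of $\mu$ iff $\sum x_i = \sum y_i$, so $\sum X_i$ (equivalently $\bar X$, a one-to-one function of it) is an MSS for $\mu$.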
Definition (Consistency)
$\hat\theta_n$ based on $X_1,...,X_n$ is consistent for $\theta$ if $\hat\theta_n \overset{p}{\to} \theta$ as $n\rightarrow\infty$ for all values of $\theta$.
Tool 1: WLLN
$$ \bar X_n = \frac{1}{n} \sum_{i=1}^n X_i \overset{p}{\to} \mu=E(X_i) $$
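A minimal simulation sketch (not from the notes; the Exponential(1) model and names are illustrative assumptions):

```python
import numpy as np

# WLLN: running means of iid Exponential(1) draws settle near mu = E(X_i) = 1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])   # drifts toward 1 as n grows
```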
Tool 2: Theorems on Limiting distributions
Suppose $W_n \overset{p}{\to} a$ and $V_n \overset{p}{\to} b$. Then $W_n \pm V_n \overset{p}{\to} a \pm b$, $W_n V_n \overset{p}{\to} ab$, and $W_n / V_n \overset{p}{\to} a/b$ (provided $b \neq 0$).
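Example (a standard application of Tools 1-2): by the WLLN, $\frac{1}{n}\sum_{i=1}^n X_i^2 \overset{p}{\to} E(X_i^2)$ and $\bar X_n \overset{p}{\to} \mu$, so by the product rule $\bar X_n^2 \overset{p}{\to} \mu^2$ and
$$ \hat\sigma^2_n = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar X_n^2 \overset{p}{\to} E(X_i^2) - \mu^2 = \sigma^2, $$
i.e. the (uncorrected) sample variance is consistent for $\sigma^2$.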
Tool 3:
If $\hat\theta_n$ is an unbiased estimator (UE) of $\theta$ and $V(\hat\theta_n) \rightarrow 0$, then $\hat\theta_n$ is consistent for $\theta$.
Proof by Chebyshev's inequality: for any $\varepsilon > 0$, $P(|\hat\theta_n - \theta| \geq \varepsilon) \leq V(\hat\theta_n)/\varepsilon^2 \rightarrow 0$.
Theorem (CLT)
$$ Z_n=\frac{\bar X_n-\mu}{\sigma/\sqrt n} \overset{D}{\to} N(0,1) $$
meaning that the cdf of $Z_n$ converges pointwise to the cdf of $N(0,1)$
($\mu=E(X_i)$, $\sigma=\sqrt{Var(X_i)}$)
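A minimal simulation sketch (not from the notes; the Exponential(1) model, $n$, and names are illustrative assumptions):

```python
import numpy as np

# CLT: standardized means of iid Exponential(1) samples (mu = 1, sigma = 1)
# should match N(0,1) tail probabilities for moderately large n.
rng = np.random.default_rng(1)
n, reps = 50, 20_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))
print(np.mean(z > 1.96))   # roughly 0.025, the N(0,1) upper-tail probability
```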
Continuous Mapping Theorem
If $Y_n \overset{D}{\to} Y$, then $h(Y_n) \overset{D}{\to}h(Y)$ for any continuous function $h$.
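Example (a standard consequence, combining with the CLT): $h(z)=z^2$ is continuous, so $Z_n^2 \overset{D}{\to} \chi^2_1$, the distribution of the square of a $N(0,1)$ random variable.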