Likelihood- vs Discriminant-based Classification
- Likelihood-based: assume a model for $p(x|C_i)$, use Bayes' rule to obtain $P(C_i|x)$, and set $g_i(x)=\log P(C_i|x)$
- Discriminant-based: assume a model for the discriminant itself, $g_i(x|\Phi_i)$; no density estimation
- Estimating the class boundaries is enough; modeling the class densities is not required
Linear Discriminant
- Linear discriminant: $g_i(x|w_i, w_{i0}) = w_i^Tx+w_{i0}$
- Advantages
- Simple: $O(d)$ space and computation per evaluation
- Knowledge extraction: a weighted sum of attributes; the signs and magnitudes of the weights are interpretable
- Optimal when $p(x|C_i)$ are Gaussian with a shared covariance matrix → useful when classes are (almost) linearly separable
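A minimal sketch of evaluating such a linear discriminant, with hypothetical weights chosen only for illustration:

```python
import numpy as np

# Hypothetical parameters for one class: weight vector w and bias w0.
w = np.array([2.0, -1.0])
w0 = 0.5

def g(x, w, w0):
    """Linear discriminant g(x | w, w0) = w^T x + w0 — O(d) per evaluation."""
    return w @ x + w0

x = np.array([1.0, 1.0])
print(g(x, w, w0))  # 2*1 + (-1)*1 + 0.5 = 1.5
```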
Generalized Linear Model
- Quadratic discriminant: $g_i(x|W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}$
- Higher-order (product) terms: map from $x$ to $z$ using nonlinear basis functions and use a linear discriminant in $z$-space, e.g. $z_1=x_1$, $z_2=x_2$, $z_3=x_1^2$, $z_4=x_2^2$, $z_5=x_1x_2$
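A sketch of the basis-function idea: a discriminant that is linear in $z$ but quadratic in $x$. The particular mapping and weights below are assumptions for illustration, not from the source.

```python
import numpy as np

def quadratic_features(x):
    """Map x = (x1, x2) to z-space with product terms: (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical weights in z-space: g(x) = x1^2 + x2^2 - 1,
# i.e. the decision boundary is the unit circle — nonlinear in x.
v = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
v0 = -1.0

def g(x):
    # Linear in z, quadratic in x.
    return v @ quadratic_features(x) + v0

print(g(np.array([2.0, 0.0])))  # 4 - 1 = 3.0 → outside the circle
```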
Two Classes
- $g(x) = g_1(x)-g_2(x)=w^Tx+w_0$
- Choose $C_1$ if $g(x)>0$ else $C_2$
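The two-class rule as code, with assumed boundary parameters:

```python
import numpy as np

# Hypothetical combined parameters: g(x) = g1(x) - g2(x) = w^T x + w0.
w = np.array([1.0, 1.0])
w0 = -1.0

def classify(x):
    """Choose C1 if g(x) > 0, else C2."""
    return "C1" if w @ x + w0 > 0 else "C2"

print(classify(np.array([2.0, 2.0])))  # g = 3 > 0  → C1
print(classify(np.array([0.0, 0.0])))  # g = -1 ≤ 0 → C2
```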
Geometry
- Decision boundary is the hyperplane $g(x)=0$
- Distance from a point $x$ to the hyperplane: $|g(x)|/||w||$
- Distance from the origin to the hyperplane: $|w_0|/||w||$
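These distances can be checked numerically; the parameters below are assumptions chosen so that $||w||=5$:

```python
import numpy as np

# Hypothetical hyperplane: 3*x1 + 4*x2 + 10 = 0, so ||w|| = 5.
w = np.array([3.0, 4.0])
w0 = 10.0

def distance_to_hyperplane(x):
    """Signed distance from x to the hyperplane g(x) = 0, i.e. g(x)/||w||."""
    return (w @ x + w0) / np.linalg.norm(w)

# At the origin, |g(0)|/||w|| = |w0|/||w|| = 10/5 = 2.0.
print(abs(distance_to_hyperplane(np.array([0.0, 0.0]))))  # 2.0
```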
Multiple Classes
- $g_i(x|w_i, w_{i0}) = w_i^Tx+w_{i0}$
- Choose $C_i$ if $g_i(x)=\max_j g_j(x)$
- Assumes classes are linearly separable; each decision region is convex
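The argmax rule above can be sketched with one discriminant per class; the per-class weights here are hypothetical:

```python
import numpy as np

# Hypothetical per-class parameters: row i of W is w_i, b[i] is w_i0.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.5])

def classify(x):
    """Choose the class i whose discriminant g_i(x) = w_i^T x + w_i0 is maximal."""
    g = W @ x + b
    return int(np.argmax(g))

print(classify(np.array([2.0, 0.5])))  # g = [2.0, 0.5, -2.0] → class 0
```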