Random Variable
measurable function $X: \Omega \rightarrow E$ from a set of possible outcomes $\Omega$ to a measurable space $E$
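As a toy illustration (my own example, not from the notes): for a fair six-sided die, a random variable is just a function from outcomes to values, and it inherits a distribution from the distribution on $\Omega$.

```python
# A minimal sketch (illustrative only): a random variable as a function from
# outcomes to values. Omega is the sample space of a fair six-sided die and
# X is the indicator "the roll is even".
from collections import Counter

omega = {1, 2, 3, 4, 5, 6}                   # possible outcomes

def X(outcome):
    return int(outcome % 2 == 0)             # map Omega -> {0, 1}

# Distribution of X induced by the uniform distribution on Omega.
p_X = {value: count / len(omega)
       for value, count in Counter(X(w) for w in omega).items()}
print(p_X)                                   # both values get probability 0.5
```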
Joint distribution
$P(x,y)$: the probability that two (or more) random variables take particular values simultaneously.
Event
A subset of the sample space $\Omega$ to which a probability is assigned.
Marginal distribution
The distribution of a subset of the variables, obtained by summing (or integrating) the joint over the others: $P(x)=\sum_y P(x,y)$.
Conditional probability
The probability of one event given that another has occurred: $P(x|y)=\frac{P(x,y)}{P(y)}$, defined when $P(y)>0$.
Probabilistic inference
Computing a desired probability (e.g. a posterior or a marginal) from known probabilities, typically a joint distribution plus observed evidence.
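A tiny worked example ties these terms together. The numbers and variable names (Rain, Traffic) below are made up for illustration; the point is that marginals come from summing the joint and conditionals from dividing by a marginal.

```python
# A minimal sketch with made-up numbers: joint, marginal, and conditional
# probabilities for two binary variables, Rain and Traffic.
joint = {                       # P(rain, traffic)
    (True, True): 0.20, (True, False): 0.05,
    (False, True): 0.15, (False, False): 0.60,
}

# Marginal: sum the joint over the variable we don't care about.
p_rain = sum(p for (rain, _), p in joint.items() if rain)        # P(rain) = 0.25

# Conditional: P(traffic | rain) = P(rain, traffic) / P(rain).
p_traffic_given_rain = joint[(True, True)] / p_rain              # = 0.8

print(p_rain, p_traffic_given_rain)
```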
Product rule
$P(x|y)=\frac{P(x,y)}{P(y)} \iff P(x,y)=P(x|y)P(y)$
Chain rule
$P(x_1,x_2,...,x_n)=P(x_1)P(x_2|x_1)...P(x_n|x_{n-1},...,x_1)=\prod_iP(x_i|x_{<i})$
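As a sketch of the product and chain rules in action (the joint table below is randomly generated, purely illustrative): each conditional is computed from the joint by summing and dividing, and the product of the conditionals recovers the joint probability.

```python
# A minimal sketch: the chain rule on a randomly generated joint distribution
# over three binary variables (x1, x2, x3).
import itertools
import random

random.seed(0)
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
joint = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

def prob(**fixed):
    """Marginal probability P(x1=..., x2=..., ...) by summing the joint."""
    idx = {"x1": 0, "x2": 1, "x3": 2}
    return sum(p for o, p in joint.items()
               if all(o[idx[name]] == v for name, v in fixed.items()))

x1, x2, x3 = 1, 0, 1
chain = (prob(x1=x1)                                          # P(x1)
         * prob(x1=x1, x2=x2) / prob(x1=x1)                   # P(x2 | x1)
         * prob(x1=x1, x2=x2, x3=x3) / prob(x1=x1, x2=x2))    # P(x3 | x1, x2)
print(abs(chain - joint[(x1, x2, x3)]) < 1e-12)               # True
```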
Bayes Theorem
$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$
Bayes’ Terminology
$P(e|D)=\frac{P(D|e)P(e)}{P(D)}$
$P(e)$ is called the prior probability of $e$. It's what we know about $e$ before seeing any evidence.
$P(D|e)$ is the conditional probability of $D$ given that $e$ happened, usually just called the likelihood.
$P(e|D)$ is the posterior probability of $e$ given $D$. It's the answer we want, and the quantity we use to choose the best answer.
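A standard illustration (with made-up numbers) is a diagnostic test: the prior is the disease's base rate, the likelihood describes the test's behavior, and the posterior is what we actually want after seeing a positive result.

```python
# A minimal sketch with made-up numbers: Bayes' theorem for a diagnostic test.
# e = "patient has the disease", D = "test came back positive".
p_e = 0.01              # prior P(e)
p_D_given_e = 0.95      # likelihood P(D | e), the test's sensitivity
p_D_given_not_e = 0.05  # false-positive rate P(D | not e)

# Evidence P(D) via the law of total probability.
p_D = p_D_given_e * p_e + p_D_given_not_e * (1 - p_e)

# Posterior P(e | D) = P(D | e) P(e) / P(D).
p_e_given_D = p_D_given_e * p_e / p_D
print(round(p_e_given_D, 3))   # ~0.161: still unlikely despite the positive test
```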
Maximum Likelihood Estimation (MLE)
$\theta_{MLE}=\argmax_\theta P(X|\theta)$
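For a concrete (made-up) case, the MLE of a Bernoulli parameter from coin flips is just the empirical frequency of heads; the sketch below also checks this against a grid search over the log-likelihood.

```python
# A minimal sketch with made-up flips: the MLE of a Bernoulli parameter theta
# is the empirical frequency of heads, which maximizes log P(X | theta).
import math

flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]        # 1 = heads
theta_mle = sum(flips) / len(flips)
print(theta_mle)                              # 0.7

def log_likelihood(theta):
    return sum(math.log(theta if x == 1 else 1 - theta) for x in flips)

# Sanity check: a grid search over theta agrees with the closed-form MLE.
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=log_likelihood))          # 0.7
```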
Maximum A Posteriori (MAP)
$\theta_{MAP}=\argmax_\theta P(\theta|X)=\argmax_\theta P(X|\theta)P(\theta)$
$=\argmax_\theta[\log P(X|\theta) + \log P(\theta)]$
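Continuing the coin-flip sketch, and assuming a Beta(2, 2) prior on $\theta$ (my choice of prior, not something the notes specify), the MAP estimate has a closed form and is pulled from the MLE toward the prior's mode.

```python
# A minimal sketch: MAP for the same Bernoulli parameter under a Beta(a, b)
# prior on theta. argmax_theta [log P(X | theta) + log P(theta)] has the
# closed form (heads + a - 1) / (n + a + b - 2).
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
heads, n = sum(flips), len(flips)
a, b = 2.0, 2.0                               # Beta(2, 2): gently favors theta = 0.5
theta_map = (heads + a - 1) / (n + a + b - 2)
print(round(theta_map, 3))                    # 0.667, pulled toward 0.5 from the MLE of 0.7
```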
Conditionally independent
For $X⫫Y|Z$,
$\forall x,y,z : P(x,y|z)=P(x|z)P(y|z)$
$\forall x,y,z:P(x|z,y)=P(x|z)$
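A quick numeric sanity check (with made-up conditionals): if the joint is built as $P(z)P(x|z)P(y|z)$, then the definition $P(x,y|z)=P(x|z)P(y|z)$ holds for every assignment.

```python
# A minimal sketch: build the joint as P(z) * P(x|z) * P(y|z), then verify
# P(x, y | z) == P(x | z) * P(y | z) for all x, y, z.
import itertools

p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.8, 1: 0.2}}   # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}   # p_y_given_z[z][y]

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in itertools.product([0, 1], repeat=3)}

for x, y, z in itertools.product([0, 1], repeat=3):
    p_xy_given_z = joint[(x, y, z)] / p_z[z]
    assert abs(p_xy_given_z - p_x_given_z[z][x] * p_y_given_z[z][y]) < 1e-12
print("P(x,y|z) = P(x|z)P(y|z) holds for every x, y, z")
```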
Naive Bayes Classifier
Key Assumption: the features $X_1,...,X_k$ are conditionally independent given the class $c_j$.
$P(X,c_j)=P(X_1,...,X_k|c_j)P(c_j)=P(c_j)\prod^k_{i=1}P(X_i|c_j)$
We want to figure out the most likely class:
$c=\argmax_{c_j}P(c_j|X_1,...,X_k)=\argmax_{c_j}P(c_j)\prod^k_{i=1}P(X_i|c_j)$, since the evidence $P(X_1,...,X_k)$ does not depend on $c_j$.
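A minimal sketch of that decision rule, with invented class priors and per-word probabilities for a toy spam/ham example; the sum of logs is used instead of the product of probabilities to avoid numerical underflow.

```python
# A minimal sketch of the Naive Bayes decision rule with made-up probabilities:
# c = argmax_c P(c) * prod_i P(X_i | c), computed in log space for stability.
import math

p_class = {"spam": 0.4, "ham": 0.6}                       # P(c_j)
p_word_given_class = {                                    # P(X_i = 1 | c_j)
    "spam": {"free": 0.8, "meeting": 0.1, "winner": 0.6},
    "ham":  {"free": 0.1, "meeting": 0.7, "winner": 0.05},
}

def classify(features):
    """features: dict word -> 0/1 indicating whether the word appears."""
    scores = {}
    for c, prior in p_class.items():
        log_score = math.log(prior)
        for word, present in features.items():
            p = p_word_given_class[c][word]
            log_score += math.log(p if present else 1 - p)
        scores[c] = log_score
    return max(scores, key=scores.get)

print(classify({"free": 1, "meeting": 0, "winner": 1}))   # spam
print(classify({"free": 0, "meeting": 1, "winner": 0}))   # ham
```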