Perceptron
The diagram (inputs, weights, output) is the important picture.
$y = Wx$
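A minimal numpy sketch of this forward pass, assuming the input is augmented with $x_0 = 1$ so the bias is absorbed into $W$; the sizes `d` and `K` are purely illustrative.

```python
import numpy as np

d, K = 4, 3                                      # illustrative input dim / output count
W = np.random.randn(K, d + 1)                    # weight matrix incl. a bias column
x = np.concatenate(([1.0], np.random.randn(d)))  # x_0 = 1 absorbs the bias
y = W @ x                                        # one linear output per unit: y = Wx
```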
Training
Online (instances seen one by one) vs. batch (whole sample) learning
- No need to store the whole sample
- Can adapt if the underlying problem changes over time
Stochastic gradient-descent: Update after a single pattern
- mini-batch: 64, 128, 256, …
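A sketch of the two update schedules under an assumed squared-error loss; the names `grad`, `sgd_epoch`, and `minibatch_epoch` and the learning rate are hypothetical, not from the notes.

```python
import numpy as np

def grad(w, x, r):
    """Per-pattern gradient of the squared error (1/2)(r - w^T x)^2."""
    return -(r - w @ x) * x

def sgd_epoch(w, X, R, eta=0.01):
    """Stochastic gradient descent: update after every single pattern."""
    for x, r in zip(X, R):
        w = w - eta * grad(w, x, r)
    return w

def minibatch_epoch(w, X, R, eta=0.01, batch_size=64):
    """Mini-batch: average the gradient over, e.g., 64/128/256 patterns."""
    for i in range(0, len(X), batch_size):
        xb, rb = X[i:i + batch_size], R[i:i + batch_size]
        g = np.mean([grad(w, x, r) for x, r in zip(xb, rb)], axis=0)
        w = w - eta * g
    return w
```

Both make one pass (epoch) over the data; only the granularity of the update differs.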
Training a perceptron: Regression
- $E^t(w|x^t, r^t)=\frac{1}{2}[r^t-(w^Tx^t)]^2$, $\Delta w^t_j = \eta (r^t-y^t)x^t_j$
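A sketch of this online regression update in numpy; `train_regression`, the learning rate, and the epoch count are illustrative assumptions, and `X` is assumed to already carry a bias column.

```python
import numpy as np

def train_regression(X, R, eta=0.01, epochs=50):
    """Online perceptron regression on rows of X with targets R."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = w @ x                # y^t = w^T x^t
            w += eta * (r - y) * x   # Δw_j = η (r^t − y^t) x^t_j
    return w
```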
Training a perceptron: Classification
Single output (K = 2)
- $y^t = \mathrm{sigmoid}(w^Tx^t)$
- Cross entropy: $E^t(w|x^t, r^t) = -r^t \log y^t - (1-r^t)\log(1-y^t)$
- $\Delta w^t_j = \eta (r^t-y^t)x^t_j$
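Differentiating the cross entropy through the sigmoid gives the same update form as regression, which the sketch below uses; `train_logistic` and its defaults are hypothetical names, assuming R holds 0/1 labels.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, R, eta=0.1, epochs=50):
    """Single-output classification: sigmoid output, cross-entropy loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = sigmoid(w @ x)       # y^t = sigmoid(w^T x^t)
            w += eta * (r - y) * x   # Δw_j = η (r^t − y^t) x^t_j
    return w
```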
K > 2 classes: softmax
- $y^t_i = \dfrac{\exp(w_i^Tx^t)}{\sum_k \exp(w_k^Tx^t)}$ (softmax)
- $E^t(\{w_i\}_i|x^t, r^t) = -\sum_i r^t_i \log y^t_i$, $\Delta w^t_{ij} = \eta\,(r^t_i - y^t_i)\,x^t_j$
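A sketch of the K > 2 case, assuming one-hot target rows in R; `train_softmax` is a hypothetical name, and subtracting the max before exponentiating is a numerical-stability choice, not part of the notes.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())          # shift for numerical stability
    return e / e.sum()

def train_softmax(X, R, K, eta=0.1, epochs=50):
    """K-class classification with one weight vector w_i per class."""
    W = np.zeros((K, X.shape[1]))
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = softmax(W @ x)           # y^t_i from all K linear outputs
            W += eta * np.outer(r - y, x)  # Δw_ij = η (r^t_i − y^t_i) x^t_j
    return W
```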
Random order
- If the data are sorted, the later examples dominate the learned weights, so shuffle the data when training (see the sketch below)
- However, for time-ordered data it is better to keep the order, so that the most recent data is reflected
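A minimal sketch of reshuffling once per epoch; the data and the seed here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # illustrative data
R = rng.integers(0, 2, size=100).astype(float)

for epoch in range(10):
    idx = rng.permutation(len(X))          # fresh random order every epoch
    for x, r in zip(X[idx], R[idx]):
        pass                               # per-pattern update (as above) goes here
```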
Learning Boolean Functions