Perceptron
The diagram (inputs, weights, output) is the important picture.
$y = Wx$
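A minimal numpy sketch of this forward pass, assuming the input is augmented with $x_0 = 1$ so the bias is absorbed into $W$; the sizes `d` and `K` are purely illustrative.

```python
import numpy as np

d, K = 4, 3                                      # illustrative input dim / output count
W = np.random.randn(K, d + 1)                    # weight matrix incl. a bias column
x = np.concatenate(([1.0], np.random.randn(d)))  # x_0 = 1 absorbs the bias
y = W @ x                                        # one linear output per unit: y = Wx
```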
Training
Online (instances seen one by one) vs. batch (whole sample) learning
- No need to store the whole sample
- Can adapt if the underlying problem changes over time
Stochastic gradient-descent: Update after a single pattern
- mini-batch: 64, 128, 256, …
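A sketch of the two update schedules under an assumed squared-error loss; the names `grad`, `sgd_epoch`, and `minibatch_epoch` and the learning rate are hypothetical, not from the notes.

```python
import numpy as np

def grad(w, x, r):
    """Per-pattern gradient of the squared error (1/2)(r - w^T x)^2."""
    return -(r - w @ x) * x

def sgd_epoch(w, X, R, eta=0.01):
    """Stochastic gradient descent: update after every single pattern."""
    for x, r in zip(X, R):
        w = w - eta * grad(w, x, r)
    return w

def minibatch_epoch(w, X, R, eta=0.01, batch_size=64):
    """Mini-batch: average the gradient over, e.g., 64/128/256 patterns."""
    for i in range(0, len(X), batch_size):
        xb, rb = X[i:i + batch_size], R[i:i + batch_size]
        g = np.mean([grad(w, x, r) for x, r in zip(xb, rb)], axis=0)
        w = w - eta * g
    return w
```

Both make one pass (epoch) over the data; only the granularity of the update differs.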
Training a perceptron: Regression
- $E^t(w|x^t, r^t)=\frac{1}{2}[r^t-(w^Tx^t)]^2$, $\Delta w^t_j = \eta (r^t-y^t)x^t_j$
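A sketch of this online regression update in numpy; `train_regression`, the learning rate, and the epoch count are illustrative assumptions, and `X` is assumed to already carry a bias column.

```python
import numpy as np

def train_regression(X, R, eta=0.01, epochs=50):
    """Online perceptron regression on rows of X with targets R."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = w @ x                # y^t = w^T x^t
            w += eta * (r - y) * x   # Δw_j = η (r^t − y^t) x^t_j
    return w
```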
Training a perceptron: Classification
Single output (K = 2)
- $y^t = \mathrm{sigmoid}(w^Tx^t)$
- Cross entropy: $E^t(w|x^t, r^t) = -r^t \log y^t - (1-r^t)\log(1-y^t)$
- $\Delta w^t_j = \eta (r^t-y^t)x^t_j$
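Differentiating the cross entropy through the sigmoid gives the same update form as regression, which the sketch below uses; `train_logistic` and its defaults are hypothetical names, assuming R holds 0/1 labels.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, R, eta=0.1, epochs=50):
    """Single-output classification: sigmoid output, cross-entropy loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = sigmoid(w @ x)       # y^t = sigmoid(w^T x^t)
            w += eta * (r - y) * x   # Δw_j = η (r^t − y^t) x^t_j
    return w
```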
K > 2 classes: softmax
- $y^t_i = \dfrac{\exp(w_i^Tx^t)}{\sum_k \exp(w_k^Tx^t)}$ (softmax)
- $E^t(\{w_i\}_i|x^t, r^t) = -\sum_i r^t_i \log y^t_i$, $\Delta w^t_{ij} = \eta\,(r^t_i - y^t_i)\,x^t_j$
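A sketch of the K > 2 case, assuming one-hot target rows in R; `train_softmax` is a hypothetical name, and subtracting the max before exponentiating is a numerical-stability choice, not part of the notes.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())          # shift for numerical stability
    return e / e.sum()

def train_softmax(X, R, K, eta=0.1, epochs=50):
    """K-class classification with one weight vector w_i per class."""
    W = np.zeros((K, X.shape[1]))
    for _ in range(epochs):
        for x, r in zip(X, R):
            y = softmax(W @ x)           # y^t_i from all K linear outputs
            W += eta * np.outer(r - y, x)  # Δw_ij = η (r^t_i − y^t_i) x^t_j
    return W
```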
Random order
- If the data are sorted, the later examples dominate the learned weights, so shuffle the data when training (see the sketch below)
- However, for time-ordered data it is better to keep the order, so that the most recent data is reflected
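A minimal sketch of reshuffling once per epoch; the data and the seed here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # illustrative data
R = rng.integers(0, 2, size=100).astype(float)

for epoch in range(10):
    idx = rng.permutation(len(X))          # fresh random order every epoch
    for x, r in zip(X[idx], R[idx]):
        pass                               # per-pattern update (as above) goes here
```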
Learning Boolean Functions