cs231n Lecture 12 slides
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data
$\mathbf z$ usually smaller than $\mathbf x$ (dimensionality reduction)
Why? we want features to capture meaningful factors of variation in data
How to learn this feature representation?
L2 Loss function $||x-\hat x||^2$ → doesn’t use labels
After training, throw away decoder