Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Goal: single raw image → caption $\mathbf y$, represented as a sequence of 1-of-K (one-hot) encoded words
$\mathbf y=\{\mathbf y_1,\ldots,\mathbf y_C\},\quad \mathbf y_i\in\mathbb{R}^K$
K: vocabulary size, C: length of the caption
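As a minimal sketch of this encoding, the snippet below turns a caption into C one-hot vectors of length K. The vocabulary, the example caption, and the helper name `one_hot_caption` are all hypothetical and chosen only for illustration; they do not come from the paper.

```python
def one_hot_caption(words, vocab):
    """Encode a caption as a list of C one-hot vectors, each of length K."""
    index = {word: i for i, word in enumerate(vocab)}
    K = len(vocab)
    caption = []
    for word in words:
        vec = [0] * K
        vec[index[word]] = 1  # exactly one entry is set per word
        caption.append(vec)
    return caption


# Hypothetical toy vocabulary (K = 5) and caption (C = 4), for illustration only.
vocab = ["a", "bird", "flying", "over", "water"]
y = one_hot_caption(["a", "bird", "over", "water"], vocab)

print(len(y), len(y[0]))  # C and K
```

Each row of `y` plays the role of one $\mathbf y_i$: a length-K vector with a single 1 at the position of the word in the vocabulary.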