Paper
Sequence to Sequence Learning with Neural Networks
RNN
Given a sequence of inputs $(x_1,...,x_T)$, a standard RNN computes a sequence of outputs $(y_1,...,y_T)$ by iterating the following equations:
$$
\begin{aligned}
h_t &= \text{sigm}\left(W^{hx}x_t + W^{hh}h_{t-1}\right) \\
y_t &= W^{yh}h_t
\end{aligned}
$$
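The recurrence above translates directly into code. Below is a minimal NumPy sketch; the function name `rnn_forward`, the toy dimensions, and the random weights are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_hx, W_hh, W_yh, h0):
    """Iterate h_t = sigm(W^hx x_t + W^hh h_{t-1}) and y_t = W^yh h_t."""
    h = h0
    ys = []
    for x in xs:
        h = sigmoid(W_hx @ x + W_hh @ h)  # hidden-state update
        ys.append(W_yh @ h)               # linear readout at each step
    return ys, h

# Toy usage with random weights (shapes are illustrative only)
rng = np.random.default_rng(0)
d, n, m, T = 4, 8, 3, 5                   # input, hidden, output dims; sequence length
xs = [rng.standard_normal(d) for _ in range(T)]
ys, h_T = rnn_forward(xs,
                      rng.standard_normal((n, d)),   # W^hx
                      rng.standard_normal((n, n)),   # W^hh
                      rng.standard_normal((m, n)),   # W^yh
                      np.zeros(n))
```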
- RNNs can easily map sequences to sequences whenever the alignment between the inputs and the outputs is known ahead of time.
- However, it is not clear how to apply an RNN to problems whose input and output sequences have different lengths with complicated, non-monotonic relationships.
- A simple strategy is to map the input sequence to a fixed-sized vector with one RNN and then map that vector to the target sequence with another RNN. However, such RNNs would be difficult to train because of the resulting long-term dependencies.
LSTM
The LSTM is known to learn problems with long-range temporal dependencies, so it may succeed in this setting.
The goal of the LSTM is to estimate the conditional probability $p(y_1,...,y_{T'}|x_1,...,x_T)$, where $(x_1,...,x_T)$ is an input sequence and $(y_1,...,y_{T'})$ is its corresponding output sequence whose length $T'$ may differ from $T$.
The LSTM does this by first obtaining a fixed-dimensional representation $v$ of the input sequence (the last hidden state of the LSTM), and then computing the probability of $y_1,...,y_{T'}$ with a standard LSTM language model whose initial hidden state is set to $v$:
$$
p(y_1,...,y_{T'}|x_1,...,x_T)=\prod_{t=1}^{T'}p(y_t|v,y_1,...,y_{t-1})
$$
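A minimal PyTorch sketch of this encoder-decoder factorization is shown below; the class name `Seq2SeqSketch`, the single-layer LSTMs, and the toy vocabulary and embedding sizes are assumptions for illustration, not the paper's configuration (which uses deep LSTMs and large vocabularies).

```python
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)  # reads x_1..x_T
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)  # LSTM-LM over y_1..y_{T'}
        self.readout = nn.Linear(hidden, tgt_vocab)

    def log_prob(self, src, tgt):
        # src: (B, T) source token ids; tgt: (B, T'+1) target ids, first column is <bos>
        _, v = self.encoder(self.src_emb(src))          # v = (h_T, c_T) summarizes the input
        dec_out, _ = self.decoder(self.tgt_emb(tgt[:, :-1]), v)
        logits = self.readout(dec_out)                  # (B, T', tgt_vocab)
        logp = torch.log_softmax(logits, dim=-1)
        # sum_t log p(y_t | v, y_1, ..., y_{t-1})
        return logp.gather(-1, tgt[:, 1:].unsqueeze(-1)).squeeze(-1).sum(-1)

# Toy usage with random token ids
model = Seq2SeqSketch(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))
tgt = torch.randint(0, 120, (2, 6))
print(model.log_prob(src, tgt).shape)  # torch.Size([2])
```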
The model
The researchers’ model differs from the above description in three important ways.
- Use two different LSTMs: one for the input sequence and one for the output sequence → increases the number of model parameters at negligible computational cost and makes it natural to train the LSTM on multiple language pairs simultaneously.
- Choose an LSTM with four layers, because they found that deep LSTMs significantly outperformed shallow LSTMs.
- Reverse the order of the words of the input sentence (but not of the output sentence), which they found extremely valuable; a minimal sketch of this preprocessing step follows this list.
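The sketch below illustrates the input-reversal trick, assuming already-tokenized sentences; the helper name `prepare_pair` and the toy tokens are hypothetical.

```python
def prepare_pair(src_tokens, tgt_tokens):
    # Reverse only the source sentence; the target keeps its original order.
    # e.g. "a b c" -> "c b a", so the first source word ends up close to the
    # first target word, introducing short-term dependencies that ease optimization.
    return list(reversed(src_tokens)), list(tgt_tokens)

print(prepare_pair(["a", "b", "c"], ["alpha", "beta", "gamma"]))
# (['c', 'b', 'a'], ['alpha', 'beta', 'gamma'])
```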
Seq2Seq with RNNs
