Understanding the Seq2Seq Model

Here is my understanding of a basic LSTM sequence-to-sequence (seq2seq) model. Suppose we are solving a question-answering setup.

You have two sets of LSTMs (green and blue below). Each set shares its own weights (i.e., each of the 4 green cells has the same weights, and similarly for the blue cells). The first is a many-to-one LSTM that summarizes the question in its last hidden state / cell memory.

The second set (blue) is a many-to-many LSTM whose weights are different from those of the first set. Its input is just the answer sentence, while its output is the same sentence shifted by one.

The question is two-fold: 1. Do we pass only the last hidden state to the blue LSTMs as their initial hidden state, or both the last hidden state and the cell memory? 2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? Is there a reference?

http://suriyadeepan.imtqy.com/img/seq2seq/seq2seq2.png (image taken from suriyadeepan.imtqy.com)

tensorflow keras lstm
2 answers
1. Do we pass only the last hidden state to the blue LSTMs as the initial hidden state, or both the last hidden state and the cell memory?

Both the hidden state h and the cell memory c are passed to the decoder.

Tensorflow

In the source code of the seq2seq module, you can find the following code in basic_rnn_seq2seq():

    _, enc_state = rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_state, cell)

If you use LSTMCell, the enc_state returned from the encoder will be a tuple (c, h). As you can see, the tuple is passed directly to the decoder.

Keras

In Keras, the “state” defined for LSTMCell is also a tuple (h, c) (note that the order is different from TF). In LSTMCell.call() you can find:

    h_tm1 = states[0]
    c_tm1 = states[1]

To get the states returned from the LSTM layer, you can specify return_state=True. The return value is a tuple (o, h, c). The tensor o is the output of the layer, which will be equal to h unless you also specify return_sequences=True.
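A quick sketch of what that looks like in the Keras functional API (the shapes and sizes here are placeholders, not from the original post):

    from keras.layers import Input, LSTM

    x = Input(shape=(None, 71))                  # (batch, time, features); 71 is arbitrary

    # return_state=True makes the layer also return its final h and c
    o, h, c = LSTM(256, return_state=True)(x)    # o == h because return_sequences is False

    o_seq, h2, c2 = LSTM(256, return_sequences=True, return_state=True)(x)
    # o_seq has one output per time step, while h2 / c2 are still the final states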

  2. Is there a way to set the initial hidden state and cell memory in Keras or TensorFlow? Is there a reference?

Tensorflow

Just specify the initial state of the LSTMCell when it is invoked. For example, from the official RNN tutorial:

    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    ...
    output, state = lstm(current_batch_of_words, state)

There is also an initial_state argument for functions such as tf.nn.static_rnn. If you are using the seq2seq module, provide the state to rnn_decoder, as shown in the code snippet for question 1.
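Putting the two together, a rough TensorFlow 1.x sketch (the shapes, sizes, and the use of two separate cells are my own assumptions, not part of the original answer):

    import tensorflow as tf  # assumes TensorFlow 1.x with the contrib/legacy modules

    batch_size, input_size, num_units, steps = 32, 50, 128, 4   # placeholder sizes

    encoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                      for _ in range(steps)]
    decoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                      for _ in range(steps)]

    # Encoder: enc_state is an LSTMStateTuple(c, h)
    enc_cell = tf.nn.rnn_cell.LSTMCell(num_units)
    _, enc_state = tf.nn.static_rnn(enc_cell, encoder_inputs, dtype=tf.float32)

    # Decoder: the encoder's (c, h) is passed in as the initial state
    dec_cell = tf.nn.rnn_cell.LSTMCell(num_units)
    dec_outputs, dec_state = tf.contrib.legacy_seq2seq.rnn_decoder(
        decoder_inputs, enc_state, dec_cell)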

Keras

Use the initial_state keyword argument in the LSTM function call.

 out = LSTM(32)(input_tensor, initial_state=(h, c)) 

In fact, you can find this usage in the official documentation:

Note on specifying the initial state of RNNs

You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.


EDIT:

There is now an example script in Keras (lstm_seq2seq.py) showing how to implement a basic seq2seq model in Keras. How to make predictions after training the seq2seq model is also covered in this script.
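The core wiring in that script follows roughly this pattern (condensed here, with placeholder sizes; see the script itself for the full training and inference code):

    from keras.models import Model
    from keras.layers import Input, LSTM, Dense

    num_encoder_tokens, num_decoder_tokens, latent_dim = 71, 93, 256   # placeholders

    # Encoder: many-to-one, we keep only the final (h, c) states
    encoder_inputs = Input(shape=(None, num_encoder_tokens))
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
    encoder_states = [state_h, state_c]

    # Decoder: many-to-many, initialised with the encoder states and fed
    # the target sequence shifted by one (teacher forcing)
    decoder_inputs = Input(shape=(None, num_decoder_tokens))
    decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                                 return_state=True)(decoder_inputs,
                                                    initial_state=encoder_states)
    decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy')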


(Edit: this answer is incomplete and does not take into account the actual state transfer capabilities. See the accepted answer).

From the Keras point of view, this image has only two layers.

  • The green group is one LSTM layer.
  • The blue group is another LSTM layer.

There is no connection between green and blue other than the passing of outputs. So the answer to question 1 is:

  • Only the thought vector (which is the actual output of the layer) is passed to the other layer.

The memory and state (I'm not sure whether these are two different objects) are completely contained within a single layer and are not, by default, meant to be viewed or shared with any other layer.

Each individual block in this image is completely invisible in Keras. The blocks are seen as "time steps", something that appears only in the shape of the input data. It is rarely important to worry about them (except for very advanced uses).

In Keras, it looks like this:

(image: the two LSTM layers as Keras sees them, with only the external arrows visible)

Put simply, you only have access to the external arrows (including the "thought vector").
But access to every individual step (every single green block in the picture) is not exposed. So...

  1. Transferring state from one layer to another is also not something Keras expects you to do. You would probably have to hack things. (See: https://github.com/fchollet/keras/issues/2995 )

But with a thought vector that is large enough, you could say it will learn to carry what is important within itself.

The only control you have over the steps is:

  • You must feed your inputs shaped as (sentences, length, wordIdFeatures)

The steps will be performed such that each slice along the length dimension is the input to one green block.

You can choose to have a single output shaped as (sentences, cells), in which case you completely lose all information about the steps. Or...

  • Outputs shaped as (sentences, length, cells), from which you know the output of every block along the length dimension (see the sketch right after this list).
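A tiny sketch of that difference (the dimensions here are made up for illustration):

    from keras.layers import Input, LSTM

    x = Input(shape=(10, 300))        # (sentences, length=10, wordIdFeatures=300)

    summary = LSTM(256)(x)                          # (sentences, 256): the steps are gone
    per_step = LSTM(256, return_sequences=True)(x)  # (sentences, 10, 256): one output per green block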

One to many or many to many?

Now, the first layer is many-to-one (but nothing prevents it from being many-to-many as well, if you want).

But the second one... that's complicated.

  • If the thought vector was made by a many-to-one layer, you will have to manage how to create the one-to-many part. (This is not trivial in Keras, but you could think of repeating the thought vector for the expected length, making it the input to all steps, as sketched after this list. Or perhaps fill the entire sequence with zeros or ones, keeping only the first element as the thought vector.)
  • If the thought vector was made by a many-to-many layer, you can take advantage of that and keep an easy many-to-many setup, if you are happy for the output to have exactly as many steps as the input.
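Here is a rough sketch of the "repeat the thought vector" idea from the first bullet (the sizes and the softmax head are my own placeholders, not from the original post):

    from keras.models import Model
    from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

    question = Input(shape=(20, 300))      # (sentences, length, wordIdFeatures)
    thought = LSTM(256)(question)          # many-to-one: a single thought vector per sentence

    repeated = RepeatVector(25)(thought)   # feed the same vector to each of the 25 decoder steps
    answer = LSTM(256, return_sequences=True)(repeated)
    answer = TimeDistributed(Dense(300, activation='softmax'))(answer)

    model = Model(question, answer)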

Keras does not have a ready-made solution for 1-to-many cases (predicting a whole sequence from a single input).

