Inspired by the TensorFlow textsum tutorial (source), I wanted to understand how it works in detail.
My plan is to implement the models (as simply as possible) and train them on toy tasks to understand in detail what is going on.
Tasks
I tried two simple tasks: given a sequence of integers [s_0, ..., s_n],
- return [s_0+1, ..., s_n+1] (add one)
- return [s_n, ..., s_0] (reverse)
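For concreteness, here is a minimal sketch of how such toy batches could be generated (the vocabulary size, sequence length and the make_batch name are my own choices, not from the tutorial):

import numpy as np

def make_batch(batch_size=32, seq_len=10, vocab=50, task="reverse"):
    """Generate (input, target) pairs for the two toy tasks."""
    # Random integer sequences [s_0, ..., s_n]; the upper bound leaves room
    # so that "add one" stays inside the vocabulary.
    src = np.random.randint(1, vocab - 1, size=(batch_size, seq_len))
    if task == "add_one":
        tgt = src + 1                # [s_0+1, ..., s_n+1]
    elif task == "reverse":
        tgt = src[:, ::-1].copy()    # [s_n, ..., s_0]
    else:
        raise ValueError("unknown task: %s" % task)
    return src, tgt

# Example:
# src, tgt = make_batch(task="reverse")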
Model
Like the tutorial, I use a bidirectional LSTM (Bi-RNN) as the encoder and an LSTM with attention as the decoder.
I took the code from: https://github.com/tensorflow/models/blob/master/textsum/seq2seq_attention_model.py#L149
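As a rough idea of what this looks like in the TF 1.x API, here is a simplified sketch (not the textsum code itself; the placeholder shapes and variable names are assumptions of mine):

import tensorflow as tf  # TF 1.x API, as in the textsum code

batch_size, src_len, tgt_len, emb_dim, num_units = 32, 10, 10, 32, 128

# Embedded inputs (placeholders here just to make the sketch self-contained).
emb_encoder_inputs = tf.placeholder(tf.float32, [batch_size, src_len, emb_dim])
emb_decoder_inputs = [tf.placeholder(tf.float32, [batch_size, emb_dim])
                      for _ in range(tgt_len)]

# Bi-directional LSTM encoder.
fw_cell = tf.contrib.rnn.LSTMCell(num_units)
bw_cell = tf.contrib.rnn.LSTMCell(num_units)
(enc_fw, enc_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, emb_encoder_inputs, dtype=tf.float32)
# The concatenated forward/backward outputs serve as the attention states.
attention_states = tf.concat([enc_fw, enc_bw], axis=2)

# LSTM decoder with attention over the encoder states.
dec_cell = tf.contrib.rnn.LSTMCell(num_units)
decoder_outputs, dec_state = tf.contrib.legacy_seq2seq.attention_decoder(
    decoder_inputs=emb_decoder_inputs,
    initial_state=dec_cell.zero_state(batch_size, tf.float32),
    attention_states=attention_states,
    cell=dec_cell)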
Results
These tasks are very easy for any seq2seq model. Indeed, it learned them very effectively (errors are very rare).
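To make "errors are very rare" concrete, one simple way to measure it is sequence-level accuracy, i.e. the fraction of sequences decoded without a single mistake (a small sketch, with hypothetical model_outputs and targets arrays):

import numpy as np

def sequence_accuracy(predictions, targets):
    """Fraction of sequences decoded with no error at all."""
    predictions = np.asarray(predictions)
    targets = np.asarray(targets)
    return np.mean(np.all(predictions == targets, axis=1))

# A value close to 1.0 means the model almost never makes a mistake:
# sequence_accuracy(model_outputs, targets)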
About attention
One could argue that the attention mechanism is not strictly necessary here. I agree. Still, while thinking about it, I wanted to check. In the reverse case, I expected the attention to look like a diagonal matrix, something like this:

[figure: expected attention matrix]
which would indicate that the first element corresponds to the last one, element 1 to element n-1, and so on.
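To make the expected picture explicit, here is a tiny numpy sketch of that matrix (the length 6 is arbitrary):

import numpy as np

n_plus_1 = 6
# Expected attention for "reverse": decoder step i attends to encoder
# position n - i, i.e. a flipped (anti-diagonal) identity matrix.
expected = np.fliplr(np.eye(n_plus_1))
print(expected)
# [[0. 0. 0. 0. 0. 1.]
#  [0. 0. 0. 0. 1. 0.]
#  ...
#  [1. 0. 0. 0. 0. 0.]]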
So I changed my code, as in https://stackoverflow.com/a/212960/..., to have my own version of tf.contrib.legacy_seq2seq.attention_decoder.
First, in the attention function:
def attention(query, return_attn=False):
    """Put attention masks on hidden using hidden_features and query."""
    ds = []  # Results of attention reads will be stored here.
    [...]
    # Attention mask is a softmax of v^T * tanh(...).
    s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                            [2, 3])
    a = nn_ops.softmax(s)
    # Now calculate the attention-weighted vector d.
    d = math_ops.reduce_sum(
        array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
    ds.append(array_ops.reshape(d, [-1, attn_size]))
    if not return_attn:
        return ds
    return ds, a
and, immediately after, in the main decoder loop:
for i, inp in enumerate(decoder_inputs):
    [...]
    attns, a = attention(state, True)
    attns_list.append(a)
    [...]
if return_attn:
    return outputs, state, attns_list
return outputs, state
and finally in my model:
decoder_outputs, self._dec_out_state, a = custom_seq2seq.attention_decoder(
    decoder_inputs=emb_decoder_inputs,
    initial_state=self._dec_in_state,
    attention_states=self._enc_top_states,
    cell=cell,
    num_heads=1,
    loop_function=loop_function,
    initial_state_attention=initial_state_attention,
    return_attn=True)
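To produce the plot below, the returned attention list (one [batch_size, attn_length] tensor per decoder step) can be fetched and drawn with matplotlib, roughly like this (a sketch; sess and feed_dict stand for my session and input feed):

import numpy as np
import matplotlib.pyplot as plt

# 'a' is a list of [batch_size, attn_length] tensors, one per decoder step.
attn_values = sess.run(a, feed_dict=feed_dict)
# Stack into [decoder_steps, attn_length] for the first example of the batch.
attn_matrix = np.stack([step[0] for step in attn_values], axis=0)

plt.imshow(attn_matrix, cmap="viridis")
plt.xlabel("encoder position")
plt.ylabel("decoder step")
plt.colorbar()
plt.show()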
Plotting this gives:

[figure: learned attention matrix]
We are far from a simple diagonal, huh?