Inspired by the TensorFlow textsum tutorial (source), I wanted to understand how it works in detail.
My plan is to implement the models (as simply as possible) and train them on toy tasks to understand in detail what is going on.
Tasks
I tried two simple tasks: given a sequence of integers [s_0, ..., s_n],
- return [s_0+1, ..., s_n+1] (add one)
- return [s_n, ..., s_0] (reverse)
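For concreteness, here is a minimal sketch of how such toy batches could be generated (the vocabulary size, sequence length and the make_batch name are my own choices, not from the tutorial):

import numpy as np

def make_batch(batch_size=32, seq_len=10, vocab=50, task="reverse"):
    """Generate (input, target) pairs for the two toy tasks."""
    # Random integer sequences [s_0, ..., s_n]; the upper bound leaves room
    # so that "add one" stays inside the vocabulary.
    src = np.random.randint(1, vocab - 1, size=(batch_size, seq_len))
    if task == "add_one":
        tgt = src + 1                # [s_0+1, ..., s_n+1]
    elif task == "reverse":
        tgt = src[:, ::-1].copy()    # [s_n, ..., s_0]
    else:
        raise ValueError("unknown task: %s" % task)
    return src, tgt

# Example:
# src, tgt = make_batch(task="reverse")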
Model
Like the tutorial, I use a bidirectional LSTM (Bi-RNN) as the encoder and an LSTM with attention as the decoder.
I took the code from: https://github.com/tensorflow/models/blob/master/textsum/seq2seq_attention_model.py#L149
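As a rough idea of what this looks like in the TF 1.x API, here is a simplified sketch (not the textsum code itself; the placeholder shapes and variable names are assumptions of mine):

import tensorflow as tf  # TF 1.x API, as in the textsum code

batch_size, src_len, tgt_len, emb_dim, num_units = 32, 10, 10, 32, 128

# Embedded inputs (placeholders here just to make the sketch self-contained).
emb_encoder_inputs = tf.placeholder(tf.float32, [batch_size, src_len, emb_dim])
emb_decoder_inputs = [tf.placeholder(tf.float32, [batch_size, emb_dim])
                      for _ in range(tgt_len)]

# Bi-directional LSTM encoder.
fw_cell = tf.contrib.rnn.LSTMCell(num_units)
bw_cell = tf.contrib.rnn.LSTMCell(num_units)
(enc_fw, enc_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, emb_encoder_inputs, dtype=tf.float32)
# The concatenated forward/backward outputs serve as the attention states.
attention_states = tf.concat([enc_fw, enc_bw], axis=2)

# LSTM decoder with attention over the encoder states.
dec_cell = tf.contrib.rnn.LSTMCell(num_units)
decoder_outputs, dec_state = tf.contrib.legacy_seq2seq.attention_decoder(
    decoder_inputs=emb_decoder_inputs,
    initial_state=dec_cell.zero_state(batch_size, tf.float32),
    attention_states=attention_states,
    cell=dec_cell)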
Results
These tasks are very easy for any seq2seq model. Indeed, it learned them very effectively (errors are very rare).
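To make "errors are very rare" concrete, one simple way to measure it is sequence-level accuracy, i.e. the fraction of sequences decoded without a single mistake (a small sketch, with hypothetical model_outputs and targets arrays):

import numpy as np

def sequence_accuracy(predictions, targets):
    """Fraction of sequences decoded with no error at all."""
    predictions = np.asarray(predictions)
    targets = np.asarray(targets)
    return np.mean(np.all(predictions == targets, axis=1))

# A value close to 1.0 means the model almost never makes a mistake:
# sequence_accuracy(model_outputs, targets)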
About attention
One could argue that the attention mechanism is not strictly necessary here. I agree. Still, while thinking about it, I wanted to check. In the reverse case, I expected the attention to look like a diagonal matrix, something like this:

[figure: expected attention matrix]
which would indicate that the first element corresponds to the last one, element 1 to element n-1, and so on.
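To make the expected picture explicit, here is a tiny numpy sketch of that matrix (the length 6 is arbitrary):

import numpy as np

n_plus_1 = 6
# Expected attention for "reverse": decoder step i attends to encoder
# position n - i, i.e. a flipped (anti-diagonal) identity matrix.
expected = np.fliplr(np.eye(n_plus_1))
print(expected)
# [[0. 0. 0. 0. 0. 1.]
#  [0. 0. 0. 0. 1. 0.]
#  ...
#  [1. 0. 0. 0. 0. 0.]]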
So I changed my code, as in https://stackoverflow.com/a/212960/..., to have my own version of tf.contrib.legacy_seq2seq.attention_decoder.
First, in the attention function:
def attention(query, return_attn=False):
    """Put attention masks on hidden using hidden_features and query."""
    ds = []  # Results of attention reads will be stored here.
    [...]
    # Attention mask is a softmax of v^T * tanh(...).
    s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                            [2, 3])
    a = nn_ops.softmax(s)
    # Now calculate the attention-weighted vector d.
    d = math_ops.reduce_sum(
        array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
    ds.append(array_ops.reshape(d, [-1, attn_size]))
    if not return_attn:
        return ds
    return ds, a
and, immediately after, in the main decoder loop:
for i, inp in enumerate(decoder_inputs):
    [...]
    attns, a = attention(state, True)
    attns_list.append(a)
    [...]
if return_attn:
    return outputs, state, attns_list
return outputs, state
and finally in my model:
decoder_outputs, self._dec_out_state, a = custom_seq2seq.attention_decoder(
    decoder_inputs=emb_decoder_inputs,
    initial_state=self._dec_in_state,
    attention_states=self._enc_top_states,
    cell=cell,
    num_heads=1,
    loop_function=loop_function,
    initial_state_attention=initial_state_attention,
    return_attn=True)
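To produce the plot below, the returned attention list (one [batch_size, attn_length] tensor per decoder step) can be fetched and drawn with matplotlib, roughly like this (a sketch; sess and feed_dict stand for my session and input feed):

import numpy as np
import matplotlib.pyplot as plt

# 'a' is a list of [batch_size, attn_length] tensors, one per decoder step.
attn_values = sess.run(a, feed_dict=feed_dict)
# Stack into [decoder_steps, attn_length] for the first example of the batch.
attn_matrix = np.stack([step[0] for step in attn_values], axis=0)

plt.imshow(attn_matrix, cmap="viridis")
plt.xlabel("encoder position")
plt.ylabel("decoder step")
plt.colorbar()
plt.show()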
Plotting this gives:

[figure: learned attention matrix]
We are far from a simple diagonal, huh?