TL;DR: swap_memory will not let you work with pseudo-infinite sequences, but it will help you fit bigger (longer, wider, or larger-batch) sequences in memory. There is a separate trick for pseudo-infinite sequences, but it only applies to unidirectional RNNs.
swap_memory
During training, a NN (an RNN included) usually needs to keep some activations in memory; they are needed to compute the gradient.
swap_memory tells your RNN to store these activations in host (CPU) memory instead of device (GPU) memory, and to transfer them back to the GPU when they are needed.
Effectively, this lets you pretend your GPU has more memory than it actually does (at the cost of CPU memory, which is usually more plentiful).
You still have to pay the computational cost of processing very long sequences, and you might still run out of host memory.
To use it, simply set this argument to True.
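A minimal sketch of how the flag is passed (the shapes and cell size here are made up for illustration):

    import tensorflow as tf

    # Hypothetical shapes: a batch of 32 sequences, 10000 steps each,
    # 128 features per step, fed to a 256-unit LSTM cell.
    inputs = tf.placeholder(tf.float32, [32, 10000, 128])
    cell = tf.nn.rnn_cell.LSTMCell(256)

    # swap_memory=True offloads forward-pass activations to host (CPU)
    # memory and copies them back to the GPU during the backward pass,
    # trading transfer time for GPU memory.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, dtype=tf.float32, swap_memory=True)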
sequence_length
Use this option if your sequences have different lengths. The name sequence_length is misleading: it is actually an array of sequence lengths, one per sequence in the batch.
You still need as much memory as if all your sequences had the maximum length (the max_time dimension).
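A small sketch (the batch shapes and lengths are made up); each entry of the array gives the true length of the corresponding padded sequence:

    import tensorflow as tf

    # Hypothetical batch of 3 sequences, padded to max_time=5.
    inputs = tf.placeholder(tf.float32, [3, 5, 64])
    # One true length per sequence in the batch.
    lengths = tf.constant([5, 3, 4])

    cell = tf.nn.rnn_cell.GRUCell(128)
    # Steps past each sequence's true length are skipped: their outputs
    # are zeroed and the state is copied through unchanged.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, sequence_length=lengths, dtype=tf.float32)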
tf.nn.bidirectional_dynamic_rnn
TF includes a ready-made implementation of bidirectional RNNs, so it is easier to use it than to build a bidirectional network out of unidirectional ones yourself.
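A minimal sketch, with made-up shapes and cell sizes:

    import tensorflow as tf

    inputs = tf.placeholder(tf.float32, [32, 100, 64])
    fw_cell = tf.nn.rnn_cell.LSTMCell(128)
    bw_cell = tf.nn.rnn_cell.LSTMCell(128)

    # Returns a (forward, backward) pair of output tensors and a pair of
    # final states; concatenating the outputs gives one feature axis.
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        fw_cell, bw_cell, inputs, dtype=tf.float32)
    outputs = tf.concat([out_fw, out_bw], axis=-1)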
Stateful RNN
To cope with very long sequences when training unidirectional RNNs, people use another trick: they save the final hidden state of each batch and use it as the initial hidden state for the next batch (for this to work, each batch must contain the continuation of the sequences from the previous batch).
These questions discuss how to do this in TF:
TensorFlow: Remember LSTM state for next batch (stateful LSTM)
How do I set TensorFlow RNN state when state_is_tuple=True?
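For reference, a minimal sketch of the pattern those questions describe (the placeholder names, shapes, and the make_chunks generator are all hypothetical): the final LSTM state of one run is fed back as the initial state of the next.

    import numpy as np
    import tensorflow as tf

    batch_size, chunk_len, depth, num_units = 32, 50, 128, 256
    inputs = tf.placeholder(tf.float32, [batch_size, chunk_len, depth])
    # Placeholders for the state carried over from the previous batch.
    c_in = tf.placeholder(tf.float32, [batch_size, num_units])
    h_in = tf.placeholder(tf.float32, [batch_size, num_units])
    initial_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

    cell = tf.nn.rnn_cell.LSTMCell(num_units)
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, initial_state=initial_state)

    def make_chunks(num_chunks=4):
        # Stand-in for real data: in practice chunk i+1 must contain the
        # continuation of the sequences in chunk i.
        for _ in range(num_chunks):
            yield np.random.rand(batch_size, chunk_len, depth).astype(np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Start from zeros; afterwards feed the previous final state back in.
        state = (np.zeros([batch_size, num_units], np.float32),
                 np.zeros([batch_size, num_units], np.float32))
        for chunk in make_chunks():
            state = sess.run(final_state, feed_dict={
                inputs: chunk, c_in: state[0], h_in: state[1]})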