Does swap_memory in dynamic_rnn allow quasi-infinite sequences?

I am trying to label characters in long character sequences. The structure of my data requires a bi-directional approach.

In addition, based on this idea, I need access to the hidden state at every timestep, not just the last one.

To try the idea out, I used a fixed-length approach: currently I take batches of random 60-character fragments of my longer sequences and run a hand-written bi-directional classifier, using zero_state as the initial_state for each 60-character fragment.

This works fine, but it is obviously not perfect, since the sequences are actually longer and the information to the left and right of the randomly cut fragment is lost.

Now, to move forward, I want to work with the full sequences. They vary greatly in length, and I will not be able to fit them all into GPU memory.

I found the swap_memory parameter in the dynamic_rnn documentation. Will this help?

I did not find any additional documentation that helped me understand it. And I can't easily just try it myself, because I need access to the hidden states at every timestep, so I coded my current graph without using any higher-level wrappers (such as dynamic_rnn). Trying this out would require me to get all the intermediate states out of the wrapper, which, as I understand it, is a significant amount of implementation work.

Before taking that on, I would like to be sure that it will really solve my memory problem. Thanks for any tips!

1 answer

TL;DR: swap_memory will not let you work with quasi-infinite sequences, but it will help you fit larger (longer, wider, or bigger-batch) sequences into memory. There is a separate trick for quasi-infinite sequences, but it only applies to unidirectional RNNs.


swap_memory

During training, a neural network (including an RNN) usually needs to keep some activations in memory - they are needed to compute the gradients.

What swap_memory does is tell your RNN to store those activations in host (CPU) memory instead of device (GPU) memory, and to transfer them back to the GPU when they are needed.

Effectively, this lets you pretend that your GPU has more memory than it actually does (at the expense of CPU memory, which is usually much more plentiful).

You still have to pay the computational cost of using very long sequences. Not to mention that you might run out of host memory.

To use it, simply set this argument to True.
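
For example, a minimal sketch with the TF 1.x API (the cell size, input depth and placeholder shapes below are made up for illustration):

    import tensorflow as tf

    # illustrative sizes: 128 hidden units, 64-dimensional inputs
    cell = tf.nn.rnn_cell.LSTMCell(num_units=128)

    # [batch, max_time, depth]; None lets max_time vary between batches
    inputs = tf.placeholder(tf.float32, [None, None, 64])

    # outputs holds the hidden state at every timestep
    outputs, final_state = tf.nn.dynamic_rnn(
        cell,
        inputs,
        dtype=tf.float32,
        swap_memory=True)  # keep saved activations in host memory, swap back on demand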


sequence_length

Use this argument if your sequences have different lengths. sequence_length has a slightly misleading name - it is actually an array of sequence lengths, one per batch element.

You still need as much memory as you would if all your sequences had the maximum length (the max_time dimension).
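
For example, the padded batch and the lengths array could be built like this (a NumPy sketch with made-up sequence lengths and feature depth):

    import numpy as np

    depth = 64
    # three variable-length sequences of feature vectors (illustrative)
    batch = [np.random.rand(37, depth),
             np.random.rand(60, depth),
             np.random.rand(15, depth)]

    lengths = np.array([len(s) for s in batch], dtype=np.int32)  # feed as sequence_length
    max_time = lengths.max()

    # pad everything up to max_time; memory still scales with batch * max_time * depth
    padded = np.zeros((len(batch), max_time, depth), dtype=np.float32)
    for i, s in enumerate(batch):
        padded[i, :len(s)] = s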


tf.nn.bidirectional_dynamic_rnn

TF ships with a ready-made implementation of bidirectional RNNs, so it may be easier to use it than to roll your own.
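
A rough sketch of how it might be wired up (TF 1.x API, illustrative sizes); note that it returns the per-timestep outputs of both directions, which is what you need for hidden states at every step:

    import tensorflow as tf

    cell_fw = tf.nn.rnn_cell.LSTMCell(128)
    cell_bw = tf.nn.rnn_cell.LSTMCell(128)

    inputs = tf.placeholder(tf.float32, [None, None, 64])  # [batch, max_time, depth]
    seq_len = tf.placeholder(tf.int32, [None])              # one length per example

    (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs,
        sequence_length=seq_len,
        dtype=tf.float32,
        swap_memory=True)

    # hidden states of both directions at every timestep, concatenated on the feature axis
    states_per_step = tf.concat([out_fw, out_bw], axis=-1)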


Stateful RNN

To cope with very long sequences when training unidirectional RNNs, people do something else: they save the final hidden states of each batch and use them as the initial hidden state for the next batch (for this to work, the next batch must consist of continuations of the previous batch's sequences).

These questions discuss how to do this in TF:

TensorFlow: Remember LSTM state for next batch (stateful LSTM)

How do I set TensorFlow RNN state when state_is_tuple=True?
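
In TF 1.x the pattern might look roughly like this sketch (the placeholder-based state feeding is just one way to do it; the names, sizes and random data are made up):

    import numpy as np
    import tensorflow as tf

    batch_size, depth, num_units = 32, 64, 128
    cell = tf.nn.rnn_cell.LSTMCell(num_units)

    inputs = tf.placeholder(tf.float32, [batch_size, None, depth])
    state_c = tf.placeholder(tf.float32, [batch_size, num_units])
    state_h = tf.placeholder(tf.float32, [batch_size, num_units])
    init_state = tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h)

    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs, initial_state=init_state, swap_memory=True)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # start from zeros, then keep feeding the previous final state back in;
        # consecutive chunks must be consecutive pieces of the same sequences
        c = np.zeros((batch_size, num_units), np.float32)
        h = np.zeros((batch_size, num_units), np.float32)
        long_batch = np.random.rand(batch_size, 300, depth).astype(np.float32)
        for chunk in np.split(long_batch, 5, axis=1):  # five 60-step chunks
            c, h = sess.run(final_state,
                            feed_dict={inputs: chunk, state_c: c, state_h: h})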
