What is the difference between these two RNN implementations in TensorFlow?

I have found two kinds of RNN implementations in TensorFlow.

The first implementation (lines 124 - 129) uses a loop to feed the input for each time step to the RNN:

with tf.variable_scope("RNN"): for time_step in range(num_steps): if time_step > 0: tf.get_variable_scope().reuse_variables() (cell_output, state) = cell(inputs[:, time_step, :], state) outputs.append(cell_output) states.append(state) 

The second implementation (lines 51 to 70) does not use any explicit loop over the time steps:

    def RNN(_X, _istate, _weights, _biases):
        # input shape: (batch_size, n_steps, n_input)
        _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
        # Reshape to prepare input to hidden activation
        _X = tf.reshape(_X, [-1, n_input])  # (n_steps*batch_size, n_input)
        # Linear activation
        _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

        # Define a lstm cell with tensorflow
        lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
        # Split data because rnn cell needs a list of inputs for the RNN inner loop
        _X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, n_hidden)

        # Get lstm cell output
        outputs, states = rnn.rnn(lstm_cell, _X, initial_state=_istate)

        # Linear activation
        # Get inner loop last output
        return tf.matmul(outputs[-1], _weights['out']) + _biases['out']



In the first implementation, there is no weight matrix between the input and the hidden layer; only the weight matrix between the hidden layer and the output is defined (lines 132 - 133):

    output = tf.reshape(tf.concat(1, outputs), [-1, size])
    softmax_w = tf.get_variable("softmax_w", [size, vocab_size])
    softmax_b = tf.get_variable("softmax_b", [vocab_size])
    logits = tf.matmul(output, softmax_w) + softmax_b

But in the second implementation, both weight matrices are defined (lines 42 - 47):

    weights = {
        'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])),  # Hidden layer weights
        'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
    }
    biases = {
        'hidden': tf.Variable(tf.random_normal([n_hidden])),
        'out': tf.Variable(tf.random_normal([n_classes]))
    }

I wonder why?

1 answer

The difference I noticed is that the second implementation uses tf.nn.rnn, which takes a list of inputs, one per time step, and produces a list of outputs, one per time step.

(Inputs: a list of T tensors, each of shape [batch_size, input_size].)
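To illustrate that calling convention, here is a minimal sketch in the same old TF 0.x style the question's code uses (the import path, placeholder names, and dimension values are illustrative assumptions, not part of either example):

    # Minimal sketch of the list-based rnn.rnn API (old TF 0.x style, values illustrative)
    import tensorflow as tf
    from tensorflow.models.rnn import rnn, rnn_cell  # TF 0.x import path

    n_steps, batch_size, n_hidden = 28, 128, 128
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # One [batch_size, n_hidden] tensor per time step
    x_list = [tf.placeholder(tf.float32, [batch_size, n_hidden]) for _ in range(n_steps)]

    outputs, final_state = rnn.rnn(lstm_cell, x_list,
                                   initial_state=lstm_cell.zero_state(batch_size, tf.float32))
    # outputs is a list of n_steps tensors, each of shape [batch_size, n_hidden]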

So if you check line 62 of the second implementation, the input is converted into a list of n_steps tensors, each of shape (batch_size, n_hidden):

    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, n_hidden)

In the first implementation, the code iterates over num_steps time steps, feeds the input for each step to the cell, and appends the corresponding output to an outputs list.

Code snippet from lines 113 to 117:

    outputs = []
    state = self._initial_state
    with tf.variable_scope("RNN"):
        for time_step in range(num_steps):
            if time_step > 0:
                tf.get_variable_scope().reuse_variables()
            (cell_output, state) = cell(inputs[:, time_step, :], state)
            outputs.append(cell_output)
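Conceptually, the list-based rnn.rnn call performs this same loop internally, so both implementations build essentially the same unrolled graph. A simplified sketch of that idea (not the library's actual code; sequence-length handling and other details are omitted):

    def simple_rnn(cell, inputs_list, initial_state, scope="RNN"):
        """Simplified illustration of what the list-based rnn.rnn call does internally."""
        state = initial_state
        outputs = []
        with tf.variable_scope(scope):
            for t, inp in enumerate(inputs_list):
                if t > 0:
                    tf.get_variable_scope().reuse_variables()  # share weights across time steps
                output, state = cell(inp, state)
                outputs.append(output)
        return outputs, state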

Turning to your second question:

Look carefully at how the inputs are fed to the RNN in the two implementations.

In the first implementation, the inputs already arrive in a form that matches the hidden size: the placeholder holds word indices of shape batch_size x num_steps, and each index is subsequently mapped to a vector whose width equals the hidden size, so no separate input-to-hidden weight matrix is needed:

 self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps]) 
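That mapping happens through an embedding table whose width is the hidden size, which is what plays the role of the input-to-hidden projection. A sketch of that step, recalled from the PTB-style example (the exact variable names there may differ slightly):

    # Each int32 word id in [batch_size, num_steps] is looked up in an embedding
    # table of width `size` (the hidden size), giving [batch_size, num_steps, size].
    embedding = tf.get_variable("embedding", [vocab_size, size])
    inputs = tf.nn.embedding_lookup(embedding, self._input_data)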

In the second implementation, by contrast, the original inputs have the shape (batch_size, n_steps, n_input). Therefore, the hidden weight matrix is needed to bring them into the form n_steps * (batch_size, n_hidden):

    # Input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # Permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input])  # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X)  # n_steps * (batch_size, n_hidden)
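To make the shape bookkeeping concrete, here is the same chain of transforms with example dimensions (batch_size=128, n_steps=28, n_input=28, n_hidden=128 are only illustrative values, e.g. MNIST rows fed as a sequence; the tf.split signature follows the old TF 0.x API used in the question):

    batch_size, n_steps, n_input, n_hidden = 128, 28, 28, 128

    _X = tf.placeholder(tf.float32, [batch_size, n_steps, n_input])   # (128, 28, 28)
    W_hidden = tf.Variable(tf.random_normal([n_input, n_hidden]))
    b_hidden = tf.Variable(tf.random_normal([n_hidden]))

    _X = tf.transpose(_X, [1, 0, 2])         # (n_steps, batch_size, n_input)  -> (28, 128, 28)
    _X = tf.reshape(_X, [-1, n_input])       # (n_steps*batch_size, n_input)   -> (3584, 28)
    _X = tf.matmul(_X, W_hidden) + b_hidden  # (n_steps*batch_size, n_hidden)  -> (3584, 128)
    _X = tf.split(0, n_steps, _X)            # list of 28 tensors, each (128, 128)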

Hope this is helpful ...

