How do you pass video features from a CNN to an LSTM?

Question

After you pass a video frame through a convnet and get an output feature map, how do you pass that data into an LSTM? Also, how do you pass multiple frames to the LSTM through the CNN?
In other words, I want to process video frames with a CNN to get the spatial features. Then I want to pass these features to an LSTM to do temporal processing on the spatial features. How do I connect the LSTM to the video features? For example, if the input video is 56x56 and, after passing through all of the CNN layers, it comes out as, say, 20 feature maps of 5x5, how are these connected to the LSTM on a frame-by-frame basis? And should they go through a fully connected layer first? Thanks, Jon

+5

video tensorflow lstm

Jon May 02, '16 at 22:00

2 answers

Sung kim · Answer 1 · 2016-05-02T22:28:39+0000

Basically, you can flatten each frame's features and feed them into one LSTM cell, one frame per time step. With a CNN, it is the same: you can feed each frame's CNN output into one LSTM cell.
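As a rough illustration of the flattening step, the sketch below uses NumPy with random values in place of real CNN activations. The shapes follow the question's example (20 feature maps of 5x5 per frame); the clip length of 8 frames and all variable names are assumptions made up for this example.

```python
import numpy as np

# Stand-in for CNN output on one clip: 8 frames, each with 20 feature maps of 5x5.
# (8 is an arbitrary, assumed clip length; the values are random placeholders.)
num_frames, num_maps, height, width = 8, 20, 5, 5
rng = np.random.default_rng(0)
cnn_features = rng.standard_normal((num_frames, num_maps, height, width))

# Flatten every frame's feature maps into one vector: 20 * 5 * 5 = 500 values.
lstm_input = cnn_features.reshape(num_frames, -1)

print(lstm_input.shape)  # (8, 500): one 500-dim vector per time step
```

Each row is what the LSTM sees at one time step. A fully connected layer, if used, would map each 500-value row to a smaller vector before it reaches the LSTM.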

Whether to add a fully connected (FC) layer in between is up to you.

See the network structure in http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-180.pdf.

naaviii · Answer 2 · 2018-02-13T12:50:38+0000

The architecture of the CNN + LSTM model looks like this. Basically, you need to wrap each CNN layer in a TimeDistributed wrapper and then pass the CNN output to the LSTM layer:

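from keras.layers import Input, TimeDistributed, Conv2D, MaxPooling2D, Flatten, Dense

# Input shape: (frames, height, width, channels) of the image sequence
cnn_input = Input(shape=(3, 200, 100, 1))
# TimeDistributed applies the wrapped layer to every frame independently
conv1 = TimeDistributed(Conv2D(32, kernel_size=(50, 5), activation='relu'))(cnn_input)
conv2 = TimeDistributed(Conv2D(32, kernel_size=(20, 5), activation='relu'))(conv1)
pool1 = TimeDistributed(MaxPooling2D(pool_size=(4, 4)))(conv2)
flat = TimeDistributed(Flatten())(pool1)
# One 100-dim feature vector per frame
cnn_op = TimeDistributed(Dense(100))(flat)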
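To see what each layer does to a single frame, the sizes can be traced by hand. The helper functions below are hypothetical (they are not part of Keras) and assume the Keras defaults that the snippet above relies on: 'valid' padding, stride 1 for Conv2D, and a pooling stride equal to the pool size.

```python
def conv_valid(size, kernel):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_valid(size, pool):
    """Output length of max pooling with stride == pool size and 'valid' padding."""
    return (size - pool) // pool + 1


h, w = 200, 100                              # one frame: height, width
h, w = conv_valid(h, 50), conv_valid(w, 5)   # conv1 -> 151 x 96
h, w = conv_valid(h, 20), conv_valid(w, 5)   # conv2 -> 132 x 92
h, w = pool_valid(h, 4), pool_valid(w, 4)    # pool1 -> 33 x 23
flat_size = h * w * 32                       # 32 filters -> 24288 values

print(h, w, flat_size)
```

Under these assumptions, Flatten hands a 24288-value vector per frame to Dense(100), so the LSTM receives a sequence of 3 vectors of length 100.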

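After that, you can pass the CNN output to the LSTM:

from keras.layers import LSTM
from keras.models import Model

# cnn_op holds one 100-dim feature vector per frame
lstm = LSTM(128, return_sequences=True, activation='tanh')(cnn_op)
op = TimeDistributed(Dense(100))(lstm)
fun_model = Model(inputs=[cnn_input], outputs=op)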

Remember that the input is time-distributed here, so the input shape for the CNN should be (# of frames, row_size, column_size, channels).
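A small NumPy sketch of that input layout, assuming grayscale 200x100 frames grouped into clips of 3 as in the model above. The helper name `make_clips` and the choice of non-overlapping clips are assumptions for illustration. Keras adds the batch dimension in front of the shape given to `Input`.

```python
import numpy as np


def make_clips(frames, clip_len):
    """Group consecutive frames into non-overlapping clips.

    frames:  array of shape (num_frames, height, width, channels)
    returns: array of shape (num_clips, clip_len, height, width, channels)
    """
    num_clips = len(frames) // clip_len
    usable = frames[: num_clips * clip_len]  # drop leftover frames at the end
    return usable.reshape(num_clips, clip_len, *frames.shape[1:])


# 10 dummy grayscale frames of 200x100 (zeros stand in for real video data).
video = np.zeros((10, 200, 100, 1), dtype=np.float32)
batch = make_clips(video, clip_len=3)

print(batch.shape)  # (3, 3, 200, 100, 1): batch, frames, height, width, channels
```

The tenth frame is dropped here because it does not fill a whole clip; padding or overlapping windows are other possible choices.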

Finally, you can apply softmax at the last layer to get predictions.
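For reference, this is what softmax does to per-frame outputs, written in NumPy rather than Keras. The scores are made up. In the model above, the equivalent step would presumably be a softmax activation on the final Dense layer, with the number of units set to the number of classes (an assumption, since the answer does not say what the 100 outputs represent).

```python
import numpy as np


def softmax(scores):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Made-up scores for 3 frames and 4 classes.
scores = np.array([[2.0, 1.0, 0.1, 0.0],
                   [0.5, 0.5, 0.5, 0.5],
                   [0.0, 3.0, 0.0, 1.0]])
probs = softmax(scores)

print(probs.sum(axis=-1))     # each row sums to 1
print(probs.argmax(axis=-1))  # index of the most likely class per frame
```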
