Pooling and convolution operations slide a "window" across the input tensor. Using tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels] , then the convolution operates on a 2D window over the height and width dimensions.
strides determines how far the window shifts in each dimension. Typical use sets the first (batch) and last (depth) stride to 1.
Let me use a very specific example: running a 2D convolution over a 32x32 grayscale input image. I say grayscale because then the input image has depth = 1, which helps keep it simple. Let the image look like this:
00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...
Run a 2x2 convolution window over a single example (batch size = 1). We will give the convolution an output channel depth of 8.
The convolution input has shape=[1, 32, 32, 1] .
If you specify strides=[1, 1, 1, 1] with padding=SAME , then the filter output will have shape [1, 32, 32, 8].
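To make these shapes concrete, here is a minimal NumPy sketch of the same windowing. The function name conv2d_same is my own, not TensorFlow's API; the real operator is tf.nn.conv2d, but the shape arithmetic below follows the SAME-padding rule it uses.

```python
import numpy as np

# Hypothetical sketch of tf.nn.conv2d's SAME-padding shape behavior
# (conv2d_same is my own name, not part of TensorFlow).
def conv2d_same(x, filters, stride):
    """x: [batch, h, w, in_ch]; filters: [fh, fw, in_ch, out_ch]."""
    batch, h, w, in_ch = x.shape
    fh, fw, _, out_ch = filters.shape
    # SAME padding keeps output spatial size = ceil(input / stride)
    out_h = -(-h // stride)
    out_w = -(-w // stride)
    pad_h = max((out_h - 1) * stride + fh - h, 0)
    pad_w = max((out_w - 1) * stride + fw - w, 0)
    xp = np.pad(x, ((0, 0),
                    (pad_h // 2, pad_h - pad_h // 2),
                    (pad_w // 2, pad_w - pad_w // 2),
                    (0, 0)))
    out = np.zeros((batch, out_h, out_w, out_ch))
    for i in range(out_h):
        for j in range(out_w):
            window = xp[:, i*stride:i*stride+fh, j*stride:j*stride+fw, :]
            # F(window): dot each flattened window with each filter
            out[:, i, j, :] = np.tensordot(window, filters,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.random.rand(1, 32, 32, 1)          # one 32x32 grayscale image
f = np.random.rand(2, 2, 1, 8)            # 2x2 filter, 8 output channels
print(conv2d_same(x, f, stride=1).shape)  # (1, 32, 32, 8)
print(conv2d_same(x, f, stride=2).shape)  # (1, 16, 16, 8)
```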
First, the filter will create an output for:
F(00 01
  10 11)
And then for:
F(01 02
  11 12)
etc. Then it will move to the second row, computing:
F(10 11
  20 21)
then
F(11 12
  21 22)
If you specify strides=[1, 2, 2, 1], the windows will not overlap. It will compute:
F(00 01
  10 11)
and then
F(02 03
  12 13)
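The overlapping vs. non-overlapping behavior can be sketched by listing the top-left corners the window visits. This uses a toy 4x4 input with no padding for brevity, and the helper window_origins is hypothetical, just for illustration:

```python
# Top-left corners visited by a k x k window over a size x size input
# (no padding, for simplicity; window_origins is a made-up helper).
def window_origins(size, k, stride):
    return [(i, j) for i in range(0, size - k + 1, stride)
                   for j in range(0, size - k + 1, stride)]

print(window_origins(4, 2, 1))  # stride 1: overlapping windows
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
print(window_origins(4, 2, 2))  # stride 2: non-overlapping windows
# [(0, 0), (0, 2), (2, 0), (2, 2)]
```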
Strides work similarly for the pooling operators.
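For instance, here is a minimal NumPy sketch of 2x2 max pooling with stride 2, the same [1, 2, 2, 1] windowing that tf.nn.max_pool applies (the function max_pool_2x2 is my own name, not TensorFlow's):

```python
import numpy as np

# Sketch of 2x2 max pooling with stride 2 over [batch, h, w, channels].
# Assumes h and w are even; max_pool_2x2 is a made-up helper name.
def max_pool_2x2(x):
    b, h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max per block
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

img = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
print(max_pool_2x2(img)[0, :, :, 0])
# [[ 5.  7.]
#  [13. 15.]]
```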
Question 2: Why strides [1, x, y, 1] for convnets
The first one is the batch: you usually don't want to skip over examples in your batch, or you shouldn't have included them in the first place. :)
The last one is the convolution depth: you usually don't want to skip inputs, for the same reason.
The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not typical use in convnets. The typical use is to use them spatially.
Why reshape to -1
-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so you can change your pipeline and not have to adjust the batch size in the code.
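A short sketch of the same behavior; NumPy's reshape treats -1 the same way tf.reshape does:

```python
import numpy as np

# Batch of 7 grayscale 32x32 images; the batch size could be anything.
batch = np.random.rand(7, 32, 32, 1)

# -1 tells reshape: "infer this dimension from the total element count."
flat = batch.reshape(-1, 32 * 32)
print(flat.shape)  # (7, 1024)
```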