Choosing the right chunks specification for a dask array

Question

According to the dask documentation, you can specify chunks in one of three ways:

a blocksize, e.g. 1000
a blockshape, e.g. (1000, 1000)
explicit sizes of all blocks along all dimensions, e.g. ((1000, 1000, 500), (400, 400))

Your chunks input will be normalized and stored in the third and most explicit form.

After trying to understand how chunks work using the visualize() function, there are still a few things I'm not sure about:

If the input is normalized, does it matter which input form I choose?

Blocksize means that each chunk has size X, e.g. 1000. What does a blockshape input indicate?

When giving a blockshape input, does the order of the parameters matter? How does it relate to the shape of the array / matrix?

python dask

istern Jan 20 '16 at 9:14

1 answer

Mocklin · Accepted Answer · 2016-01-20T15:46:22+0000

The forms lower on this list are more explicit and allow increasing asymmetry in your block shapes.
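How the simpler forms expand into the explicit one can be sketched in plain Python. This is only an illustration: `expand_chunks` is a hypothetical helper, not dask's own normalization code, and the shape `(2500, 800)` is an assumption chosen to match the explicit sizes quoted in the question. The i-th entry of a blockshape applies to the i-th axis of the array, so the order does matter.

```python
from numbers import Integral


def expand_chunks(chunks, shape):
    """Expand a blocksize or blockshape into explicit per-axis block sizes.

    Hypothetical helper for illustration; not dask's implementation.
    """
    if isinstance(chunks, Integral):
        # Blocksize: use the same size along every axis.
        chunks = (chunks,) * len(shape)
    explicit = []
    for size, dim in zip(chunks, shape):
        if isinstance(size, Integral):
            # Blockshape entry: repeat the size, then append any remainder.
            full, rest = divmod(dim, size)
            size = (size,) * full + ((rest,) if rest else ())
        explicit.append(tuple(size))
    return tuple(explicit)


shape = (2500, 800)  # assumed array shape
print(expand_chunks(1000, shape))         # ((1000, 1000, 500), (800,))
print(expand_chunks((1000, 400), shape))  # ((1000, 1000, 500), (400, 400))
print(expand_chunks(((1000, 1000, 500), (400, 400)), shape))  # unchanged
```

Under this sketch, two inputs that expand to the same explicit tuple describe the same blocks, so the simpler forms are just shorthand.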

Examples

We'll discuss this using a sequence of `chunks` examples on the following 6x6 array:

    1 2 3 4 5 6
    7 8 9 0 1 2
    3 4 5 6 7 8
    9 0 1 2 3 4
    5 6 7 8 9 0
    1 2 3 4 5 6

Let's show how different `chunks` arguments split the array into different blocks.

`chunks=3`

Symmetric blocks of size 3

    1 2 3  4 5 6
    7 8 9  0 1 2
    3 4 5  6 7 8

    9 0 1  2 3 4
    5 6 7  8 9 0
    1 2 3  4 5 6

`chunks=2`

Symmetric blocks of size 2

    1 2  3 4  5 6
    7 8  9 0  1 2

    3 4  5 6  7 8
    9 0  1 2  3 4

    5 6  7 8  9 0
    1 2  3 4  5 6

`chunks=(3, 2)`

Asymmetric but repeated blocks of size (3, 2)

    1 2  3 4  5 6
    7 8  9 0  1 2
    3 4  5 6  7 8

    9 0  1 2  3 4
    5 6  7 8  9 0
    1 2  3 4  5 6

`chunks=(1, 6)`

Asymmetric but repeated blocks of size (1, 6)

    1 2 3 4 5 6

    7 8 9 0 1 2

    3 4 5 6 7 8

    9 0 1 2 3 4

    5 6 7 8 9 0

    1 2 3 4 5 6

`chunks=((2, 4), (3, 3))`

Asymmetric and non-repeating blocks

    1 2 3  4 5 6
    7 8 9  0 1 2

    3 4 5  6 7 8
    9 0 1  2 3 4
    5 6 7  8 9 0
    1 2 3  4 5 6

`chunks=((2, 2, 1, 1), (3, 2, 1))`

Asymmetric and non-repeating blocks

    1 2 3  4 5  6
    7 8 9  0 1  2

    3 4 5  6 7  8
    9 0 1  2 3  4

    5 6 7  8 9  0

    1 2 3  4 5  6
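To check these layouts yourself, you can build the same array and inspect the normalized form dask stores. This is a small sketch that assumes NumPy and dask are installed; the `data` array is constructed here only to reproduce the digits used in the examples.

```python
import numpy as np
import dask.array as da

# The same 6x6 array of digits used in the examples above.
data = (np.arange(36).reshape(6, 6) + 1) % 10

examples = [3, 2, (3, 2), (1, 6), ((2, 4), (3, 3)), ((2, 2, 1, 1), (3, 2, 1))]
for chunks in examples:
    x = da.from_array(data, chunks=chunks)
    # .chunks is the normalized explicit form; .numblocks counts blocks per axis.
    print(chunks, "->", x.chunks, x.numblocks)
```

For instance, `chunks=(3, 2)` should report `((3, 3), (2, 2, 2))`: the first entry splits the rows and the second splits the columns.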

Discussion

The last examples are rarely provided by users on original data, but arise from complex slicing and broadcasting operations. I generally use the simplest form until I need the more complex forms. The choice of chunks should align with the computations you want to do.

For example, if you plan to take out thin slices along the first dimension, then you might want to make that dimension skinnier than the others. If you plan to do linear algebra, then you might want more symmetric blocks.
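As a rough sketch of matching chunks to the computation, `rechunk` lets you change the layout of an existing array. The 6x6 array of ones and the particular block shapes are assumptions for illustration only; real choices depend on your data and workload.

```python
import dask.array as da

x = da.ones((6, 6), chunks=3)

# Thin blocks along the first dimension: each row is its own block,
# so selecting a single row touches exactly one block.
rows = x.rechunk((1, 6))
print(rows.chunks)    # ((1, 1, 1, 1, 1, 1), (6,))

# Square blocks, a reasonable starting point for linear algebra.
square = x.rechunk((3, 3))
print(square.chunks)  # ((3, 3), (3, 3))

# Pull out one thin slice along the first dimension.
print(rows[2].compute())
```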
