Choosing the right chunks specification for a dask array

According to the dask documentation, you can specify chunks in one of three ways:

  • a block size, e.g. 1000
  • a block shape, e.g. (1000, 1000)
  • explicit sizes of all blocks along all dimensions, e.g. ((1000, 1000, 500), (400, 400))

Your chunks input will be normalized and stored in the third and most explicit form.
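As a minimal sketch of that normalization (the 2500 x 800 shape is an arbitrary assumption chosen so that the third form matches the example above), the same array can be created with each of the three forms and its stored `chunks` attribute inspected:

```python
import dask.array as da

shape = (2500, 800)  # illustrative shape, not taken from the question

# 1. block size: the same size is used along every dimension
by_size = da.zeros(shape, chunks=1000)

# 2. block shape: one size per dimension, repeated along that dimension
by_shape = da.zeros(shape, chunks=(1000, 400))

# 3. explicit sizes of all blocks along all dimensions
explicit = da.zeros(shape, chunks=((1000, 1000, 500), (400, 400)))

# Whatever form was passed in, .chunks holds the explicit form
print(by_size.chunks)   # ((1000, 1000, 500), (800,))
print(by_shape.chunks)  # ((1000, 1000, 500), (400, 400))
print(explicit.chunks)  # ((1000, 1000, 500), (400, 400))
```

Note that the block size 1000 is larger than the second dimension (800), so that dimension ends up as a single block.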

After trying to understand how chunks work using the visualize() function, there are still a few things that I'm not sure about:

If the input is normalized, does it matter which input form I choose?

A block size means that each chunk has size X, e.g. 1000. What does a block shape input signify?

When passing a block shape, does the order of the parameters matter? How does it relate to the shape of the array/matrix?

python dask
1 answer

The forms lower on that list are more explicit and allow for increasing asymmetry in your block shapes.

Examples

We will discuss this using a sequence of chunks examples on the following 6 x 6 array:

 1 2 3 4 5 6
 7 8 9 0 1 2
 3 4 5 6 7 8
 9 0 1 2 3 4
 5 6 7 8 9 0
 1 2 3 4 5 6

Let's show how different chunks arguments split this array into different blocks.

chunks=3

Symmetric blocks of size 3

 1 2 3  4 5 6
 7 8 9  0 1 2
 3 4 5  6 7 8

 9 0 1  2 3 4
 5 6 7  8 9 0
 1 2 3  4 5 6

chunks=2

Symmetric blocks of size 2

 1 2  3 4  5 6
 7 8  9 0  1 2

 3 4  5 6  7 8
 9 0  1 2  3 4

 5 6  7 8  9 0
 1 2  3 4  5 6

chunks=(3, 2)

Asymmetric but repeated blocks of size (3, 2)

 1 2  3 4  5 6
 7 8  9 0  1 2
 3 4  5 6  7 8

 9 0  1 2  3 4
 5 6  7 8  9 0
 1 2  3 4  5 6

chunks=(1, 6)

Asymmetric but repeated blocks of size (1, 6)

 1 2 3 4 5 6

 7 8 9 0 1 2

 3 4 5 6 7 8

 9 0 1 2 3 4

 5 6 7 8 9 0

 1 2 3 4 5 6

chunks=((2, 4), (3, 3))

Asymmetric and non-repeated blocks

 1 2 3  4 5 6
 7 8 9  0 1 2

 3 4 5  6 7 8
 9 0 1  2 3 4
 5 6 7  8 9 0
 1 2 3  4 5 6

chunks=((2, 2, 1, 1), (3, 2, 1))

Asymmetric and non-repeated blocks

 1 2 3  4 5  6
 7 8 9  0 1  2

 3 4 5  6 7  8
 9 0 1  2 3  4

 5 6 7  8 9  0

 1 2 3  4 5  6
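The examples above can be reproduced with a short sketch. The variable names are illustrative assumptions; the array is rebuilt with NumPy so that it holds the same digits as the 6 x 6 array used in this answer:

```python
import numpy as np
import dask.array as da

# Rebuild the 6 x 6 example array: digits 1..9, 0, repeating
data = ((np.arange(36) + 1) % 10).reshape(6, 6)

specs = [
    3,
    2,
    (3, 2),
    (1, 6),
    ((2, 4), (3, 3)),
    ((2, 2, 1, 1), (3, 2, 1)),
]

# Map each chunks argument to the normalized, explicit form dask stores
normalized = {}
for spec in specs:
    arr = da.from_array(data, chunks=spec)
    normalized[spec] = arr.chunks
    print(spec, "->", arr.chunks)

# In a block shape the order matters: the first entry applies to axis 0
# (rows), the second to axis 1 (columns).
tall = da.from_array(data, chunks=(3, 2))
top_middle = tall.blocks[0, 1].compute()  # rows 0-2, columns 2-3
print(top_middle)
```

For `chunks=(3, 2)` every block has 3 rows and 2 columns, so the entries of a block shape line up position by position with the entries of the array's shape.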

Discussion

The latter examples are rarely provided by users on their original data, but arise from complex slicing and broadcasting operations. I usually use the simplest form until I need a more complex one. The choice of chunks should align with the computations you want to do.
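As a small sketch of how non-repeated block sizes can arise without being typed in by hand (the array and the slice are illustrative assumptions), slicing a regularly chunked array is enough:

```python
import dask.array as da

# Start from the simplest form: symmetric blocks of size 3
x = da.ones((6, 6), chunks=3)

# Drop the first row and the last column; the existing block
# boundaries are kept, so the edge blocks shrink
y = x[1:, :5]

print(x.chunks)  # ((3, 3), (3, 3))
print(y.chunks)  # ((2, 3), (3, 2))
```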

For example, if you plan to take out thin slices along the first dimension, you might want to make the chunks skinnier along that dimension than along the others. If you plan to do linear algebra, you might want more symmetric blocks.
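A rough sketch of these two layouts, with sizes picked purely for illustration rather than as a recommendation:

```python
import dask.array as da

# Skinny along the first dimension: each block is 10 rows by all columns,
# so a thin slice of rows touches only one block
thin = da.zeros((1000, 1000), chunks=(10, 1000))
first_rows = thin[:10]

# More symmetric blocks, e.g. ahead of linear algebra
square = thin.rechunk((250, 250))

print(thin.numblocks)        # (100, 1)
print(first_rows.numblocks)  # (1, 1)
print(square.numblocks)      # (4, 4)
```

Here `rechunk` changes the block layout of an existing array, which is one way to switch layouts when the computation changes.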
