How to avoid an empty result from `Bag.take(n)` when using Dask?

Context: The Dask documentation clearly states that Bag.take() collects only from the first partition. With a filter, however, the first partition can end up empty while the others are not.

Question: Is it possible to use Bag.take() so that it reads from as many partitions as needed to collect n elements (or all available elements, if there are fewer than n)?

+2
python dask bag
1 answer

You can do something like the following:

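    from toolz import concat, take

    # b is an existing dask Bag; n is the number of elements wanted
    f = lambda seq: list(take(n, seq))              # first n elements of one partition
    g = lambda parts: list(take(n, concat(parts)))  # merge the partial lists, keep n
    b.reduction(f, g)  # call .compute() on the result to get the list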

This takes the first n elements of each partition, concatenates them, and then takes the first n elements of the result.
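To see why taking from every partition helps, here is a minimal plain-Python sketch of the same two-step idea. It does not use Dask at all: partitions are modeled as ordinary lists, and the helper names `take_per_partition` and `take_across_partitions` as well as the sample data are made up for illustration.

```python
from itertools import chain, islice

def take_per_partition(n, partition):
    """First n items of a single partition (the per-partition step)."""
    return list(islice(partition, n))

def take_across_partitions(n, partitions):
    """Merge the per-partition results and keep the first n items overall."""
    partial = [take_per_partition(n, p) for p in partitions]
    return list(islice(chain.from_iterable(partial), n))

# Hypothetical data: a filter has left the first partition empty.
partitions = [[], [4, 5], [6, 7, 8]]

first_only = take_per_partition(3, partitions[0])  # reading only the first partition
result = take_across_partitions(3, partitions)     # reading from all partitions

print(first_only)  # []
print(result)      # [4, 5, 6]
```

Each partition contributes at most n elements, so the merge step never handles more than n times the number of partitions. As far as I know, `Bag.take` also accepts an `npartitions` argument, and `npartitions=-1` takes from all partitions, which may be simpler if your Dask version supports it.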

+1