How to change rows and columns in a dask data frame?

I am running into several issues with Dask DataFrames.

Say I have a dataframe with two columns, ['a', 'b'].

If I want a new column c = a + b,

in pandas I would do:

 df['c'] = df['a'] + df['b'] 

In dask, I do the following operation:

 df = df.assign(c=(df.a + df.b).compute()) 

Is it possible to write this operation in a better way, similar to what we do in pandas?

The second question is what bothers me more.

In pandas, if I want to change the value of 'a' for rows 2 and 6 to np.pi, I do the following:

 df.loc[[2,6],'a'] = np.pi 

I was not able to figure out how to do a similar operation in Dask. My logic selects multiple rows, and I only want to change the values in those rows.

1 answer

Edit: adding new columns

Setitem syntax now works in dask.dataframe:

 df['z'] = df.x + df.y 

Old answer: adding new columns

You are correct that the setitem syntax does not work in dask.dataframe .

 df['c'] = ... # mutation not supported 

You should use .assign(...) instead.

 df = df.assign(c=df.a + df.b) 

In your example, you have an unnecessary .compute() call. Usually you want to call compute only at the very end, once you have your final result.

Changing rows

As before, dask.dataframe does not support changing rows in place. In-place operations are difficult to reason about in parallel code. Currently dask.dataframe does not have a nice alternative operation for this case. I have raised issue #653 to discuss this topic.
