How to convert this confusing line of Python to R

I am very new to Python, and I wonder what the following line of code does and how it can be written in R:

df['sticky'] = df[['humidity', 'workingday']].apply(lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60], axis = 1) 

For example, what is the meaning of lambda x: (0, 1)?

P.S. df is a pandas DataFrame.

python pandas r
4 answers

Let's start with the lambda. The full expression is:

  lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60] 

This is an anonymous function that takes one argument, x, and returns:

  • 1 if x['workingday'] == 1 and x['humidity'] >= 60
  • 0 otherwise

The trick (0, 1)[...] is used to return 0 or 1 instead of Python's False and True. It relies on the fact that False and True behave as the integers 0 and 1 wherever an integer is expected, for example as an index into a list or tuple. So if the condition evaluates to True, it selects element 1 of the tuple, which contains 1; if it evaluates to False, it selects element 0, which contains 0.
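A quick way to see the trick in plain Python, independent of pandas: bool is a subclass of int, so True and False can index a tuple directly. The helper name sticky_flag and the sample values below are made up for illustration; this is a sketch of the mechanism, not the original code.

```python
# bool is a subclass of int in Python, so True == 1 and False == 0
assert isinstance(True, int)

# Indexing a tuple with a bool therefore selects element 0 or element 1
print((0, 1)[False])  # 0
print((0, 1)[True])   # 1


def sticky_flag(workingday, humidity):
    """Hypothetical helper: same tuple-indexing trick as the lambda, on plain values."""
    return (0, 1)[workingday == 1 and humidity >= 60]


print(sticky_flag(1, 72))  # 1  (working day and humid)
print(sticky_flag(0, 72))  # 0  (not a working day)
print(sticky_flag(1, 40))  # 0  (not humid enough)
```

In modern code, int(condition) expresses the same conversion more directly.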

This function is applied to each row of the pandas DataFrame (in fact, only to the selected columns 'humidity' and 'workingday'), and the result is stored in the column 'sticky'. You can translate the same expression into R using an anonymous function and apply:
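Conceptually, apply(..., axis=1) calls the function once per row and collects the results, which are then assigned to the new column. Below is a plain-Python sketch of that idea, using a list of dicts as a stand-in for the DataFrame rows. The sample data and the name apply_rowwise are hypothetical, and pandas itself is implemented quite differently.

```python
# Hypothetical sample rows standing in for df[['humidity', 'workingday']]
rows = [
    {"humidity": 55, "workingday": 1},
    {"humidity": 60, "workingday": 1},
    {"humidity": 80, "workingday": 0},
    {"humidity": 90, "workingday": 1},
]


def apply_rowwise(func, rows):
    """Call func on each row and collect the results, similar in spirit to apply(..., axis=1)."""
    return [func(row) for row in rows]


sticky = apply_rowwise(
    lambda x: (0, 1)[x["workingday"] == 1 and x["humidity"] >= 60],
    rows,
)
print(sticky)  # [0, 1, 0, 1]
```

Assigning such a list of results back to the frame is what df['sticky'] = ... does in the original line.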

 df$sticky <- apply(df[, c("workingday", "humidity")], 1, function(x) { x["workingday"] == 1 & x["humidity"] >= 60 })

(The column selection is probably not needed, but my R skills are pretty rusty.)

However, there is a more idiomatic way to achieve the same thing, as kdopen wrote:

 df$sticky <- df$workingday == 1 & df$humidity >= 60 

The idiomatic R equivalent would be

 df$sticky <- df$workingday == 1 & df$humidity >= 60 

This assumes the goal is to get an indicator column.

Stefano explained the Python code well. A fully expanded version of the lambda would be:

    def func(x):
        if x['workingday'] == 1 and x['humidity'] >= 60:
            return 1
        else:
            return 0

But you would never actually write it out like that.
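To convince yourself that the lambda and the expanded function are equivalent, you can compare them on a handful of rows. This sketch uses plain dicts as hypothetical stand-in rows, since both functions only need x['workingday'] and x['humidity'] to be accessible by name; the test values are made up.

```python
import itertools


def func(x):
    # Expanded form of the lambda
    if x['workingday'] == 1 and x['humidity'] >= 60:
        return 1
    else:
        return 0


# The original one-liner, bound to a name only so it can be compared
compact = lambda x: (0, 1)[x['workingday'] == 1 and x['humidity'] >= 60]

# Hypothetical test rows: every combination of a few workingday/humidity values
rows = [{'workingday': wd, 'humidity': h}
        for wd, h in itertools.product([0, 1], [20, 59, 60, 90])]

assert all(func(r) == compact(r) for r in rows)
print("func and the lambda agree on", len(rows), "rows")
```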


I have to say that this is a weird way of applying a function to a pandas DataFrame. Anyway, here is an example that shows what it does:

    In [280]:
    # create the df (assumes numpy and pandas were imported as np and pd)
    df = pd.DataFrame({'a':np.arange(10), 'b':[1,1,1,2,2,3,3,4,5,5]})
    df

    Out[280]:
       a  b
    0  0  1
    1  1  1
    2  2  1
    3  3  2
    4  4  2
    5  5  3
    6  6  3
    7  7  4
    8  8  5
    9  9  5

The lambda is called by apply with axis=1, which means it is applied row by row, checking the named columns in each row to see whether the expression is True or False. The (0,1) trick casts this to int; otherwise you would get back a boolean dtype:

    In [285]:
    df.apply(lambda x: x['a'] > 5 and x['b'] < 5, axis=1)

    Out[285]:
    0    False
    1    False
    2    False
    3    False
    4    False
    5    False
    6     True
    7     True
    8    False
    9    False
    dtype: bool

Using the (0,1) cast:

    In [282]:
    # apply a lambda row-wise: test whether 'a' is greater than 5 and 'b' is less than 5,
    # casting the result to 1 if True or 0 if False
    df.apply(lambda x: (0,1)[x['a'] > 5 and x['b'] < 5], axis=1)

    Out[282]:
    0    0
    1    0
    2    0
    3    0
    4    0
    5    0
    6    1
    7    1
    8    0
    9    0
    dtype: int64

The pandas way would be to do it like this:

    In [284]:
    ((df['a'] > 5) & (df['b'] < 5)).astype(int)

    Out[284]:
    0    0
    1    0
    2    0
    3    0
    4    0
    5    0
    6    1
    7    1
    8    0
    9    0
    dtype: int32
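The parentheses around each comparison in the vectorized version matter. In Python, & binds more tightly than > and <, so df['a'] > 5 & df['b'] < 5 would be parsed as the chained comparison df['a'] > (5 & df['b']) < 5; with Series this fails, because a chained comparison needs a single truth value from the first comparison. The precedence itself can be seen with plain integers (the values of a and b are arbitrary):

```python
a, b = 3, 1

# & binds tighter than comparison operators, so this parses as a > (5 & b) < 5,
# i.e. the chained comparison (a > (5 & b)) and ((5 & b) < 5)
unparenthesized = a > 5 & b < 5

# With explicit parentheses, each comparison is evaluated first, then combined with &
parenthesized = (a > 5) & (b < 5)

print(unparenthesized)  # True: 5 & 1 == 1, and 3 > 1 < 5 holds
print(parenthesized)    # False: 3 > 5 is False
```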

I don't know R, so I can't comment on that part.


A complete, reproducible dplyr solution:

    library(dplyr)
    set.seed(1492)
    # tibble() replaces the deprecated data_frame()
    df <- tibble(working_day = sample(0:1, 100, replace = TRUE),
                 humidity = sample(20:90, 100, replace = TRUE))
    df %>% mutate(sticky = working_day == 1 & humidity >= 60) -> df

If you really need 0 or 1:

 df %>% mutate(sticky=as.numeric(working_day==1 & humidity >=60)) -> df 
