How to create a large number of interaction terms in Pandas?

Question

I would like to estimate an instrumental-variables (IV) regression model using many interactions with year, demographic, and other dummy variables. I cannot find an explicit method for this in Pandas, and I'm curious whether anyone has tips.
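Plain pandas can already build such terms, because an interaction with a 0/1 dummy is just an element-wise product. The sketch below assumes hypothetical columns `year`, `group` and `x` (none of them come from the question) and uses `pd.get_dummies` plus `DataFrame.mul` to create every year-by-`x` and year-by-group interaction in one step. It is an illustration of the idea, not a built-in pandas interaction API.

```python
import pandas as pd

# Hypothetical data: a year indicator, a demographic group, and a numeric regressor x.
df = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012],
    "group": ["A", "B", "A", "B", "A"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One 0/1 column per year; drop_first=True drops 2010 as the reference category.
year_dummies = pd.get_dummies(df["year"], prefix="year", drop_first=True, dtype=float)

# Multiplying each dummy column by x (row-aligned) gives all year-by-x interactions.
year_x = year_dummies.mul(df["x"], axis=0).add_suffix(":x")

# Dummy-by-dummy interactions: each year dummy times an indicator for group "B".
group_b = (df["group"] == "B").astype(float)
year_group = year_dummies.mul(group_b, axis=0).add_suffix(":group_B")

# Concatenate once at the end instead of inserting columns one by one.
design = pd.concat([df[["x"]], year_dummies, year_x, year_group], axis=1)
print(design)
```

Building all new columns first and joining them with a single `pd.concat` scales better than adding hundreds of columns to a DataFrame in a loop.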

I'm thinking of trying scikit-learn and this class:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

python pandas scikit-learn statsmodels

pdevar Oct 21 '15 at 10:42

2 answers

Marcus V. · Answer 1 · 2017-03-09T15:17:26+0000

I had a similar problem: I needed a flexible way to create specific interactions and searched StackOverflow. I followed the advice in @user333700's comment above, and thanks to him I found patsy ( http://patsy.readthedocs.io/en/latest/overview.html ) and, after some googling, this scikit-learn integration, patsylearn ( https://github.com/amueller/patsylearn ).

So, taking @motam79's example, this is possible:

    import numpy as np
    import pandas as pd
    from patsylearn import PatsyModel, PatsyTransformer

    x = np.array([[ 3, 20, 11],
                  [ 6,  2,  7],
                  [18,  2, 17],
                  [11, 12, 19],
                  [ 7, 20,  6]])
    df = pd.DataFrame(x, columns=["a", "b", "c"])
    # The formula requests only the three pairwise interaction terms.
    x_t = PatsyTransformer("a:b + a:c + b:c", return_type="dataframe").fit_transform(df)

This returns the following:

         a:b    a:c    b:c
    0   60.0   33.0  220.0
    1   12.0   42.0   14.0
    2   36.0  306.0   34.0
    3  132.0  209.0  228.0
    4  140.0   42.0  120.0
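If the scikit-learn wrapper is not needed, patsy alone can produce the same columns. This is a minimal sketch calling `patsy.dmatrix` on the same example data: `- 1` drops the intercept column that patsy adds by default, and `(a + b + c)**2` is patsy's shorthand for all main effects plus all pairwise interactions.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

x = np.array([[3, 20, 11], [6, 2, 7], [18, 2, 17], [11, 12, 19], [7, 20, 6]])
df = pd.DataFrame(x, columns=["a", "b", "c"])

# Only the pairwise products; "- 1" removes the default intercept column.
pairs_only = dmatrix("a:b + a:c + b:c - 1", df, return_type="dataframe")

# Main effects plus every pairwise interaction, without listing them by hand.
mains_and_pairs = dmatrix("(a + b + c)**2 - 1", df, return_type="dataframe")

print(pairs_only)
print(mains_and_pairs.columns.tolist())
```

The `**` form is what makes "a lot of" interactions manageable: adding a variable to the parenthesized list adds all of its pairwise terms.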

I answered a similar question here, where I gave another example with categorical variables: How to create an interaction design matrix from categorical variables?
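For categorical variables such as year or demographic group, patsy's `C()` marks a column as categorical, `:` builds an interaction, and `*` expands to main effects plus their interaction. The data below is hypothetical, invented only to show the formula syntax; patsy chooses the coding so that redundant dummy columns are dropped (the first level of each factor acts as the reference).

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical data: year and demographic group are categorical, x is numeric.
df = pd.DataFrame({
    "year": [2010, 2010, 2011, 2011, 2012, 2012],
    "group": ["A", "B", "A", "B", "A", "B"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# x plus year-specific shifts in the slope on x (year-by-x interactions).
slopes = dmatrix("x + C(year):x", df, return_type="dataframe")

# Year and group main effects plus all year-by-group interaction dummies.
cells = dmatrix("C(year) * C(group)", df, return_type="dataframe")

print(slopes.columns.tolist())
print(cells.columns.tolist())
# A rank equal to the number of columns means no dummy column is redundant.
print(np.linalg.matrix_rank(cells.to_numpy()))
```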

motam79 · Answer 2 · 2016-11-07T15:05:33+0000

You can use scikit-learn's PolynomialFeatures transformer. Here is an example:

Suppose this is your design matrix (i.e. your features):

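    from numpy import array
    from sklearn.preprocessing import PolynomialFeatures

    x = array([[ 3, 20, 11],
               [ 6,  2,  7],
               [18,  2, 17],
               [11, 12, 19],
               [ 7, 20,  6]])
    # degree 2 with interaction_only=True: original columns plus pairwise products;
    # include_bias=False drops the constant column of ones.
    x_t = PolynomialFeatures(2, interaction_only=True, include_bias=False).fit_transform(x)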

Here is the result:

    array([[  3.,  20.,  11.,  60.,  33., 220.],
           [  6.,   2.,   7.,  12.,  42.,  14.],
           [ 18.,   2.,  17.,  36., 306.,  34.],
           [ 11.,  12.,  19., 132., 209., 228.],
           [  7.,  20.,   6., 140.,  42., 120.]])

The first three columns are the original features, and the next three are the pairwise interactions of those features.
