If you one-hot encode your categorical variables (for example with OneHotEncoder) to obtain a one-hot design matrix, then interactions are nothing more than products of columns. If X_1hot is your one-hot design matrix, with samples as rows, then the second-order interactions can be written as
X_2nd_order = (X_1hot[:, np.newaxis, :] * X_1hot[:, :, np.newaxis]).reshape(len(X_1hot), -1)
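This contains duplicate interactions, because each pair of columns appears in both orders. It also contains the original features: the indicators are 0 or 1, so a column multiplied by itself is the column itself.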
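If the duplicates and the diagonal are unwanted, one option is to index the strict upper triangle of the per-sample outer product before flattening. The following is a minimal sketch on a made-up toy matrix; the names outer and X_pairs are illustrative and not part of the original code.

```python
import numpy as np

# Toy one-hot matrix: 3 samples, 4 indicator columns (hypothetical data).
X_1hot = np.array([
    [1., 0., 1., 0.],
    [0., 1., 1., 0.],
    [1., 0., 0., 1.],
])
n_samples, n_features = X_1hot.shape

# Per-sample outer product, shape (n_samples, n_features, n_features).
outer = X_1hot[:, np.newaxis, :] * X_1hot[:, :, np.newaxis]

# The diagonal reproduces the original features, since x * x == x for 0/1 values.
diag = np.arange(n_features)
assert np.array_equal(outer[:, diag, diag], X_1hot)

# Strict upper triangle (k=1): every unordered pair once, diagonal excluded.
rows, cols = np.triu_indices(n_features, k=1)
X_pairs = outer[:, rows, cols]  # shape (n_samples, n_features * (n_features - 1) // 2)
```

Using k=0 instead of k=1 would keep the diagonal, that is, the original features, alongside the unique pairs.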
Going to arbitrary order will make your design matrix explode in size. If you really need that, you should look into kernelizing with a polynomial kernel, which lets you move to any degree easily.
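To see why a kernel avoids the blow-up, note that the inner product of two second-order feature vectors equals the squared inner product of the original one-hot vectors. The sketch below checks this numerically on randomly generated toy data (the data and the variable names gram_explicit and gram_kernel are assumptions for illustration), using the homogeneous polynomial kernel (x . y)^degree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 samples, two categorical columns with 3 states each.
codes = rng.integers(0, 3, size=(5, 2))
X_1hot = np.hstack([np.eye(3)[codes[:, 0]], np.eye(3)[codes[:, 1]]])

# Explicit second-order features: 6 columns become 36.
X_2nd_order = (X_1hot[:, np.newaxis, :] * X_1hot[:, :, np.newaxis]).reshape(len(X_1hot), -1)

# Gram matrix computed from the explicit features ...
gram_explicit = X_2nd_order @ X_2nd_order.T

# ... and the same matrix from the kernel, without ever building the features.
degree = 2
gram_kernel = (X_1hot @ X_1hot.T) ** degree

assert np.allclose(gram_explicit, gram_kernel)
```

Raising degree changes only the exponent, while the explicit feature matrix would grow as n_features ** degree columns. Kernel methods such as a support vector machine with a polynomial kernel work directly from this Gram matrix.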
Using the data frame you presented, we can proceed as follows. First, a manual way to build a one-hot design matrix from a data frame:
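import numpy as np

indicators = []
state_names = []
for column_name in df.columns:
    column = df[column_name].values
    # One indicator column per distinct state of this variable
    one_hot = (column[:, np.newaxis] == np.unique(column)).astype(float)
    indicators.append(one_hot)
    # Name each indicator "<column>__<state>", in the same order as np.unique
    state_names = state_names + ["%s__%s" % (column_name, state) for state in np.unique(column)]

# Stack the indicator blocks side by side
X_1hot = np.hstack(indicators)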
The column names are then stored in state_names, and the indicator matrix is X_1hot. Then we compute the second-order features:
X_2nd_order = (X_1hot[:, np.newaxis, :] * X_1hot[:, :, np.newaxis]).reshape(len(X_1hot), -1)
To find out the column names of the second-order matrix, we build them like this:
from itertools import product

one_hot_interaction_names = ["%s___%s" % (column1, column2)
                             for column1, column2 in product(state_names, state_names)]
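As a quick sanity check that the names line up with the columns, here is a small end-to-end sketch. It uses a plain dict of NumPy arrays as a hypothetical stand-in for the data frame, with made-up column names and values, and looks up one interaction by name.

```python
import numpy as np
from itertools import product

# Hypothetical stand-in for the data frame: column name -> values.
data = {
    "color": np.array(["red", "blue", "red"]),
    "size": np.array(["S", "S", "L"]),
}

# Build the one-hot matrix and the state names, as in the steps above.
indicators, state_names = [], []
for column_name, column in data.items():
    states = np.unique(column)
    indicators.append((column[:, np.newaxis] == states).astype(float))
    state_names += ["%s__%s" % (column_name, state) for state in states]
X_1hot = np.hstack(indicators)

# Second-order features and their names, in matching order.
X_2nd_order = (X_1hot[:, np.newaxis, :] * X_1hot[:, :, np.newaxis]).reshape(len(X_1hot), -1)
names = ["%s___%s" % pair for pair in product(state_names, state_names)]

# Look up one interaction by name and compare it with the product of its parents.
col = names.index("color__red___size__S")
i, j = state_names.index("color__red"), state_names.index("size__S")
expected = X_1hot[:, i] * X_1hot[:, j]
assert np.array_equal(X_2nd_order[:, col], expected)
```

The named column is 1 only for samples that are both red and of size S, which is the first sample in this toy data.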