Efficient way to create dummy variables in Python

I want to create a vector of dummy variables (each can only be 0 or 1). I have the following data:

data = ['one','two','three','four','six']
variables = ['two','five','ten']

I came up with the following two approaches:

dummy = []
for variable in variables:
    if variable in data:
        dummy.append(1)
    else:
        dummy.append(0)

or with a list comprehension:

dummy = [1 if variable in data else 0 for variable in variables]

Both give the expected result:

>>> [1,0,0]

Is there a built-in function that does this faster? It seems slow when there are thousands of variables.

Edit: Timing results measured with time.time(), using the following data:

data = ['one','two','three','four','six']*100
variables = ['two','five','ten']*100000

  • Loop (from my example): 2.11 s
  • List comprehension: 1.55 s
  • List comprehension (with a set): 0.0004992 s
  • Example from St. Petersburg: 0.0004999 s
  • Example from falsetrue: 0.000502 s
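
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It uses timeit instead of time.time() and a smaller variables list than above, both of which are assumptions made to keep the run short and the numbers less noisy. The function names are only illustrative, and absolute timings will differ by machine.

```python
import timeit

data = ['one', 'two', 'three', 'four', 'six'] * 100
variables = ['two', 'five', 'ten'] * 1000  # smaller than in the post (assumption)


def with_loop():
    # Explicit loop, membership test against the list
    dummy = []
    for variable in variables:
        if variable in data:
            dummy.append(1)
        else:
            dummy.append(0)
    return dummy


def with_comprehension():
    # List comprehension, membership test against the list
    return [1 if variable in data else 0 for variable in variables]


def with_set():
    # List comprehension, membership test against a set built once
    search = set(data)
    return [int(variable in search) for variable in variables]


for func in (with_loop, with_comprehension, with_set):
    # Best of 3 runs, one call per run
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{func.__name__}: {best:.6f} s")
```

All three functions return the same list; only the cost of each membership test changes.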
+4
2 answers

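Convert data to a set; membership tests against a set are much faster than against a list.

In addition, you can get 1 or 0 by converting True or False to an integer.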

>>> int(True)
1

A set's __contains__ method is what the in operator calls, so it can be passed directly to map.

Putting it together:

dummy = list(map(int, map(set(data).__contains__, variables)))
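
To make the one-liner easier to follow, here is a sketch that splits it into steps. The intermediate names contains and flags are introduced only for illustration.

```python
data = ['one', 'two', 'three', 'four', 'six']
variables = ['two', 'five', 'ten']

# Bound method of the set: contains(x) gives the same answer as `x in set(data)`
contains = set(data).__contains__

# First map: one boolean per variable
flags = list(map(contains, variables))
print(flags)   # [True, False, False]

# Second map: convert each boolean to 0 or 1
dummy = list(map(int, flags))
print(dummy)   # [1, 0, 0]
```

Because the set is created once before map starts iterating, each lookup reuses the same set.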

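Alternatively, a list comprehension does the same thing and may be easier to read.

Build the set once, outside the comprehension; otherwise it would be recreated for every variable. For example: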

search = set(data)
dummy = [int(variable in search) for variable in variables]
+7
  • Use a set: item in set is O(1) on average, while item in list is O(n).
  • int(bool) converts True to 1 and False to 0.

>>> data = ['one','two','three','four','six']
>>> variables = ['two','five','ten']
>>> xs = set(data)
>>> [int(x in xs) for x in variables]
[1, 0, 0]
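
The difference in lookup cost can be checked with a rough sketch like the one below. The collection size, the missing key, and the repeat count are arbitrary assumptions; a key that is absent is the worst case for the list, since every element has to be compared.

```python
import timeit

items = [str(i) for i in range(100_000)]
as_list = items
as_set = set(items)
missing = 'not present'  # absent key: the list has to be scanned to the end

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list lookup: {list_time:.6f} s")
print(f"set lookup:  {set_time:.6f} s")
```

On a typical machine the set lookup should come out far faster, though the exact ratio depends on the data and the interpreter.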
+2
