How can I vectorize a list using sklearn DictVectorizer

I found the following example on the docs website:

    >>> measurements = [
    ...     {'city': 'Dubai', 'temperature': 33.},
    ...     {'city': 'London', 'temperature': 12.},
    ...     {'city': 'San Fransisco', 'temperature': 18.},
    ... ]
    >>> from sklearn.feature_extraction import DictVectorizer
    >>> vec = DictVectorizer()
    >>> vec.fit_transform(measurements).toarray()
    array([[  1.,   0.,   0.,  33.],
           [  0.,   1.,   0.,  12.],
           [  0.,   0.,   1.,  18.]])
    >>> vec.get_feature_names()
    ['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']
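The encoding rules at work here can be imitated in a few lines of plain Python. This is a simplified sketch for illustration, not scikit-learn's implementation (it ignores sparse output, `dtype` and `sort=False`), and `vectorize` is a hypothetical name: a string value becomes a `key=value` column set to 1, a numeric value is used as-is under its own key (this includes `True`, since `bool` is a subclass of `int`), and the columns are the sorted union of all feature names.

```python
from numbers import Number
from typing import Any, Dict, List, Tuple


def vectorize(records: List[Dict[str, Any]],
              separator: str = "=") -> Tuple[List[str], List[List[float]]]:
    """Hypothetical dense imitation of DictVectorizer's encoding rules."""
    encoded = []
    for record in records:
        row: Dict[str, float] = {}
        for key, value in record.items():
            if isinstance(value, str):
                # String value -> one-hot column named "key=value".
                row[f"{key}{separator}{value}"] = 1.0
            elif isinstance(value, Number):
                # Numbers (and bools, since True == 1) keep their own column.
                row[key] = float(value)
            else:
                raise TypeError(f"unsupported value for {key!r}: {value!r}")
        encoded.append(row)
    # Columns: sorted union of feature names; missing features become 0.0.
    names = sorted({name for row in encoded for name in row})
    matrix = [[row.get(name, 0.0) for name in names] for row in encoded]
    return names, matrix


measurements = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Fransisco', 'temperature': 18.},
]
names, matrix = vectorize(measurements)
print(names)   # ['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']
print(matrix)  # [[1.0, 0.0, 0.0, 33.0], [0.0, 1.0, 0.0, 12.0], [0.0, 0.0, 1.0, 18.0]]
```

Because a boolean key such as `'city=London': True` lands in the same column as the string pair `'city': 'London'`, several cities per row can be expressed by writing the `key=value` names yourself.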

But my input dicts look like this, with a list of cities per sample:

    >>> measurements = [
    ...     {'city': ['Dubai', 'London'], 'temperature': 33.},
    ...     {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    ...     {'city': ['San Fransisco'], 'temperature': 18.},
    ... ]

to get the following result:

    array([[  1.,   1.,   0.,  33.],
           [  0.,   1.,   1.,  12.],
           [  0.,   0.,   1.,  18.]])

In other words, a dict value should be allowed to be a list (or a tuple, etc.) of categories, each of which gets its own column.

Can I do this with DictVectorizer, or in some other way?

python scikit-learn
1 answer

Change the input so that each list element becomes its own `city=<name>` key with the value `True`:

    >>> measurements = [
    ...     {'city=Dubai': True, 'city=London': True, 'temperature': 33.},
    ...     {'city=London': True, 'city=San Fransisco': True, 'temperature': 12.},
    ...     {'city': 'San Fransisco', 'temperature': 18.},
    ... ]

Then the result will be what you expect:

    >>> vec.fit_transform(measurements).toarray()
    array([[  1.,   1.,   0.,  33.],
           [  0.,   1.,   1.,  12.],
           [  0.,   0.,   1.,  18.]])
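If the data already arrives with list values, the rewrite can be automated. A minimal sketch, assuming every `list`, `tuple` or `set` value should be split into one boolean `key=item` entry per element while scalar values are copied unchanged; `expand_list_values` is a hypothetical helper, not part of scikit-learn:

```python
from typing import Any, Dict, Iterable, List


def expand_list_values(records: Iterable[Dict[str, Any]],
                       separator: str = "=") -> List[Dict[str, Any]]:
    """Hypothetical helper: turn collection values into 'key=item': True entries."""
    expanded = []
    for record in records:
        row: Dict[str, Any] = {}
        for key, value in record.items():
            if isinstance(value, (list, tuple, set, frozenset)):
                # One indicator feature per category in the collection.
                for item in value:
                    row[f"{key}{separator}{item}"] = True
            else:
                # Strings and numbers are left for DictVectorizer to handle.
                row[key] = value
        expanded.append(row)
    return expanded


measurements = [
    {'city': ['Dubai', 'London'], 'temperature': 33.},
    {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    {'city': ['San Fransisco'], 'temperature': 18.},
]
for row in expand_list_values(measurements):
    print(row)
```

The third row comes out as `{'city=San Fransisco': True, 'temperature': 18.0}` rather than `{'city': 'San Fransisco', ...}`, but both map to the same `city=San Fransisco` column, so passing the result to `vec.fit_transform(...)` should give the matrix above. Newer scikit-learn releases may also accept lists of strings as values directly; check the DictVectorizer documentation for your version before adding a conversion step.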