How can I vectorize a list using sklearn DictVectorizer

Question

I found the following example on the docs website:

    >>> measurements = [
    ...     {'city': 'Dubai', 'temperature': 33.},
    ...     {'city': 'London', 'temperature': 12.},
    ...     {'city': 'San Fransisco', 'temperature': 18.},
    ... ]
    >>> from sklearn.feature_extraction import DictVectorizer
    >>> vec = DictVectorizer()
    >>> vec.fit_transform(measurements).toarray()
    array([[ 1., 0., 0., 33.],
           [ 0., 1., 0., 12.],
           [ 0., 0., 1., 18.]])
    >>> vec.get_feature_names()
    ['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']

I need to vectorize dicts that look like this instead:

    >>> measurements = [
    ...     {'city': ['Dubai', 'London'], 'temperature': 33.},
    ...     {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    ...     {'city': ['San Fransisco'], 'temperature': 18.},
    ... ]

to get the following result:

    array([[ 1., 1., 0., 33.],
           [ 0., 1., 1., 12.],
           [ 0., 0., 1., 18.]])

That is, a dict value may itself be a list (or a tuple, etc.), and every item in it should set its own column.

Can I do this with DictVectorizer, or in some other way?

python scikit-learn

fi11er Jun 16 '14 at 19:07

1 answer

Fred foo · Accepted Answer · 2014-06-18T10:37:24+0000

Change the representation so that each list item becomes its own boolean feature:

    >>> measurements = [
    ...     {'city=Dubai': True, 'city=London': True, 'temperature': 33.},
    ...     {'city=London': True, 'city=San Fransisco': True, 'temperature': 12.},
    ...     {'city': 'San Fransisco', 'temperature': 18.},
    ... ]

Then the result will be what you expect:

    >>> vec.fit_transform(measurements).toarray()
    array([[ 1., 1., 0., 33.],
           [ 0., 1., 1., 12.],
           [ 0., 0., 1., 18.]])
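
If the data already comes with list values, the conversion to this representation can be automated. The following is only a sketch: `expand_list_values` and `vectorize` are hypothetical helper names, not part of scikit-learn, and the sketch assumes the list items are strings (or otherwise format cleanly with `str`).

```python
def expand_list_values(record):
    """Return a copy of `record` in which every list/tuple/set value is
    replaced by one 'key=item': True entry per item (hypothetical helper)."""
    expanded = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple, set)):
            for item in value:
                # Same 'key=value' naming that DictVectorizer uses for strings.
                expanded[f"{key}={item}"] = True
        else:
            # Scalars and plain strings are passed through unchanged.
            expanded[key] = value
    return expanded


measurements = [
    {'city': ['Dubai', 'London'], 'temperature': 33.},
    {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    {'city': ['San Fransisco'], 'temperature': 18.},
]

expanded = [expand_list_values(m) for m in measurements]


def vectorize(records):
    """Fit a DictVectorizer on `records`; return it with the dense matrix."""
    # Imported here so the helper above stays usable without scikit-learn.
    from sklearn.feature_extraction import DictVectorizer
    vec = DictVectorizer()
    return vec, vec.fit_transform(records).toarray()
```

Calling `vectorize(expanded)` then gives the fitted vectorizer and the dense array. Boolean `True` values are stored as 1.0, so each city in a list sets its own column.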
