I found the following example in the scikit-learn documentation:
    >>> measurements = [
    ...     {'city': 'Dubai', 'temperature': 33.},
    ...     {'city': 'London', 'temperature': 12.},
    ...     {'city': 'San Fransisco', 'temperature': 18.},
    ... ]
    >>> from sklearn.feature_extraction import DictVectorizer
    >>> vec = DictVectorizer()
    >>> vec.fit_transform(measurements).toarray()
    array([[  1.,   0.,   0.,  33.],
           [  0.,   1.,   0.,  12.],
           [  0.,   0.,   1.,  18.]])
    >>> vec.get_feature_names()
    ['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']
I need to vectorize dicts in which a value can be a list, like this:
    >>> measurements = [
    ...     {'city': ['Dubai', 'London'], 'temperature': 33.},
    ...     {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    ...     {'city': ['San Fransisco'], 'temperature': 18.},
    ... ]
to get the following result:
    array([[  1.,   1.,   0.,  33.],
           [  0.,   1.,   1.,  12.],
           [  0.,   0.,   1.,  18.]])
In other words, a dict value should be allowed to be a list (or a tuple, etc.), with every element of that list set to 1 in its own column.
Can I do this with DictVectorizer, or in some other way?
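To make the goal concrete, here is a minimal sketch of one possible workaround, assuming it is acceptable to preprocess the dicts before vectorizing. The helper name `expand_list_values` is hypothetical (it is not part of scikit-learn); it rewrites every list-like value into one `key=item` indicator entry per element, mimicking the feature names DictVectorizer generates for string values.

```python
def expand_list_values(record):
    """Return a copy of `record` where each list/tuple/set value is
    replaced by one 'key=item' entry (set to 1.0) per element.

    Hypothetical helper, not a scikit-learn function.
    """
    expanded = {}
    for key, value in record.items():
        if isinstance(value, (list, tuple, set)):
            # One indicator feature per element, named like 'city=Dubai'.
            for item in value:
                expanded[f"{key}={item}"] = 1.0
        else:
            # Scalar values (e.g. the temperature) are kept unchanged.
            expanded[key] = value
    return expanded


measurements = [
    {'city': ['Dubai', 'London'], 'temperature': 33.},
    {'city': ['London', 'San Fransisco'], 'temperature': 12.},
    {'city': ['San Fransisco'], 'temperature': 18.},
]

# Each dict now has only scalar values.
expanded = [expand_list_values(m) for m in measurements]
```

The dicts in `expanded` contain only numeric values, so they could presumably be passed to `DictVectorizer().fit_transform(expanded)` as-is. Since DictVectorizer sorts feature names by default, I would expect the columns to come out as `city=Dubai`, `city=London`, `city=San Fransisco`, `temperature`, which matches the layout I am after.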