This is from my own code, which I used for last year's StackOverflow predictions:
    from __future__ import division
    from pandas import *
    from sklearn import cross_validation
    from sklearn import metrics
    from sklearn.ensemble import GradientBoostingClassifier

    basic_feature_names = [
        'BodyLength',
        'NumTags',
        'OwnerUndeletedAnswerCountAtPostTime',
        'ReputationAtPostCreation',
        'TitleLength',
        'UserAge',
    ]

    fea =
So, if we wanted to use only a subset of the features for classification, we could do this:
    # want to train using fewer features, so remove 'BodyLength'
    basic_feature_names.remove('BodyLength')
    clf.fit(fea[basic_feature_names], orig_data['OpenStatusMod'].values)
The idea is that a list of column names can be used to select a subset of columns in a pandas DataFrame, so we can build a new list, or remove a value from an existing one, and use it for the selection.
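As a minimal, self-contained sketch of that idea: the toy frame below reuses a few of the column names from above, but the values are made up for illustration, and `reduced_names` and `subset` are hypothetical names. It builds a new list instead of calling `remove`, so the original list is left untouched.

```python
import pandas as pd

# Hypothetical toy data: column names follow the answer, values are made up.
fea = pd.DataFrame({
    'BodyLength': [120, 450, 80],
    'NumTags': [2, 5, 1],
    'TitleLength': [30, 55, 12],
})

feature_names = ['BodyLength', 'NumTags', 'TitleLength']

# Build a new list rather than mutating the original, so both stay available.
reduced_names = [name for name in feature_names if name != 'BodyLength']

# Indexing a DataFrame with a list of names returns only those columns,
# in the order given by the list.
subset = fea[reduced_names]
print(list(subset.columns))
```

The resulting `subset` can then be passed to a classifier's `fit` in the same way as `fea[basic_feature_names]` above.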
I'm not sure how easily this can be done with numpy arrays, since indexing works differently there.
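One possible approach with plain numpy, offered only as a sketch: keep the column names in a separate list and translate the wanted names into integer positions, then index the second axis with that list of positions. The array `X`, the list `all_names`, and the values are hypothetical and assume the columns of the array are stored in the same order as the names.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns follow the order of all_names.
all_names = ['BodyLength', 'NumTags', 'TitleLength']
X = np.array([
    [120, 2, 30],
    [450, 5, 55],
    [80, 1, 12],
])

wanted = ['NumTags', 'TitleLength']

# Translate column names into integer column positions.
col_idx = [all_names.index(name) for name in wanted]

# Indexing the second axis with a list of integers selects those columns.
X_subset = X[:, col_idx]
print(X_subset.shape)
```

The drawback compared with pandas is that the name-to-position bookkeeping has to be maintained by hand.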