1) can I just combine the new features with the old frequency-based features and train an SVM on this heterogeneous feature space?
Since you tagged this with scikit-learn: yes, you can, and you can use FeatureUnion to do it for you.
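A minimal sketch of that: FeatureUnion concatenates a TF-IDF matrix with any other feature columns before training the SVM. The toy documents, labels, and the document-length feature here are illustrative assumptions, not from the question.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

def doc_length(docs):
    # Hypothetical extra feature: one column holding the token count per document.
    return np.array([[len(d.split())] for d in docs])

# Concatenate TF-IDF features with the extra feature column.
union = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("length", FunctionTransformer(doc_length, validate=False)),
])

clf = Pipeline([("features", union), ("svm", LinearSVC())])

docs = ["good movie", "bad movie", "great film", "terrible film"]
labels = [1, 0, 1, 0]
clf.fit(docs, labels)
print(clf.predict(["good film"]))
```

FeatureUnion stacks the sparse TF-IDF output and the dense length column into one (sparse) matrix, so the downstream LinearSVC sees a single combined feature space.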
2) if not, is multiple kernel learning the way to go, i.e. train a kernel on each auxiliary feature space and combine them by linear interpolation? (we still don't have MKL implemented in scikit-learn, right?)
Linear SVMs are the standard model for this task. Kernel methods are too slow for real-world text classification (with the possible exception of training algorithms such as LaSVM, but that is not implemented in scikit-learn).
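The standard linear setup can be sketched as below; the tiny corpus and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["spam offer now", "meeting at noon", "cheap offer", "project meeting"]
y = [1, 0, 1, 0]

# LinearSVC (liblinear) trains directly in the sparse input space,
# avoiding the cost of kernel evaluations on high-dimensional text data.
X = TfidfVectorizer().fit_transform(docs)
clf = LinearSVC().fit(X, y)
print(clf.predict(X))
```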
3) or should I turn to alternative learners that handle heterogeneous features well, such as MaxEnt or decision trees?
SVMs handle heterogeneous features just as well as MaxEnt / logistic regression does. In both cases you really should feed in scaled data, e.g. with MinMaxScaler. Note that scikit-learn's TfidfTransformer produces normalized vectors by default, so you don't need to scale its output, only the other features.