Support Vector Machines (at least as implemented in libsvm, for which scikit-learn is a wrapper) are inherently batch algorithms: they need access to all the training data in memory at once. Therefore, they do not scale to large datasets.
Instead, you should use models that support incremental (out-of-core) learning via the partial_fit method. For example, some linear models such as sklearn.linear_model.SGDClassifier implement partial_fit. You can slice your dataset and feed it as a sequence of mini-batches of shape (batch_size, n_features). batch_size may be as small as 1, but that is inefficient because of Python interpreter overhead (plus the overhead of loading each batch of data). It is therefore recommended to use mini-batches of at least 100 samples.
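A minimal sketch of the mini-batch approach described above, using synthetic data for illustration (the `iter_minibatches` helper and the dataset are made up for this example; in practice each batch would be loaded from disk or a database):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def iter_minibatches(X, y, batch_size=100):
    """Hypothetical helper yielding mini-batches of shape (batch_size, n_features)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Synthetic stand-in for a dataset too large to fit in memory
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = (X[:, 0] > 0).astype(int)

clf = SGDClassifier()
classes = np.unique(y)  # the full set of classes must be given on the first partial_fit call
for X_batch, y_batch in iter_minibatches(X, y, batch_size=100):
    clf.partial_fit(X_batch, y_batch, classes=classes)
```

Note that `partial_fit` requires the `classes` argument on the first call, because later batches may not contain every class.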