Just to add to carlosdc's good answer.
Some of the features that set vowpal wabbit apart and allow it to scale to tera-feature (10^12) data sizes:
The online weight vector: vowpal wabbit maintains an in-memory weight vector, which is essentially the vector of weights for the model it is building. This is what you call the "state" in your question.
Unbounded data size: the size of the weight vector is proportional to the number of features (independent input variables), not to the number of examples (instances). This is what makes vowpal wabbit, unlike many other (non-online) learners, scale in space. Because it does not need to load all the data into memory the way a typical batch learner does, it can still learn from data sets that are too large to fit in memory.
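To make the space argument concrete, here is a minimal sketch (plain Python, not vw's actual code) of an online learner whose only persistent state is a fixed-size weight vector; examples are streamed from disk one at a time, so memory depends on the number of feature slots, never on the number of examples. The file name train.txt, its format, and the 2^18 size are made up for the example.

    import numpy as np

    NUM_WEIGHTS = 2 ** 18          # fixed model size, independent of the number of examples
    w = np.zeros(NUM_WEIGHTS)      # the "state": one weight per feature slot

    def sgd_update(w, indices, values, y, lr=0.1):
        """One online squared-loss SGD step on a single sparse example."""
        pred = sum(w[i] * v for i, v in zip(indices, values))
        err = pred - y
        for i, v in zip(indices, values):
            w[i] -= lr * err * v

    # Stream examples one line at a time; nothing but w is kept in memory.
    # train.txt is a hypothetical file with lines like: "1 3:0.5 17:1.0 42:2.3"
    with open("train.txt") as f:
        for line in f:
            label, *feats = line.split()
            idx = [int(t.split(":")[0]) for t in feats]
            val = [float(t.split(":")[1]) for t in feats]
            sgd_update(w, idx, val, float(label))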
Cluster mode: vowpal wabbit supports running on multiple hosts in a cluster, imposing a binary-tree structure on the nodes and using an all-reduce reduction from the leaves to the root.
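The all-reduce pattern is easier to see in a toy simulation than in prose. The sketch below only mimics the communication pattern (partial values flow from the leaves up an implicit binary tree to the root, then the global result flows back down so every node ends up with the same answer); it is not how vw's cluster code is written, and in a real cluster each node would be a separate host.

    def allreduce_tree(values):
        """Toy all-reduce over an implicit binary tree (node i's children are 2i+1, 2i+2)."""
        n = len(values)

        def reduce_up(i):
            # Phase 1: sum this node's value with its children's sums (leaves -> root).
            total = values[i]
            for child in (2 * i + 1, 2 * i + 2):
                if child < n:
                    total += reduce_up(child)
            return total

        global_sum = reduce_up(0)
        # Phase 2: broadcast the result back down (root -> leaves).
        return [global_sum] * n

    # Each "node" holds a partial statistic (e.g. a gradient sum); afterwards they all agree.
    print(allreduce_tree([1.0, 2.0, 3.0, 4.0, 5.0]))   # [15.0, 15.0, 15.0, 15.0, 15.0]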
The hashing trick: vowpal wabbit uses what is called the hashing trick. All feature names are hashed to integer indices using murmurhash-32. This has several advantages: it is very simple and time-efficient, avoiding hash-table management and collision handling entirely, while allowing features to occasionally collide. It turns out (in practice) that a small number of feature collisions in a training set with thousands of distinct features acts like an implicit regularization term; counter-intuitively, this often improves model accuracy rather than hurting it. It is also agnostic to the sparseness (or density) of the feature space. Finally, it allows input feature names to be arbitrary strings, unlike most conventional learners, which require feature names/IDs to be both a) numeric and b) unique.
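Here is a minimal sketch of the idea, assuming nothing about vw's internals: arbitrary string feature names are hashed straight into a fixed range of weight indices, so there is no feature dictionary to build or manage. vw uses murmurhash-32; the sketch uses md5 only to stay dependency-free, and the 18-bit table size is an arbitrary choice.

    import hashlib

    NUM_BITS = 18                    # number of index bits; 2^18 weight slots
    NUM_SLOTS = 2 ** NUM_BITS
    weights = [0.0] * NUM_SLOTS      # no hash table and no name->id dictionary to maintain

    def feature_index(name: str) -> int:
        """Map an arbitrary string feature name to a weight slot; collisions are allowed."""
        h = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)
        return h % NUM_SLOTS

    for name in ["user=alice", "country=US", "word:wabbit"]:
        print(name, "->", feature_index(name))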
Parallelism: vowpal wabbit exploits multi-core CPUs by running parsing and learning in two separate threads, adding further to its speed. This is what allows vw to learn as fast as it can read the data. It turns out that most of the algorithms supported by vw are, counter-intuitively, bottlenecked by I/O speed rather than by learning speed.
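Conceptually this is a classic producer/consumer pipeline: one thread parses input, the other consumes parsed examples and learns. The sketch below shows the shape of that pipeline in Python (not vw's C++ implementation); train.txt and its format are hypothetical.

    import threading
    import queue

    examples = queue.Queue(maxsize=1024)        # bounded hand-off between the two threads

    def parser(path):
        """Thread 1: read and parse input; never touches the model."""
        try:
            with open(path) as f:
                for line in f:
                    label, *feats = line.split()
                    examples.put((float(label), feats))
        finally:
            examples.put(None)                  # sentinel: no more data

    def learner():
        """Thread 2: consume parsed examples and update the model."""
        count = 0
        while True:
            item = examples.get()
            if item is None:
                break
            label, feats = item
            count += 1                          # a real learner would update the weights here
        print("trained on", count, "examples")

    t1 = threading.Thread(target=parser, args=("train.txt",))
    t2 = threading.Thread(target=learner)
    t1.start(); t2.start()
    t1.join(); t2.join()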
Checkpointing and incremental learning: vowpal wabbit lets you save your model to disk while training, then load the model and continue learning where you left off with the --save_resume option.
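Conceptually, resuming amounts to persisting the learner's state and loading it back before feeding in more data. A toy version of that idea (not vw's model format; the file name is made up):

    import numpy as np

    w = np.zeros(2 ** 18)                       # the weight vector being trained
    # ... some amount of online training on w ...
    np.save("model_checkpoint.npy", w)          # save the state mid-training

    # Later, possibly in a different process or on a different day:
    w = np.load("model_checkpoint.npy")         # reload and continue updating where we stopped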
Test-like error estimate: the average loss that vowpal wabbit reports "as it goes" is always computed on unseen (out-of-sample) data (*). This eliminates the need to bother with pre-planned hold-outs or cross-validation. The error rate you see during training is "test-like".
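This works because an online learner can score each example before that example is allowed to update the model, a scheme often called progressive validation. A small sketch of the idea (not vw's code):

    def progressive_average_loss(stream, w, lr=0.1):
        """Each example is scored BEFORE it updates the model, so the running
        average loss is always measured on data the model has not yet seen."""
        total_loss, seen = 0.0, 0
        for x, y in stream:                         # x is a sparse example: {index: value}
            pred = sum(w[i] * v for i, v in x.items())
            total_loss += (pred - y) ** 2           # out-of-sample loss for this example
            seen += 1
            err = pred - y
            for i, v in x.items():                  # only now does the example touch w
                w[i] -= lr * err * v
        return total_loss / seen

    w = [0.0] * 100
    stream = [({1: 1.0, 7: 2.0}, 1.0), ({1: 1.0, 3: 0.5}, 0.0), ({7: 2.0}, 1.0)]
    print(progressive_average_loss(stream, w))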
Beyond linear models: vowpal wabbit supports several algorithms, including matrix factorization (roughly, sparse-matrix SVD), Latent Dirichlet Allocation (LDA), and more. It also supports on-the-fly generation of feature interactions (bilinear, quadratic, cubic, and a feed-forward sigmoid neural network with a user-specified number of units), multi-class classification (in addition to basic regression and binary classification), and more.
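For the interaction features in particular, "generated on the fly" just means crossing feature groups as each example is read rather than materializing the crossed data set up front. A toy illustration of quadratic (pairwise) crossing between two feature groups, with made-up names and not vw's internal representation:

    from itertools import product

    def quadratic_features(group_a, group_b):
        """Cross every feature in one group with every feature in the other."""
        return {f"{fa}^{fb}": va * vb
                for (fa, va), (fb, vb) in product(group_a.items(), group_b.items())}

    user = {"age_25": 1.0, "country_US": 1.0}
    ad = {"topic_sports": 1.0, "length": 0.5}
    print(quadratic_features(user, ad))
    # {'age_25^topic_sports': 1.0, 'age_25^length': 0.5,
    #  'country_US^topic_sports': 1.0, 'country_US^length': 0.5}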
There are tutorials and many examples in the official vw wiki on github.
(*) One exception is if you use multiple passes with the --passes N option.