Vowpal Wabbit: low rank matrix factorization?

I have a very simple question. I would like to do low rank matrix factorization, and I looked at the Vowpal Wabbit documentation on this topic. My question is:

Is there a difference between the two approaches? (implementation or otherwise)

$ vw --lrq ab5 

or

 $ vw -q ab --rank 5 

Here a and b are the feature namespaces, and 5 is the dimensionality of the latent factors.
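For reference, a minimal sketch of what input with two namespaces might look like in vw format (the feature names user_42, item_17, etc. are made-up illustrations; the letter after each `|` is the namespace):

```
1 |a user_42 |b item_17
-1 |a user_7 |b item_3
```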


Possible follow-up:

If they are equivalent, will --rank work for higher order interactions?


short answer:

--rank and --lrq are two separate and very different implementations of matrix factorization / decomposition in Vowpal Wabbit.

Matrix factorization, sometimes called matrix decomposition, is a general term in ML; there are many ways to approximate a matrix using simpler factors (sometimes with loss of information).

Although they are similar in the sense that both try to capture the strongest latent interactions between two feature namespaces, they are not equivalent in implementation or in the quality of the models they produce. Their performance depends heavily on the problem.

More details:

  • --rank was the first implementation of MF in vw, written by Jake Hoffman. It was inspired by SVD (singular value decomposition).
  • --lrq was implemented a few years later by Paul Mineiro. It was inspired by libfm.

On datasets that are hard to generalize from (e.g. movielens 1M, where each user has at most one rating per movie), --lrq seems to work better: it appears to use better defaults, converge faster, run more efficiently, and generate much smaller models on disk. --rank may work better on other datasets where there are more examples per user/item.

You can verify that the two implementations give different results by running an example. For instance, pick a dataset from the test directory and run the two algorithms on it:

 vw --lrq aa3 test/train-sets/0080.dat 

vs

 vw --rank 3 -q aa test/train-sets/0080.dat 

Feel free to add --holdout_off -c --passes 1000 so they run longer, making it easier to compare their runtimes and convergence.
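For instance, the first run above extended with those options might look like this (a sketch, using the same dataset path from the vw source tree):

```
vw --lrq aa3 -c --passes 1000 --holdout_off test/train-sets/0080.dat
```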

You will notice that the two use a different number of features per example ( --lrq is more minimalistic and will only look at the namespace pairs you explicitly specify), and that convergence and the final average loss are better with --lrq . If you save the model with -f modelname , you will also notice that it is much smaller with --lrq , especially on large datasets.

OTOH, if you try a dataset like test/train-sets/ml100k_small_train in the source tree, with a rank of 10 between the u (user) and i (item) namespaces, you will get a better loss with --rank than with --lrq . This shows that which of the two is better depends on the dataset.
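Concretely, the two runs on that dataset would look something like this (a sketch; rank 10 between the u and i namespaces, as described above):

```
vw --rank 10 -q ui test/train-sets/ml100k_small_train
vw --lrq ui10 test/train-sets/ml100k_small_train
```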

Higher-order interactions (e.g. --cubic )

To your second question: --rank does not allow higher-order interactions. If you try to add --cubic , you get an error message:

 vw (gd_mf.cc:139): cannot use triples in matrix factorization 

but it does allow multiple/additional -q (quadratic) interactions.
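For example, a sketch with several quadratic interactions at once (the namespaces b and c and the file name train.dat are assumptions, not from the original):

```
vw --rank 5 -q ab -q ac -q bc train.dat
```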

--lrq is less fussy, so you can combine it with higher-order interaction options.
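For example, a sketch combining --lrq with a cubic interaction (the namespace c and the file name train.dat are assumptions):

```
vw --lrq ab5 --cubic abc train.dat
```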

Other differences:

As a rule, --lrq is more agnostic and independent of other vw options, while --rank uses its own standalone SGD code and does not accept other options (for example, --normalized or --adaptive ). In addition, the memory requirements of --rank are higher.

Again, the results will depend on the data, additional options, and specific interactions.

further reading

- --rank

- --lrq
