Vowpal Wabbit --invert_hash option produces empty output, but why?

I am trying to save a Vowpal Wabbit model with inverted hashes. I have a working model produced with the following command:

vw --oaa 2 -b 24 -d mydata.vw --readable_model mymodel.readable

which creates a model file like this:

    Version 7.7.0
    Min label:-1.000000
    Max label:1.000000
    bits:24
    0 pairs:
    0 triples:
    rank:0
    lda:0
    0 ngram:
    0 skip:
    options: --oaa 2
    :0
    66:0.016244
    67:-0.016241
    80:0.026017
    81:-0.026020
    84:0.015005
    85:-0.015007
    104:-0.053924
    105:0.053905
    112:-0.015402
    113:0.015412
    122:-0.025704
    123:0.025704
    ...

(and so on, for many thousands of features). To be more useful, though, I need to see the feature names. This seemed straightforward, so I ran

vw --oaa 2 -b 24 -d mydata.vw --invert_hash mymodel.inverted

and it produced a model file like this (with no weights):

    Version 7.7.0
    Min label:-1.000000
    Max label:1.000000
    bits:24
    0 pairs:
    0 triples:
    rank:0
    lda:0
    0 ngram:
    0 skip:
    options: --oaa 2
    :0

I have obviously done something wrong, but I believe I am using the option as documented:

--invert_hash is similar to --readable_model, but the model is output in a more human-readable format, with feature names followed by weights instead of hash indexes and weights.

Does anyone see why my second command is not producing any output?

1 answer

This is caused by a bug in VW that has been fixed recently (due to this issue), see https://github.com/JohnLangford/vowpal_wabbit/issues/337 .

By the way, using --oaa 2 does not make sense. For binary classification (e.g. logistic regression), use --loss_function=logistic and make sure your labels are 1 and -1. OAA makes sense only for N > 2 classes (and it is recommended to use --loss_function=logistic with --oaa).
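If your data is currently labeled 1 and 2 for --oaa 2, the labels need remapping before the binary run. A minimal sketch, assuming each example line starts with its label and that class 2 should become the negative class (both are assumptions about your data; `relabel_binary` and `mydata.binary.vw` are hypothetical names):

```shell
# Hypothetical helper: rewrite --oaa style labels (1, 2) as binary labels (1, -1).
# Assumes the label is the first token on each line and class 2 is the negative class.
relabel_binary() {
  # Only touch a leading "2" that is followed by whitespace or the "|" separator,
  # so labels such as 21 are left alone.
  sed -E 's/^2([[:space:]|])/-1\1/' "$@"
}

# Sketch of the binary run (not executed here):
#   relabel_binary mydata.vw > mydata.binary.vw
#   vw -b 24 -d mydata.binary.vw --loss_function=logistic --readable_model mymodel.readable
```

Lines labeled 1 pass through unchanged, so the output can be fed to vw in place of the original file.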

Also note that learning with --invert_hash is much slower (and requires more memory, of course). The recommended way to create an inverted hash model, especially with multiple passes, is to train a normal binary model first and then convert it to an inverted hash with a single pass over the training data using -t:

    vw -d mytrain.data -c --passes 4 --oaa 3 -f model.binary
    vw -d mytrain.data -t -i model.binary --invert_hash model.humanreadable
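Once the inverted hash file exists, a quick way to check that it actually contains weights is to list the largest ones. This is only a sketch: it assumes the feature lines have the form name:hash_index:weight and that header lines never have three colon-separated fields, which may not hold for every VW version. `top_features` is a hypothetical helper name.

```shell
# Hypothetical helper: print the n largest-magnitude weights from an --invert_hash file.
# Assumes feature lines look like name:hash_index:weight (an assumption about the format).
top_features() {
  file=$1
  n=${2:-10}
  awk -F: '
    # Keep only lines whose last field is a number and whose second-to-last is an index.
    NF >= 3 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^-?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ {
      w = $NF + 0
      a = (w < 0) ? -w : w
      # Prefix each line with its absolute weight in fixed notation for sorting.
      printf "%.10f\t%s\n", a, $0
    }' "$file" | sort -rn | cut -f2- | head -n "$n"
}
```

For example, `top_features model.humanreadable 20` would print the 20 features with the largest absolute weights; an empty result suggests the file contains only the header, as in the question.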