Using LIBSVM grid.py for unbalanced data?

I have a three-class problem with unbalanced data (90%, 5%, 5%), and I want to train a classifier on it using LIBSVM.

The problem is that grid.py tunes the gamma and cost parameters for optimal overall accuracy, which means the best-scoring model classifies 100% of the examples as class 1, which, of course, is not what I want.

I tried modifying the -w weight parameters without much success.

So what I want is to change grid.py so that it optimizes cost and gamma for per-class precision and recall, not for overall accuracy. Is there any way to do this? Or are there other scripts that can do something like this?

4 answers

The -w option is what you need for unbalanced data. What have you tried so far?

If your classes are:

  • class 0: 90%
  • class 1: 5%
  • class 2: 5%

you should pass the following parameters to svm-train (only the relative ratios matter):

-w0 5 -w1 90 -w2 90 
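A common way to derive such weights is inverse class frequency. The sketch below (plain Python, function name is illustrative) computes weights proportional to 1/frequency and scales them so the majority class gets weight 1; note that 1 : 18 : 18 is the same ratio as the 5 : 90 : 90 suggested above.

```python
# Sketch: derive per-class -wN flags inversely proportional to class frequency.
# Class proportions are taken from the question (90% / 5% / 5%);
# the exact scaling is a free choice, only the ratios matter to svm-train.
def inverse_frequency_weights(proportions):
    """Return weights proportional to 1/frequency, scaled so the lowest is 1."""
    raw = {label: 1.0 / p for label, p in proportions.items()}
    lowest = min(raw.values())
    return {label: round(w / lowest, 2) for label, w in raw.items()}

props = {0: 0.90, 1: 0.05, 2: 0.05}
weights = inverse_frequency_weights(props)
flags = " ".join(f"-w{label} {w}" for label, w in sorted(weights.items()))
print(flags)  # -w0 1.0 -w1 18.0 -w2 18.0
```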

If you want to try an alternative, one of the programs in the SVMlight family, http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html , directly optimizes the area under the ROC curve.

Optimizing AUC directly can give better results than re-weighting the training examples.


You can optimize for precision, recall, F-score, or AUC with grid.py. All you need to do is change the cross-validation measure used by svm-train in LIBSVM. Follow the procedure on the LIBSVM website.
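As a concrete example of a replacement cross-validation measure, the sketch below computes macro-averaged F1, which weights each class equally regardless of frequency, so a grid search over (C, gamma) no longer favors the 90% majority class. The function name and structure are illustrative, not part of grid.py itself.

```python
# Sketch of a cross-validation measure that could replace accuracy in a grid
# search. Macro-averaged F1 gives each class equal weight, so a classifier
# that predicts only the majority class scores poorly.
def macro_f1(true_labels, predicted_labels):
    classes = set(true_labels)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p == c)
        fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t != c and p == c)
        fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# A degenerate classifier that always predicts class 0 gets 90% accuracy on
# a 90/5/5 split, but its macro-F1 is far lower (~0.32):
y_true = [0] * 18 + [1] + [2]
y_all_zero = [0] * 20
print(macro_f1(y_true, y_all_zero))
```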


If you have unbalanced data, you probably should not optimize accuracy. Optimize the F-score instead (or recall, if that matters more to you). You can change the evaluation function as described here.

I think you should also optimize gamma and cost under various class-weight configurations. I modified the get_cmd function in grid.py to pass different class weights (-wi weight) for this purpose. In my experience, weighting classes doesn't always help.
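The idea of extending the (C, gamma) grid with class-weight configurations can be sketched as follows. This does not reproduce the actual get_cmd code; it only shows how the search space grows when candidate -wi values are added (the weight grids below are illustrative values, not recommendations).

```python
# Sketch: extend a (C, gamma) grid search with class-weight configurations,
# in the spirit of modifying get_cmd in grid.py. Each combination yields one
# svm-train command line; the best one is picked by cross-validation.
import itertools

c_range = [2 ** e for e in (-1, 1, 3)]       # illustrative C grid
gamma_range = [2 ** e for e in (-3, -1)]     # illustrative gamma grid
w1_range = [1, 9, 18]                        # candidate minority-class weights
w2_range = [1, 9, 18]

commands = []
for c, g, w1, w2 in itertools.product(c_range, gamma_range, w1_range, w2_range):
    commands.append(f"svm-train -c {c} -g {g} -w0 1 -w1 {w1} -w2 {w2} data.scaled")

print(len(commands))   # 3 * 2 * 3 * 3 = 54 candidate configurations
print(commands[0])
```

Note that the grid grows multiplicatively with each weight dimension, which is one reason weighted grid searches get expensive quickly.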

