Using LIBSVM grid.py for unbalanced data?

I have a three-class problem with unbalanced data (90%, 5%, 5%), and I want to train a classifier on it using LIBSVM.

The problem is that grid.py tunes the gamma and cost parameters for optimal overall accuracy, which means the best-scoring model classifies 100% of the examples as class 1, which, of course, is not what I want.

I tried modifying the -w weight parameters without much success.

So what I want is to change grid.py so that it optimizes cost and gamma for per-class precision and recall, not for overall accuracy. Is there any way to do this? Or are there other scripts that can do something like this?

4 answers

The -w option is what you need for unbalanced data. What have you tried so far?

If your classes are:

  • class 0: 90%
  • class 1: 5%
  • class 2: 5%

you should pass the following parameters to svm-train (only the relative ratios matter):

-w0 5 -w1 90 -w2 90 
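A common way to derive such weights is inverse class frequency. The sketch below (plain Python, function name is illustrative) computes weights proportional to 1/frequency and scales them so the majority class gets weight 1; note that 1 : 18 : 18 is the same ratio as the 5 : 90 : 90 suggested above.

```python
# Sketch: derive per-class -wN flags inversely proportional to class frequency.
# Class proportions are taken from the question (90% / 5% / 5%);
# the exact scaling is a free choice, only the ratios matter to svm-train.
def inverse_frequency_weights(proportions):
    """Return weights proportional to 1/frequency, scaled so the lowest is 1."""
    raw = {label: 1.0 / p for label, p in proportions.items()}
    lowest = min(raw.values())
    return {label: round(w / lowest, 2) for label, w in raw.items()}

props = {0: 0.90, 1: 0.05, 2: 0.05}
weights = inverse_frequency_weights(props)
flags = " ".join(f"-w{label} {w}" for label, w in sorted(weights.items()))
print(flags)  # -w0 1.0 -w1 18.0 -w2 18.0
```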

If you want to try an alternative, one of the programs in the SVMlight family, http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html , directly optimizes the area under the ROC curve.

Optimizing AUC directly can give better results than re-weighting the training examples.


You can optimize for precision, recall, F-score, or AUC with grid.py. All you need to do is change the cross-validation measure used by svm-train in LIBSVM. Follow the procedure on the LIBSVM website.
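As a concrete example of a replacement cross-validation measure, the sketch below computes macro-averaged F1, which weights each class equally regardless of frequency, so a grid search over (C, gamma) no longer favors the 90% majority class. The function name and structure are illustrative, not part of grid.py itself.

```python
# Sketch of a cross-validation measure that could replace accuracy in a grid
# search. Macro-averaged F1 gives each class equal weight, so a classifier
# that predicts only the majority class scores poorly.
def macro_f1(true_labels, predicted_labels):
    classes = set(true_labels)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p == c)
        fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t != c and p == c)
        fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# A degenerate classifier that always predicts class 0 gets 90% accuracy on
# a 90/5/5 split, but its macro-F1 is far lower (~0.32):
y_true = [0] * 18 + [1] + [2]
y_all_zero = [0] * 20
print(macro_f1(y_true, y_all_zero))
```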


If you have unbalanced data, you probably should not optimize accuracy. Optimize the F-score instead (or recall, if that matters more to you). You can change the evaluation function as described here.

I think you should also optimize gamma and cost under various class-weight configurations. I modified the get_cmd function in grid.py to pass different class weights (-wi weight) for this purpose. In my experience, weighting classes doesn't always help.
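The idea of extending the (C, gamma) grid with class-weight configurations can be sketched as follows. This does not reproduce the actual get_cmd code; it only shows how the search space grows when candidate -wi values are added (the weight grids below are illustrative values, not recommendations).

```python
# Sketch: extend a (C, gamma) grid search with class-weight configurations,
# in the spirit of modifying get_cmd in grid.py. Each combination yields one
# svm-train command line; the best one is picked by cross-validation.
import itertools

c_range = [2 ** e for e in (-1, 1, 3)]       # illustrative C grid
gamma_range = [2 ** e for e in (-3, -1)]     # illustrative gamma grid
w1_range = [1, 9, 18]                        # candidate minority-class weights
w2_range = [1, 9, 18]

commands = []
for c, g, w1, w2 in itertools.product(c_range, gamma_range, w1_range, w2_range):
    commands.append(f"svm-train -c {c} -g {g} -w0 1 -w1 {w1} -w2 {w2} data.scaled")

print(len(commands))   # 3 * 2 * 3 * 3 = 54 candidate configurations
print(commands[0])
```

Note that the grid grows multiplicatively with each weight dimension, which is one reason weighted grid searches get expensive quickly.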

