LibSVM turns all my vectors into support vectors, why?

I am trying to use SVM to classify news articles.

I created a table containing the features (the unique words found in the documents) as strings. I then build a weight vector for each article against that feature table: if the article contains a word that is in the feature table, that position is marked 1, otherwise 0.

Example of a generated training pattern:

1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1

Since this is the first document, all of its features are present.
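For reference, the sparse LibSVM text format is `<label> <index>:<value> ...`, with 1-based indices in ascending order and zero-valued features left out. Below is a minimal Python sketch of the mapping described above; the three-word `vocabulary` and the function name `to_libsvm_line` are made up for illustration and are not part of svm.Net.

```python
def to_libsvm_line(label, words, vocabulary):
    """Format one document as a LibSVM sparse line: '<label> <index>:<value> ...'."""
    # Feature indices are 1-based and must appear in ascending order.
    present = sorted(vocabulary[w] for w in set(words) if w in vocabulary)
    # Features whose value is 0 are simply left out of the line.
    return " ".join([str(label)] + [f"{i}:1" for i in present])


# Hypothetical three-word feature table (word -> 1-based index).
vocabulary = {"market": 1, "election": 2, "goal": 3}

print(to_libsvm_line(1, ["goal", "market", "goal"], vocabulary))  # 1 1:1 3:1
print(to_libsvm_line(0, ["weather"], vocabulary))                 # 0
```

A document that shares no word with the feature table ends up as a line holding only its label.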

I use 1 and 0 as class labels.

I use svm.Net for classification.

I gave it 300 manually classified weight vectors as training data, and the generated model takes all of the vectors as support vectors, which is surely overfitting.

My total number of features (unique words, i.e. the row count of the feature vector table in the DB) is 7610.

What could be the reason?

Because of this overfitting, my project is in rather poor shape right now. It classifies every article as a positive article.

In LibSVM binary classification, is there any restriction on the class label?

I use 0 and 1 instead of -1 and +1. Is that a problem?

+4
c# machine-learning svm libsvm
Apr 20 '11
3 answers

As pointed out, a parameter search is probably a good idea before doing anything else.
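A parameter search can be sketched roughly as below. The exponentially spaced ranges for C and gamma follow the ones suggested in the LibSVM authors' practical guide; `grid_search`, `parameter_grid`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in so the snippet runs by itself. In practice the scorer would train an SVM with the given parameters and return its cross-validation accuracy.

```python
from itertools import product


def parameter_grid():
    """Return (C, gamma) pairs on an exponentially spaced grid."""
    c_values = [2.0 ** e for e in range(-5, 16, 2)]      # 2^-5 ... 2^15
    gamma_values = [2.0 ** e for e in range(-15, 4, 2)]  # 2^-15 ... 2^3
    return list(product(c_values, gamma_values))


def grid_search(evaluate):
    """Return the (C, gamma) pair with the highest evaluate(C, gamma) score."""
    return max(parameter_grid(), key=lambda pair: evaluate(*pair))


# Stand-in scorer (an assumption for illustration): a real one would train
# an SVM with the given C and gamma and return cross-validation accuracy.
def toy_score(c, gamma):
    return -abs(c - 8.0) - abs(gamma - 0.125)


best_c, best_gamma = grid_search(toy_score)
print(best_c, best_gamma)  # 8.0 0.125
```

With a linear kernel there is no gamma, so only the C axis of the grid would be searched.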

I would also explore the various kernels available to you. The fact that your input data is binary may be problematic for the RBF kernel (or may make it suboptimal compared to another kernel). I do not know which kernel would be better suited, though. Try a linear kernel, and look around for more suggestions/ideas :)
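To make the kernel point concrete: for 0/1 vectors the linear kernel is just the number of shared words, and the squared Euclidean distance inside the RBF kernel is the number of words that occur in exactly one of the two documents. A small Python sketch, with documents represented as sets of active feature indices (a representation chosen here for illustration, not something svm.Net requires):

```python
import math


def linear_kernel(a, b):
    """Dot product of two binary vectors given as sets of active indices."""
    return len(a & b)


def rbf_kernel(a, b, gamma):
    """exp(-gamma * ||a - b||^2); for 0/1 vectors the squared distance is
    the number of features present in exactly one of the two documents."""
    return math.exp(-gamma * len(a ^ b))


doc_a = {1, 2, 3, 4}
doc_b = {3, 4, 5}

print(linear_kernel(doc_a, doc_b))           # 2
print(rbf_kernel(doc_a, doc_b, gamma=0.5))   # exp(-1.5)
print(rbf_kernel(doc_a, doc_a, gamma=0.5))   # 1.0
```

If gamma is large relative to the typical number of differing words, the RBF value between any two distinct documents gets close to 0, which is one reason gamma is worth tuning before judging the kernel.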

For more information and possibly better answers, look at stats.stackexchange.com.

+1
Apr 22 '11 at 3:50 a.m.

You need to do some kind of parameter search. Also, if the classes are unbalanced, the classifier can get artificially high accuracy without doing much. This tutorial is good for learning the basic, practical things; you should probably read it.
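A quick way to see how much accuracy comes "for free" from class imbalance is to compute the score of a classifier that always predicts the majority class. The sketch below uses a made-up 240/60 split of 300 labels, since the real split is not given in the question; `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (majority_label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical label distribution; the post does not give the real split.
labels = [1] * 240 + [0] * 60

print(majority_baseline(labels))  # (1, 0.8)
```

Any model whose accuracy is not clearly above this baseline is probably doing little more than predicting the majority class.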

+3
Apr 20 '11 at 18:18

I would definitely try using -1 and +1 for your labels, which is the standard way to do this.
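If the training data already exists as LibSVM-format text lines, the labels can be rewritten without regenerating the features. A minimal Python sketch, assuming each line starts with the label followed by a space; `relabel_line` and `LABEL_MAP` are hypothetical names:

```python
# Assumed mapping from the 0/1 labels in the question to -1/+1.
LABEL_MAP = {"0": "-1", "1": "+1"}


def relabel_line(line):
    """Rewrite the leading class label of one LibSVM-format line."""
    label, _, features = line.strip().partition(" ")
    # Labels that are not in the map are passed through unchanged.
    return f"{LABEL_MAP.get(label, label)} {features}".rstrip()


print(relabel_line("0 3:1 7:1"))  # -1 3:1 7:1
print(relabel_line("1 1:1 2:1"))  # +1 1:1 2:1
```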

Also, how much data do you have? Since you are working in a 7610-dimensional space, you could potentially have that many support vectors, with a different vector “supporting” the hyperplane in each dimension.

With this many features, you might want to try some kind of feature selection method, for example principal component analysis.
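PCA normally calls for a linear algebra library, so as a simpler stand-in (this is not PCA) here is a sketch of document-frequency filtering: drop words that occur in too few or in nearly all documents. The thresholds `min_df` and `max_df_ratio` and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def select_by_document_frequency(documents, min_df=2, max_df_ratio=0.9):
    """Keep feature indices that occur in at least min_df documents and in
    at most max_df_ratio of all documents."""
    df = Counter(i for doc in documents for i in set(doc))
    limit = max_df_ratio * len(documents)
    return sorted(i for i, n in df.items() if min_df <= n <= limit)


# Hypothetical documents as sets of active feature indices.
documents = [{1, 2, 3}, {1, 3}, {1, 4}, {1, 3, 5}]

print(select_by_document_frequency(documents))  # [3]
```

Here feature 1 is dropped because it appears in every document, and features 2, 4, and 5 are dropped because each appears only once.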

+1
Apr 22 '11 at 3:23


