Precomputable kernels with LibSVM in Python

I searched the web for about three hours but have not yet found a solution. I want to provide libsvm with a precomputed kernel and classify a dataset, but:

  • How can I create a precomputed kernel? (For example, what would a basic precomputed kernel for the Iris data look like?)

  • The libsvm documentation states that:

    For precomputed kernels, the first element of each instance must be an ID. For example,

    samples = [[1, 0, 0, 0, 0], [2, 0, 1, 0, 1], [3, 0, 0, 1, 1], [4, 0, 1, 1, 2]]
    problem = svm_problem(labels, samples)
    param = svm_parameter(kernel_type=PRECOMPUTED)

What is an ID? There are no further details about it. Can I just assign IDs sequentially?

Any help with libsvm and an example of precomputed kernels would be really appreciated.

+7
python machine-learning libsvm
4 answers

First of all, some background on kernels and SVMs...

If you want to precompute the kernel for n vectors (of any dimension), you need to compute the kernel function between each pair of examples. The kernel function takes two vectors and gives a scalar, so you can think of the precomputed kernel as an n x n matrix of scalars. It is usually called the kernel matrix, or sometimes the Gram matrix.

There are many different kernels; the simplest is the linear kernel (also known as the dot product):

sum(x_i * y_i) for i in [1..N], where (x_1,...,x_N) and (y_1,...,y_N) are vectors
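As a minimal sketch (plain Python, no libsvm required; the example vectors are just for illustration), the linear kernel could be written as:

```python
def linear_kernel(x, y):
    # Dot product of two equal-length vectors.
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

print(linear_kernel([1, 1, 1, 1], [1, 1, 1, 1]))  # prints 4
print(linear_kernel([1, 1, 1, 1], [0, 3, 0, 3]))  # prints 6
```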

Secondly, trying to answer your problem...

The documentation on precomputed kernels in libsvm is actually pretty good ...

  Assume the original training data has three four-feature instances
  and testing data has one instance:

  15  1:1 2:1 3:1 4:1
  45      2:3     4:3
  25          3:1
  15  1:1     3:1

  If the linear kernel is used, we have the following
  new training/testing sets:

  15  0:1 1:4 2:6 3:1
  45  0:2 1:6 2:18 3:0
  25  0:3 1:1 2:0 3:1

  15  0:? 1:2 2:0 3:1

Each vector in the second example here is a row of the kernel matrix. The value at index zero is the ID, and it does just appear to be sequential. The value at index 1 of the first vector is the value of the kernel function of the first vector from the first example with itself (i.e. (1x1)+(1x1)+(1x1)+(1x1) = 4), the value at index 2 is the kernel function of the first vector with the second (i.e. (1x3)+(1x3) = 6), and so on for the rest of the example. You can see that the kernel matrix is symmetric, as it should be, because K(x, y) = K(y, x).

It is worth noting that the first set of vectors is given in a sparse format (i.e. missing values are zero), but the kernel matrix is not, and should not be, sparse. I don't know why that is; it just seems to be how libsvm works.
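To make the example concrete, here is a sketch (plain Python; the dense vectors are my own transcription of the sparse instances above) that rebuilds those kernel-matrix rows, with the sequential ID prepended:

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

# Dense versions of the three sparse training instances and the one test instance:
train = [[1, 1, 1, 1], [0, 3, 0, 3], [0, 0, 1, 0]]
test = [1, 0, 1, 0]

# Each row: a sequential 1-based ID, then the kernel value against every training vector.
for i, x in enumerate(train, start=1):
    print([i] + [linear_kernel(x, y) for y in train])
# [1, 4, 6, 1]
# [2, 6, 18, 0]
# [3, 1, 0, 1]

# The test row (its ID slot is "?" in libsvm's notation):
print([linear_kernel(test, y) for y in train])  # [2, 0, 1]
```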

+14

scikit-learn hides most of the details of libsvm when working with custom kernels. You can either pass an arbitrary function as your kernel and it will compute the Gram matrix for you, or pass the precomputed Gram matrix of the kernel.

For the first one, the syntax is:

  >>> from scikits.learn import svm
  >>> clf = svm.SVC(kernel=my_kernel)

where my_kernel is your kernel function; you can then call clf.fit(X, y) and it will compute the kernel matrix for you. In the second case, the syntax is:

  >>> from scikits.learn import svm
  >>> clf = svm.SVC(kernel="precomputed")

And when you call clf.fit(X, y), X must be the matrix k(X, X), where k is your kernel. See also this example for more details:

http://scikit-learn.org/stable/auto_examples/svm/plot_custom_kernel.html
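For what it's worth, recent releases install the package as sklearn rather than scikits.learn; here is a small sketch of the precomputed variant (the toy dataset and values are my own, not from the linked example):

```python
import numpy as np
from sklearn import svm  # recent releases import as "sklearn", not "scikits.learn"

# Toy, well-separated two-class data:
X = np.array([[1., 1.], [1., 2.], [5., 5.], [6., 5.]])
y = np.array([0, 0, 1, 1])

gram = X @ X.T  # linear-kernel Gram matrix k(X, X), shape (n, n)
clf = svm.SVC(kernel="precomputed")
clf.fit(gram, y)

# At predict time you must pass k(X_new, X_train); here we just re-score the training set:
print(clf.predict(gram))
```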

+5

I believe the scikit-learn bindings of libsvm should meet your needs. See the examples and documentation at http://scikit-learn.sourceforge.net/modules/svm.html#kernel-functions

+3

Here is a simple two-category, three-vector custom-kernel input file that works correctly. I will explain the details (though you should also see StompChicken's answer):

1 0:1 1:10 2:12 3:21
2 0:2 1:12 2:19 3:30
1 0:3 1:21 2:30 3:130

The first number on each line is the category it belongs to. The next entry on each line has the form 0:n, and it must be sequential, i.e.
0:1 in the first entry
0:2 in the second entry
0:3 in the third entry

A possible reason for this is that libsvm writes out the alpha_i values that go with your vectors in its output file, but for precomputed kernels the vectors themselves are not shown (they can be really huge); instead, only the 0:n index identifying which vector each alpha goes with is written. Moreover, the output is not in the order you supplied the vectors in: it is grouped by category. So when reading the output file, these 0:n values are very useful for mapping libsvm's output back to your own inputs. Here you can see the output:

svm_type c_svc
kernel_type precomputed
nr_class 2
total_sv 3
rho -1.53951
label 1 2
nr_sv 2 1
SV
0.4126650675419768 0:1
0.03174528241667363 0:3
-0.4444103499586504 0:2

It is important to note that with precomputed kernels you cannot omit zero entries as you can with all other kernels. They must be explicitly included.
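Here is a sketch of how such a file could be generated (the helper name and output path are mine, not part of libsvm; the kernel values are the ones from the input file above):

```python
def write_precomputed(path, labels, kernel_matrix):
    # libsvm's precomputed format: label, then 0:<sequential 1-based ID>,
    # then EVERY kernel value -- zeros included, never omitted.
    with open(path, "w") as f:
        for i, (label, row) in enumerate(zip(labels, kernel_matrix), start=1):
            feats = " ".join(f"{j}:{v}" for j, v in enumerate(row, start=1))
            f.write(f"{label} 0:{i} {feats}\n")

labels = [1, 2, 1]
K = [[10, 12, 21], [12, 19, 30], [21, 30, 130]]
write_precomputed("precomputed.train", labels, K)
print(open("precomputed.train").read())
```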

+2
