First of all, some background on kernels and SVMs...
If you want to precompute the kernel for n vectors (of any dimension), you need to compute the kernel function between each pair of examples. The kernel function takes two vectors and returns a scalar, so you can think of the precomputed kernel as an n×n matrix of scalars. It is usually called the kernel matrix, or sometimes the Gram matrix.
There are many different kernels; the simplest is the linear kernel (also known as the dot product):
K(x, y) = sum(x_i * y_i) for i in [1..N], where x = (x_1, ..., x_N) and y = (y_1, ..., y_N) are vectors
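For concreteness, here is a minimal sketch of the linear kernel in Python (the function name is my own choice, not anything from libsvm):

```python
def linear_kernel(x, y):
    # Dot product of two equal-length vectors.
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

print(linear_kernel([1, 1, 1, 1], [1, 1, 1, 1]))  # 4
print(linear_kernel([1, 1, 1, 1], [0, 3, 0, 3]))  # 6
```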
Secondly, to try to answer your question...
The documentation on precomputed kernels in libsvm is actually pretty good ...
Assume the original training data has three four-feature instances
and the testing data has one instance:
15 1:1 2:1 3:1 4:1
45 2:3 4:3
25 3:1
15 1:1 3:1
If the linear kernel is used, we have the following
new training / testing sets:
15 0:1 1:4 2:6 3:1
45 0:2 1:6 2:18 3:0
25 0:3 1:1 2:0 3:1
15 0:? 1:2 2:0 3:1
Each vector here in the second example is a row in the kernel matrix. The value at index 0 is an instance ID, and it just seems to be a sequential count. The value at index 1 of the first vector is the value of the kernel function of the first vector from the first example with itself (i.e. (1×1)+(1×1)+(1×1)+(1×1) = 4), the value at index 2 is the kernel function of the first vector with the second (i.e. (1×3)+(1×3) = 6), and so on for the rest of the example. You can see that the kernel matrix is symmetric, as it should be, because K(x, y) = K(y, x).
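You can reproduce those matrix entries yourself. Here is a sketch that writes the four example instances as dense vectors (filling the missing sparse entries with zeros) and computes each row of the kernel matrix:

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

# The example instances as dense vectors.
train = [
    [1, 1, 1, 1],    # 15 1:1 2:1 3:1 4:1
    [0, 3, 0, 3],    # 45 2:3 4:3
    [0, 0, 1, 0],    # 25 3:1
]
test = [1, 0, 1, 0]  # 15 1:1 3:1

# Training kernel matrix: K[i][j] = K(x_i, x_j).
K = [[linear_kernel(xi, xj) for xj in train] for xi in train]
# Test row: the test instance against each training instance.
test_row = [linear_kernel(test, xj) for xj in train]

print(K)         # [[4, 6, 1], [6, 18, 0], [1, 0, 1]]
print(test_row)  # [2, 0, 1]
```

The printed values match the new training/testing sets above, minus the labels and the 0-index IDs.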
It is worth noting that the first set of vectors is given in a sparse format (i.e. missing values are zero), but the kernel matrix is not, and should not be, sparse. I don't know why that is; it just seems to be a libsvm thing.
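If you need to generate the precomputed-kernel file yourself, something like the following sketch would emit the dense training lines (label, then the sequential ID at index 0, then every kernel value including zeros):

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

train = [[1, 1, 1, 1], [0, 3, 0, 3], [0, 0, 1, 0]]
labels = [15, 45, 25]

# One dense line per training instance, in libsvm's
# precomputed-kernel format.
lines = []
for i, (label, xi) in enumerate(zip(labels, train), start=1):
    feats = " ".join(f"{j}:{linear_kernel(xi, xj)}"
                     for j, xj in enumerate(train, start=1))
    lines.append(f"{label} 0:{i} {feats}")

print("\n".join(lines))
```

This prints the same three training lines as in the example above.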
Stompchicken