Custom kernels for SVM, when to use them?

I am new to machine learning and am trying to understand how the most common learning algorithms work and when to use each of them. At the moment I am studying how Support Vector Machines work, and I have a question about custom kernel functions.
There is a lot of information on the Internet about the standard kernels (linear, RBF, polynomial) for SVM. However, I would like to understand when it makes sense to go for a custom kernel function. My questions:

1) What are the other possible kernels for SVM?
2) In what situation can custom kernels be applied?
3) Can a custom kernel significantly improve the quality of SVM prediction?


1) What are the other possible kernels for SVM?

There are infinitely many of them; see, for example, the list of kernels implemented in pykernels, which is far from exhaustive (a short sketch of plugging one of them into scikit-learn follows the list):

https://github.com/gmum/pykernels

  • Linear
  • Polynomial
  • RBF
  • Cosine similarity
  • Exponential
  • Laplacian
  • Rational quadratic
  • Inverse multiquadratic
  • Cauchy
  • T-Student
  • ANOVA
  • Additive Chi^2
  • Chi^2
  • MinMax
  • Min/Histogram intersection
  • Generalized histogram intersection
  • Spline
  • Sorensen
  • Tanimoto
  • Wavelet
  • Fourier
  • Log (CPD)
  • Power (CPD)
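
Most of these can be tried in scikit-learn by passing a callable that computes the Gram matrix to SVC (or by precomputing the matrix). A minimal sketch, assuming the Cauchy kernel K(x, y) = 1 / (1 + ||x - y||^2 / sigma^2) and a made-up toy dataset, not part of the original answer:

    # Hedged sketch: SVC accepts any callable that returns a Gram matrix,
    # so a non-standard kernel such as the Cauchy kernel can be plugged in
    # directly. sigma and the toy data below are made up for illustration.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics.pairwise import euclidean_distances

    def cauchy_kernel(X, Y, sigma=1.0):
        """Gram matrix of the Cauchy kernel between rows of X and rows of Y."""
        d2 = euclidean_distances(X, Y, squared=True)
        return 1.0 / (1.0 + d2 / sigma ** 2)

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    clf = SVC(kernel=cauchy_kernel).fit(X, y)   # callable kernels are supported
    print(clf.score(X, y))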

2) In what situation could custom kernels be applied?

Basically in two cases:

  • "simple" give very bad results.
  • data are specific in a sense, and therefore, for the use of traditional cores, it is necessary to degenerate them. For example, if your data is in a graphical format, you cannot use the RBF kernel, since the graph is not a constant size vector, therefore, to work with this object, you need a graph core without any information that loses its projection. also sometimes you have an idea about the data, you know about some basic structure that can help the classifier. One such example is periodicity, you know that there is some kind of response effect in your data - then, perhaps, it is worth looking for a specific core, etc.
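
As a hedged illustration of the periodicity case above (my sketch, not part of the original answer): if you know the response repeats with period p, an exp-sine-squared ("periodic") kernel encodes that prior directly; p and length_scale are hypothetical hyperparameters you would tune.

    # Sketch of a periodic kernel for 1-D inputs with a known period p.
    import numpy as np
    from sklearn.svm import SVC

    def periodic_kernel(X, Y, p=1.0, length_scale=0.5):
        """K(x, y) = exp(-2 * sin^2(pi * |x - y| / p) / length_scale^2)."""
        d = np.abs(X[:, None, 0] - Y[None, :, 0])   # pairwise 1-D distances
        return np.exp(-2.0 * np.sin(np.pi * d / p) ** 2 / length_scale ** 2)

    # Toy data: the label depends only on the phase of x, i.e. it is periodic.
    rng = np.random.RandomState(0)
    X = rng.uniform(0, 5, size=(200, 1))
    y = (np.sin(2 * np.pi * X[:, 0]) > 0).astype(int)

    clf = SVC(kernel=periodic_kernel).fit(X, y)
    print(clf.score(X, y))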

3) Can a custom kernel significantly improve the quality of SVM prediction?

Yes. In particular, there always exists a (hypothetical) Bayes-optimal kernel, defined as:

K(x, y) = 1 iff arg max_l P(l|x) == arg max_l P(l|y) 

in other words, if one had the true probabilities P(l|x) of label l being assigned to point x, then one could create a kernel that essentially maps each data point to a one-hot encoding of its most probable label, which leads to Bayes-optimal classification (as it attains the Bayes risk).

In practice, of course, it is impossible to obtain such a kernel, since it would mean that you have already solved your problem. However, this shows that the concept of an "optimal kernel" exists, and obviously none of the classical kernels is of this type (unless your data comes from veeeery simple distributions). Furthermore, each kernel is a kind of prior over decision functions: the closer its induced family of functions is to the actual one, the more likely you are to get a reasonable classifier with SVM.
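
To make the circularity concrete, here is a toy sketch (mine, not part of the original answer): on synthetic data where the most probable label is known by construction, the kernel above can be fed to SVC as a precomputed Gram matrix, but building that matrix already requires the answer.

    # Toy illustration of the "optimal kernel": K(x, y) = 1 iff the most
    # probable labels of x and y agree. The data and labels are synthetic.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    # By construction, the most probable label is sign(x_0 + x_1).
    best_label = (X[:, 0] + X[:, 1] > 0).astype(int)
    y = best_label

    # Gram matrix of the "optimal" kernel: 1 where the most probable labels agree.
    K = (best_label[:, None] == best_label[None, :]).astype(float)

    clf = SVC(kernel='precomputed').fit(K, y)
    print(clf.score(K, y))   # perfect fit, but K already encoded the solution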
