Current state of classification algorithms

We know that there are thousands of classifiers. I was recently told that some people consider AdaBoost the best "off-the-shelf" classifier.

  • Are there any better algorithms (building on this idea of voting)?
  • What is the current state of the art in classifiers? Do you have an example?
+6
algorithm artificial-intelligence classification adaboost
5 answers

Hastie et al. (2013, “Elements of Statistical Learning”) conclude that the Gradient Boosting Machine is the best “off-the-shelf” method, whatever your problem. Definition (see page 352): an “off-the-shelf” method is one that can be applied directly to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.

And a slightly older reference: Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as “the best off-the-shelf classifier in the world” (see also Breiman (1998)).
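To make the gradient boosting recommendation concrete, here is a minimal sketch using scikit-learn; the library choice and the synthetic dataset are my assumptions (the answer does not name a library), but the class and parameter names are scikit-learn's:

```python
# Sketch: gradient boosting as an "off-the-shelf" classifier,
# applied directly to the data with near-default settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees, many boosting rounds: the usual defaults.
clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Note that no feature scaling or other preprocessing was needed, which is the point of the “off-the-shelf” claim.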

+1

First, AdaBoost is a meta-algorithm, used in conjunction with (on top of) your favorite classifier. Second, classifiers that work well in one problem domain often work poorly in another; see the No Free Lunch page on Wikipedia. So there is no single answer to your question. Still, it may be interesting to learn what people use in practice.
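To illustrate the "meta-algorithm" point, here is a from-scratch sketch of AdaBoost using decision stumps as the base classifier; the dataset and all function names are illustrative assumptions, not part of the original discussion:

```python
import math

def stump_train(X, y, w):
    """Pick the (feature, threshold, polarity) stump with the lowest
    weighted error. X: list of feature lists, y: labels in {-1, +1},
    w: sample weights summing to 1."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] >= t else -polarity) != yi)
                if err < best_err:
                    best_err, best = err, (f, t, polarity)
    return best, best_err

def stump_predict(stump, row):
    f, t, polarity = stump
    return polarity if row[f] >= t else -polarity

def adaboost_train(X, y, n_rounds=8):
    """AdaBoost: reweight the data each round so the next weak learner
    focuses on the examples the ensemble still gets wrong."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = stump_train(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:          # weak learner no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(a * stump_predict(s, row) for a, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump separates it, but the boosted vote does.
X = [[0, 0], [1, 0], [0, 1], [2, 2], [3, 1], [1, 3]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = adaboost_train(X, y)
```

The base learner here is a decision stump, but the same reweight-and-vote loop works on top of any classifier that accepts weighted samples.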

+5

Weka and Mahout are not algorithms... they are machine learning libraries. They include implementations of a wide range of algorithms. So your best bet is to choose a library and try several different algorithms to see which one works best for your particular problem (where “works best” will be a function of training cost, classification cost, and classification accuracy).

If it were me, I would start with naive Bayes, k-nearest neighbors, and support vector machines. They are well-established, well-understood methods with very different trade-offs. Naive Bayes is cheap, but not particularly accurate. k-NN is cheap during training but (potentially) expensive at prediction time, and although it is usually very accurate, it can be prone to overfitting. SVMs are expensive to train and have many hyperparameters to tune, but they are cheap to apply and are usually at least as accurate as k-NN.
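A quick way to compare those three starting points is cross-validation; here is a hedged sketch using scikit-learn (the library, the synthetic dataset, and the specific hyperparameters are my assumptions, chosen only to illustrate the comparison):

```python
# Compare naive Bayes, k-NN, and an SVM on the same data
# with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

On your own problem you would also time the `fit` and `predict` calls, since the trade-off the answer describes is between accuracy and those two costs.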

If you tell us more about the problem you are trying to solve, we can give more focused recommendations. But if you are just looking for one true algorithm, then there is none - the No Free Lunch theorem guarantees this.

+3

Apache Mahout (open source, Java) seems to be picking up a lot of steam.

+2

Weka is a very popular and stable machine learning library. It has been around for a long time and is written in Java.

+2
