Machine learning in OCaml or Haskell?

I am hoping to use either Haskell or OCaml on a new project because R is too slow. I need to be able to use support vector machines, ideally separating each run so the runs can execute in parallel. I want to use a functional language, and I feel these two are the best in terms of performance and elegance (I like Clojure, but it was not very fast in a short test). I am leaning towards OCaml because integration with other languages seems to be better supported, so it may be better suited in the long run (for example, OCaml-R).

Does anyone know of a good tutorial for this kind of analysis, or of sample code in Haskell or OCaml?

haskell machine-learning ocaml
Feb 15 '10 at 21:04
10 answers

Hal Daume wrote some basic machine learning algorithms during his Ph.D. He is now an assistant professor and a rising star in the machine learning community.

His webpage has an SVM, a simple decision tree, and logistic regression, all in OCaml. Reading this code will give you a feel for how machine learning models are implemented in OCaml.
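To give a flavor of what such code looks like, here is a minimal sketch of binary logistic regression trained by batch gradient descent. It is not Daume's code. It uses only the OCaml standard library and plain float arrays. The function names (`train`, `predict`), the learning rate, and the iteration count are illustrative assumptions.

```ocaml
(* Minimal logistic regression sketch: plain float arrays, batch gradient descent.
   Names and hyperparameters are illustrative, not taken from any existing library. *)

let sigmoid z = 1.0 /. (1.0 +. exp (-. z))

(* Dot product of weight vector w and feature vector x (same length assumed). *)
let dot w x =
  let s = ref 0.0 in
  Array.iteri (fun i wi -> s := !s +. wi *. x.(i)) w;
  !s

(* Train on examples xs with labels ys in {0., 1.}; returns (weights, bias). *)
let train ?(rate = 0.5) ?(iters = 2000) xs ys =
  let n = Array.length xs in
  let d = Array.length xs.(0) in
  let w = Array.make d 0.0 in
  let b = ref 0.0 in
  for _i = 1 to iters do
    let gw = Array.make d 0.0 in
    let gb = ref 0.0 in
    Array.iteri
      (fun k x ->
        (* Prediction error for example k under the current parameters *)
        let err = sigmoid (dot w x +. !b) -. ys.(k) in
        Array.iteri (fun i xi -> gw.(i) <- gw.(i) +. err *. xi) x;
        gb := !gb +. err)
      xs;
    (* Step against the gradient averaged over all examples *)
    let scale = rate /. float_of_int n in
    Array.iteri (fun i g -> w.(i) <- w.(i) -. scale *. g) gw;
    b := !b -. scale *. !gb
  done;
  (w, !b)

(* Classify as 1.0 when the predicted probability is at least 0.5. *)
let predict (w, b) x = if sigmoid (dot w x +. b) >= 0.5 then 1.0 else 0.0

let () =
  (* Toy 1-D data: negative points are class 0, positive points are class 1 *)
  let xs = [| [| -2.0 |]; [| -1.0 |]; [| 1.0 |]; [| 2.0 |] |] in
  let ys = [| 0.0; 0.0; 1.0; 1.0 |] in
  let model = train xs ys in
  Array.iter (fun x -> Printf.printf "%.1f -> %.0f\n" x.(0) (predict model x)) xs
```

The loops mutate arrays in place. Numerical inner loops are usually written this way in OCaml, even when the surrounding code is functional.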

I would also like to mention F#, the new .NET language, which is similar to OCaml. Here is a factor graph model written in F# that analyzes chess game data. This study also has a NIPS publication.

FP is well suited to implementing machine learning and data mining models, but what you gain from it is mostly not performance. It is true that FP supports parallel computing better than imperative languages such as C# or Java. But implementing a parallel SVM or decision tree has very little to do with the language: parallel is parallel. Machine learning and data mining usually require numerical optimization, which is usually hard to write in a purely functional style and less efficient when you do. Making these complex algorithms run fast is a hard problem at the level of the algorithm, not the language. If you want to run 100 SVMs in parallel, FP helps here. But I do not see the difficulty of running 100 libsvm instances in parallel in C++. Besides, a single libsvm thread is more efficient than the untested Haskell svm package.

So what do FP languages like F#, OCaml, and Haskell give you?

  • Your code is easy to test. FP languages usually have a top-level interpreter, so you can test your functions on the fly.

  • Less mutable state. If you pass the same arguments to a function, it always gives the same result, so debugging is easy in FP.

  • The code is concise. Type inference, pattern matching, closures, and so on let you focus more on the domain logic and less on the language, so when you write code your mind is mainly on the program logic itself.

  • Writing code in FP is fun.
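The point above about mutable state can be illustrated with a small, contrived OCaml sketch. Both function names are hypothetical. `squared_error` depends only on its arguments. The closure returned by `make_noisy_error` hides a mutable counter, so the same inputs give different results depending on how many times it has been called before. That kind of hidden state makes debugging harder.

```ocaml
(* Pure: the result depends only on the arguments. *)
let squared_error pred target = (pred -. target) ** 2.0

(* Impure: a hidden mutable counter makes the result depend on call history. *)
let make_noisy_error () =
  let calls = ref 0 in
  fun pred target ->
    incr calls;
    ((pred -. target) ** 2.0) +. float_of_int !calls

let () =
  Printf.printf "pure:   %.1f %.1f\n" (squared_error 3.0 1.0) (squared_error 3.0 1.0);
  let noisy = make_noisy_error () in
  (* Bind the two calls separately: OCaml leaves argument evaluation order unspecified. *)
  let first = noisy 3.0 1.0 in
  let second = noisy 3.0 1.0 in
  Printf.printf "impure: %.1f %.1f\n" first second
```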

Feb 22 '10 at 1:50

The only problem I see is that OCaml does not really support multi-core parallelism, whereas GHC has excellent support and performance. If you want to use multiple threads of execution on multiple cores, GHC Haskell will be much easier.

Second, Haskell's FFI is more powerful than OCaml's (that is, it does more with less code), and more libraries are available via Hackage: http://hackage.haskell.org. So I don't think foreign interfaces will be the deciding factor.

Feb 15 '10 at 22:17

As for multi-language integration, combining C and Haskell is surprisingly easy, and I say this as someone who, unlike dons, is not much of an expert. Any other language that integrates well with C shouldn't be much harder; if nothing else, you can always fall back to a thin C interface. For better or worse, C is still the lingua franca of programming, so Haskell is more than acceptable for most cases.

... But. You say you are motivated by performance problems and want to use a "functional language." From this, I infer that you are not yet familiar with the languages you are asking about. Two of Haskell's defining features are non-strict evaluation and immutable data structures by default. Both are incredibly useful in many ways. However, they also mean that optimizing Haskell for performance is often very different from optimizing other languages, and well-honed instincts can lead you astray in baffling ways. You may want to browse the performance topics on the Haskell wiki to get an idea of the issues.

This is not to say you cannot do what you want in Haskell; you certainly can. Both laziness and immutability can actually be exploited for performance gains (Chris Okasaki's thesis gives some nice examples). But be aware that there will be a bit of a learning curve when it comes to performance.

Haskell and OCaml both offer the excellent benefits of ML-family languages. For most programmers, though, OCaml will probably offer a gentler learning curve and better immediate results.

Feb 16 '10 at 15:30

It is hard to give a definitive answer to this. Haskell has the advantages Don mentioned, plus a more powerful type system and cleaner syntax. OCaml will be easier to learn if you come from almost any other language, because Haskell is about as functional as functional languages get. Working with mutable random-access structures can also be a little clumsy in Haskell. You will probably find the performance characteristics of your OCaml code more intuitive than those of Haskell code, because of Haskell's lazy evaluation.
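For comparison, here is a minimal OCaml sketch of both points. First, a mutable random-access structure is just an `array`, updated in place with `a.(i) <- v`. The `axpy` name is borrowed from BLAS terminology; this is not a BLAS binding. Second, evaluation is strict by default, and laziness happens only where you ask for it with `lazy` and `Lazy.force`. The `evaluations` counter is a hypothetical instrument, added only to show when work actually happens.

```ocaml
(* In-place y <- a*x + y on float arrays of equal length (BLAS-style "axpy"). *)
let axpy a x y =
  if Array.length x <> Array.length y then invalid_arg "axpy: length mismatch";
  for i = 0 to Array.length x - 1 do
    y.(i) <- a *. x.(i) +. y.(i)
  done

(* Counts how many times the deferred computation actually runs. *)
let evaluations = ref 0

(* Strict by default: laziness has to be requested explicitly. *)
let deferred_sum xs =
  lazy (incr evaluations; Array.fold_left ( +. ) 0.0 xs)

let () =
  let x = [| 1.0; 2.0; 3.0 |] and y = [| 10.0; 20.0; 30.0 |] in
  axpy 2.0 x y;                        (* y now holds 12, 24, 36 *)
  let s = deferred_sum y in            (* nothing is computed yet *)
  Printf.printf "evaluated before force: %d\n" !evaluations;
  Printf.printf "sum = %.1f\n" (Lazy.force s);
  Printf.printf "sum = %.1f\n" (Lazy.force s);   (* memoized, not recomputed *)
  Printf.printf "evaluations: %d\n" !evaluations
```

Because nothing is deferred unless it is wrapped in `lazy`, the cost of each line is roughly where you see it. That is one reason OCaml performance is often described as easier to predict.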

Really, I would recommend evaluating both if you have the time. Here are some relevant Haskell resources:

Oh, and if you look further into Haskell, be sure to sign up for the Haskell Beginners and Haskell Cafe mailing lists. The community is friendly and eager to help newcomers (is my bias showing?).

Feb 16 '10

If speed is your main concern, go with C. Haskell's performance is pretty good, but you will never get it as fast as C. As far as I know, the only functional language that has beaten C in benchmarks is Stalin Scheme, but it is very old and nobody knows how it works.

I wrote a genetic programming library where performance was key, and I wrote it in C in a functional style. The functional style made it easy to parallelize with OpenMP, and it scales linearly to 8 cores within a single process. You certainly cannot do that in OCaml, although Haskell keeps improving in concurrency and parallelism.

The downside of using C was that it took me months to finally find all the bugs and stop the core dumps, which were extremely hard to track down because of the concurrency. Haskell would probably have caught 90% of those bugs at the first compile.

So, speed at all costs? In hindsight, I wish I had used Haskell. I could have tolerated it being 2 to 3 times slower if it had saved me more than a month of development.

May 19 '10 at 12:24

dons is right that thread-level multi-core parallelism is better supported in Haskell. However, judging from your remark about ideally separating each run for parallel work, it sounds like you could live with process-level parallelism, which is well supported in OCaml. Keith noted that Haskell has a more powerful type system, but it can also be said that OCaml has a more powerful module system than Haskell.

As others have pointed out, OCaml's learning curve is gentler than Haskell's, so you will probably become productive faster in OCaml. Learning OCaml is also a great step toward learning Haskell, because many of the underlying concepts are very similar. You can always move to Haskell later and find a lot there that is familiar. And, as you pointed out, there is an OCaml-R bridge.

Feb 18 '10

For examples of machine learning in Haskell and OCaml, see the homepages of Hal Daume and Lloyd Allison. IMO it is much easier to achieve C++-like performance in OCaml than in Haskell. That said, as already mentioned, Haskell has a much nicer community (packages, tools, and support). It also has nicer syntax and features, such as the FFI and probability monads built with type classes, and better support for parallel programming.

Feb 18 '10 at 23:45

Having worked on upgrading OCaml-R, I have a few comments on integrating OCaml and R. Using OCaml to call R code works, but it is not very simple yet, so it is best used to drive R. Integrating R's functionality more deeply is still much more cumbersome. For example, much remains to be done before R's type system and data can be exported to OCaml seamlessly, and you would have to work on that yourself. Moreover, the interaction between R's GC and OCaml's GC is a delicate point: freeing n values takes O(n^2) time, which is not very nice. As far as I understand, fixing this requires either a more flexible R API or implementing the GC in the binding itself, as a large R array, so that the two GCs interact correctly.

In short, I would go for "pilot R from OCaml."

Contributions on GC interoperability and on mapping R data types to OCaml are welcome.

May 03 '10 at 12:02
May 05 '10 at 12:40

Late answer, but a Haskell machine learning library is available here: https://github.com/mikeizbicki/HLearn

This library implements various ML algorithms with much faster cross-validation than conventional implementations. It is based on the paper "Algebraic Classifiers: A Generic Approach to Fast Cross-Validation, Online Training, and Parallel Training." The authors claim a 400x speedup over the same task in Weka.
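As I understand the idea behind that paper, training is treated as a monoid homomorphism. A model trained on the union of two datasets can be obtained by combining models trained on each part. When models also have an inverse, the model for "all folds except fold k" is simply "whole model minus fold k," so k-fold cross-validation needs no retraining. The sketch below is a minimal OCaml illustration of that idea, not HLearn's API. Its "model" is a hypothetical 1-D Gaussian summarized by count, sum, and sum of squares, and all names in it are made up for illustration.

```ocaml
(* A 1-D Gaussian "model" represented by its sufficient statistics.
   These form a group under componentwise addition and subtraction. *)
type gauss = { n : float; sum : float; sumsq : float }

let empty = { n = 0.0; sum = 0.0; sumsq = 0.0 }
let combine a b = { n = a.n +. b.n; sum = a.sum +. b.sum; sumsq = a.sumsq +. b.sumsq }
let subtract a b = { n = a.n -. b.n; sum = a.sum -. b.sum; sumsq = a.sumsq -. b.sumsq }

(* "Training" folds each data point into the statistics. *)
let train xs =
  List.fold_left (fun g x -> combine g { n = 1.0; sum = x; sumsq = x *. x }) empty xs

let mean g = g.sum /. g.n
let variance g = (g.sumsq /. g.n) -. (mean g ** 2.0)

(* k-fold cross-validation: train each fold once, then get every
   "all folds but i" model by subtracting fold i from the total. *)
let leave_fold_out folds =
  let fold_models = List.map train folds in
  let total = List.fold_left combine empty fold_models in
  List.map (fun m -> subtract total m) fold_models

let () =
  let folds = [ [ 1.0; 2.0 ]; [ 3.0; 4.0 ]; [ 5.0; 6.0 ] ] in
  List.iteri
    (fun i g -> Printf.printf "without fold %d: mean = %.2f\n" i (mean g))
    (leave_fold_out folds)
```

With k folds, this costs one pass over the data plus k cheap subtractions, instead of k full retrainings. Real classifiers need richer statistics, and subtracting floats can accumulate rounding error. This toy avoids that only because its values are small integers.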

Mar 2 '16 at 10:43


