Performance of F# in Scientific Computing

I am wondering how the performance of F# compares with the performance of C++. I asked a similar question about Java, and the impression I got was that Java is not suitable for heavy number crunching.

I have read that F# is supposed to be more scalable and more performant, but how does its real-world performance compare with C++? Specific questions about the current implementation:

  • How well does it work with floating point?
  • Does it provide vector instructions?
  • How friendly is it to optimizing compilers?
  • How big is its memory footprint? Does it provide fine-grained control over memory locality?
  • Does it have capabilities for distributed-memory processors such as the Cray?
  • What features does it have that might be of interest for computational science, where heavy number crunching is involved?
  • Are there real scientific computing projects that use it?

Thanks.

+67
c++ performance parallel-processing f# scientific-computing
May 2 '10 at 2:08
10 answers
  • F# does floating-point computation as fast as the .NET CLR allows it to. It is not much different from C# or other .NET languages.
  • F# does not expose vector instructions on its own, but if your CLR has an API for them, F# should have no problem using it. See, for example, Mono.
  • As far as I know, there is currently only one F# compiler, so perhaps the question should be "how good is the F# compiler when it comes to optimization?". The answer, in any case, is "potentially as good as the C# compiler, perhaps a little worse at the moment". Note that F# differs from, e.g., C# in its support for inlining at compile time, which potentially enables more efficient code that relies on generics.
  • F#'s memory footprint is similar to that of other .NET languages. The amount of control over allocation and garbage collection is the same as in other .NET languages.
  • I don't know about distributed-memory support.
  • F# has very nice primitives for working with flat data structures, e.g. arrays and lists. Look, for instance, at the contents of the Array module: map, map2, mapi, iter, fold, zip... Arrays are popular in scientific computing, I suppose because of their good memory-locality properties.
  • For scientific computing packages using F#, you may want to look at what Jon Harrop is doing.
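A small illustration of those Array primitives (a sketch using nothing beyond the standard F# core library):

```fsharp
// Sum of squared differences between two vectors, using flat arrays.
let xs = [| 1.0; 2.0; 3.0 |]
let ys = [| 1.5; 2.5; 3.5 |]

// Array.map2 combines two arrays element-wise; Array.fold reduces to a scalar.
let sumSqDiff =
    Array.map2 (fun x y -> (x - y) * (x - y)) xs ys
    |> Array.fold (+) 0.0

printfn "%f" sumSqDiff  // 0.750000
```

Because the data lives in contiguous float arrays rather than linked structures, loops like this benefit from the memory locality mentioned above.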
+40
May 05 '10 at 15:29

I am wondering how the performance of F# compares with the performance of C++?

It varies by application. If you make heavy use of sophisticated data structures in a multi-threaded program, then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays, then C++ can be 2-3x faster.

Case study: ray tracer. My benchmark here uses a tree for hierarchical culling and numerical ray-geometry intersection code to generate an output image. This benchmark is several years old, and its C++ code has been improved dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest of that C++ code when compiled with MSVC and parallelized using OpenMP.

I have read that F# is supposed to be more scalable and more performant, but how does its real-world performance compare with C++?

Code development is much easier and faster with F# than with C++, and this applies to optimization as well as to maintenance. Consequently, when you start optimizing a program, the same amount of effort yields a much bigger performance gain if you use F# instead of C++. However, F# is a higher-level language and, as such, places a lower ceiling on performance. So if you have infinite time to spend optimizing, you should, in theory, always be able to produce faster code in C++.

This is the same trade-off C++ has with respect to Fortran, and Fortran, in turn, has with respect to hand-written assembler.

Case study: QR decomposition. This is a basic numerical method from linear algebra, provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10x faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware, while keeping my code under 150 lines and fully generic (it can handle single and double precision, and complex and even symbolic matrices!): for tall thin matrices my F# code is up to 3x faster than Intel MKL.
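For readers unfamiliar with the algorithm, here is a minimal modified Gram-Schmidt QR in F#. This is a naive sketch for illustration only, not the tuned, generic implementation described above: no pivoting, no blocking, no genericity over precision.

```fsharp
// QR factorization by modified Gram-Schmidt.
// a.[j] is column j of A; returns q (orthonormal columns) and
// r (r.[k].[j] = R[k,j], upper triangular) such that A = Q * R.
let qr (a: float[][]) =
    let n = a.Length
    let q = Array.map Array.copy a
    let r = Array.init n (fun _ -> Array.zeroCreate<float> n)
    for k in 0 .. n - 1 do
        // Normalize column k.
        r.[k].[k] <- sqrt (Array.sumBy (fun x -> x * x) q.[k])
        q.[k] <- Array.map (fun x -> x / r.[k].[k]) q.[k]
        // Orthogonalize the remaining columns against it.
        for j in k + 1 .. n - 1 do
            r.[k].[j] <- Array.fold2 (fun acc x y -> acc + x * y) 0.0 q.[k] q.[j]
            q.[j] <- Array.map2 (fun x u -> x - r.[k].[j] * u) q.[j] q.[k]
    q, r

let q, r = qr [| [| 3.0; 4.0 |]; [| 1.0; 2.0 |] |]
// Q columns: [0.6; 0.8] and [-0.8; 0.6]; R = [5.0 2.2; 0.0 0.4]
```

Note how the whole factorization fits in a handful of lines built from the same Array primitives; that conciseness is what makes the subsequent optimization work tractable.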

Note that the moral of this case study is not that you should expect your F# code to be faster than vendor-tuned libraries, but rather that even experts like Intel's will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization specialists failed to exploit parallelism fully because their tools make it extremely cumbersome, whereas F# makes it easy.

How well does it work with floating point?

Performance is similar to ANSI C, but some features (such as rounding modes) are not available in .NET.

Does it provide vector instructions?

No.

How friendly is it to optimizing compilers?

This question does not really make sense: F# is a proprietary Microsoft .NET language with a single compiler.

How big is its memory footprint?

An empty application uses 1.3 MB here.

Does it provide fine-grained control over memory locality?

Better than most memory-safe languages, but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as structs.
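For example (a minimal sketch): marking a type with the [<Struct>] attribute keeps it unboxed, so an array of them is one contiguous block of floats rather than an array of heap references.

```fsharp
// A value type: an array of Vec3 is stored contiguously,
// with no per-element heap allocation or pointer chasing.
[<Struct>]
type Vec3 =
    val X : float
    val Y : float
    val Z : float
    new (x, y, z) = { X = x; Y = y; Z = z }

// Element-wise dot products over two arrays of unboxed vectors.
let dots (a: Vec3[]) (b: Vec3[]) =
    Array.map2 (fun (u: Vec3) (v: Vec3) -> u.X * v.X + u.Y * v.Y + u.Z * v.Z) a b

let d = dots [| Vec3(1.0, 0.0, 0.0) |] [| Vec3(2.0, 3.0, 4.0) |]
printfn "%f" d.[0]  // 2.000000
```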

Does it have capabilities for distributed-memory processors such as the Cray?

It depends upon what you mean by "capability". If you can run .NET on that Cray, then you could use message passing in F# (as with the next language), but F# is intended primarily for desktop multicore x86 machines.

What features does it have that might be of interest for computational science, where heavy number crunching is involved?

Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on the fly via an interactive F# session in Visual Studio 2010 is extremely useful for interactive technical computing.
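A small example of that .NET 4-era parallelism from F# (a sketch; Array.Parallel ships with the F# core library, and the sleep is just a stand-in for a real numerical kernel):

```fsharp
// Map an expensive function over an array across all cores.
let slowSquare (x: float) =
    System.Threading.Thread.Sleep 1   // stand-in for real work
    x * x

let xs = Array.init 100 float
let squares = Array.Parallel.map slowSquare xs

printfn "%f" (Array.sum squares)  // 328350.000000
```

Swapping Array.map for Array.Parallel.map is the entire change needed to parallelize the loop, which is the ease of parallelism referred to throughout this thread.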

Are there real scientific computing implementations that use it?

Our commercial scientific computing products in F# already have hundreds of users.

However, your line of questioning indicates that you are thinking of scientific computing as high-performance computing (e.g. Cray) rather than interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.

+61
May 10 '10 at

In addition to what others have said, there is one important point about F#: parallelism. The performance of ordinary F# code is determined by the CLR, although you may be able to use LAPACK from F#, or make native calls via C++/CLI as part of your project.

However, well-designed functional programs are generally much easier to parallelize, which means you can readily gain performance from the multicore CPUs that are certainly available to you if you are doing scientific computing.

As for distributed computing, you can use any distributed-computing framework available for the .NET platform. There is an MPI.NET project that works well with F#, but you can also use DryadLINQ, which is an MSR research project.

+41
May 02 '10 at 9:48 a.m.

As with all language/performance comparisons, your mileage will vary greatly depending on how well you can code.

F# is derived from OCaml. I was surprised to learn that OCaml is used a lot in the financial world, where number-crunching performance is very important. I was also surprised to learn that OCaml is one of the fastest languages around, with performance on par with the fastest C and C++ compilers.

F# is built on the CLR. On the CLR, code is expressed in a bytecode called the Common Intermediate Language (CIL). As such, it benefits from the JIT's optimization capabilities and has performance comparable to C# (but not necessarily to C++), if the code is written well.

CIL code can be compiled to native code in a separate step before execution by using the Native Image Generator (NGEN). This speeds up all subsequent runs of the software, since the CIL-to-native compilation is no longer needed.

Keep in mind that functional languages like F# lend themselves to a more declarative style of programming. In a sense, you over-specify the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can, in theory, give the compiler additional opportunities for algorithmic optimization.

+16
May 02 '10 at 2:17 a.m.

It depends on what kind of scientific computing you are doing.

If you are doing traditional heavy number crunching, e.g. linear algebra or various numerical optimizations, then you should not put your code on the .NET framework, and at least not in F#. At the algorithm level, most such algorithms must be coded in an imperative style to get good performance and memory usage. Others have mentioned parallelism; I have to say it is probably of no use when you are doing low-level things like a parallel SVD implementation, because once you know how to parallelize an SVD, you simply will not use high-level languages: Fortran, C, or modified C (e.g. Cilk) are your friends.

However, a lot of scientific computing today is not of that kind, but consists of higher-level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra or optimization, there are also a lot of data flows, IO, preprocessing, graphics, etc. For these tasks, F# is really effective, thanks to its conciseness, functional style, safety, ease of parallelism, and so on.

As others have noted, .NET has good support for Platform Invoke; in fact, quite a few projects inside MS use .NET and P/Invoke together to relieve performance bottlenecks.
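A minimal sketch of what such a P/Invoke binding looks like from F#. The library name "mathkernel" and the function dot_product are made-up names standing in for a real tuned native library (e.g. a vendor BLAS), so the native call itself is left commented out:

```fsharp
open System.Runtime.InteropServices

// Hypothetical native binding; a real project would point DllImport
// at an actual native DLL and match its exported signature.
[<DllImport("mathkernel", CallingConvention = CallingConvention.Cdecl)>]
extern double dot_product(double[] xs, double[] ys, int n)

let xs = [| 1.0; 2.0; 3.0 |]
let ys = [| 4.0; 5.0; 6.0 |]

// let d = dot_product (xs, ys, xs.Length)  // would cross into native code

// Managed equivalent, shown for reference:
let dManaged = Array.fold2 (fun acc x y -> acc + x * y) 0.0 xs ys
printfn "%f" dManaged  // 32.000000
```

The pattern in the projects mentioned above is exactly this: keep the bulk of the program in managed code and route only the measured hot spots through bindings like the one sketched here.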

+9
May 03 '10 at 1:50 a.m.

Unfortunately, I don't think you will find much reliable information. F# is still a very new language, so even if it were perfectly suited to heavy workloads, there would still not be many people with significant experience to report on. Furthermore, performance is very hard to measure accurately, and microbenchmarks are hard to generalize. Even within C++ you can see dramatic differences between compilers; are you wondering whether F# is competitive with any particular C++ compiler, or with the hypothetical "best possible" C++ executable?

As for specific benchmarks against C++, here are some possibly relevant links: OCaml vs. F#: QR decomposition; F# vs Unmanaged C++ for parallel computing. Note that as an author of F#-related material and a vendor of F# tools, the author has a vested interest in F#'s success, so take these claims with a grain of salt.

I think it is safe to say that there will be some applications where F# is competitive at runtime, and probably some others where it is not. In most cases F# will probably require more memory. Of course, the ultimate performance will also depend heavily on the skill of the programmer; I think F# will almost certainly be a more productive language for a moderately competent programmer. Additionally, I think that at the moment the CLR on Windows performs better than Mono on most other OSes for most tasks, which may also affect your decisions. And since F# is probably easier to parallelize than C++, the type of hardware you plan to run on will matter as well.

Ultimately, I think the only way to answer this question is to write F# and C++ code representative of the kind of calculations you want to perform, and compare them.

+7
May 2, '10 at 17:43

Here are two examples I can share:

I have a large-scale logistic regression solver using L-BFGS optimization, coded in C++. The implementation is well tuned. I changed some of the code to C++/CLI, i.e. compiled the code to .NET. The .NET version is 3 to 5 times slower than the natively compiled one on various datasets. If you code L-BFGS in F#, the performance cannot be better than C++/CLI or C# (but it would be very close).

I have another post, Why F# is a data mining language; although it is not quite related to the performance issue you are concerned with here, it is quite relevant to scientific computing in F#.

+4
May 05 '10 at 1:29 a.m.

If I say "ask again in 2-3 years", I think that will fully answer your question :-)

First, don't expect F# to be any different from C# unless you deliberately use some convoluted recursion, and I would guess you aren't, since you asked about numerics.

Floating point should be better than Java's, since the CLR does not aim for bit-identical results across platforms, which means the JIT will work at 80-bit precision whenever possible. On the other hand, you have no control over that, other than watching the number of local variables to make sure there are enough FP registers.

If you scream loudly enough, something may happen in 2-3 years, since Direct3D is coming to .NET as a regular API, and C# code running under XNA executes on the Xbox, which is about as close to bare metal as you can get with the CLR. That still means you would need to write some of the intermediary code yourself.

So don't expect CUDA, or even the ability to just link in NVIDIA libraries and go. You would have much more luck trying that approach with Haskell if, for some reason, you really need a "functional" language, since Haskell was designed to be friendly to foreign interfaces out of pure necessity.

Mono.Simd has already been mentioned, and while it should be back-portable to the MS CLR, it could be quite some work to actually do it.

There is some code around on social.msdn on using SSE3 from .NET, via C++/CLI and C#: blitting arrays across, injecting SSE3 code for performance, and so on.

There was some talk of running Cecil on compiled C# to extract parts into HLSL, compile them into shaders, and link in the glue code to run them on the GPU (CUDA does the equivalent anyway), but I don't think anything will come of it.

One thing that might be worth a look, if you want to try something soon, is PhysX.Net on CodePlex. Don't expect it to just unpack and do magic. However, the author is currently active, the code is both normal C++ and C++/CLI, and you could probably get some help from the author if you want to go into the details and perhaps use a similar approach for CUDA. For full CUDA speed you would still need to compile your own kernels and then just interface with .NET, so the easier that part is, the happier you will be.

There is a CUDA.NET library which is supposed to be free, but the page gives only an e-mail address, so expect some strings attached, and while the author writes a blog, he doesn't really talk about what's inside the library.

Oh, and if you have the budget, you could give Psi Lambda a look (KappaCUDAnet is the .NET part). Apparently they are going to raise prices in November (if that is not a sales gimmick :-)

+3
Oct 13 '10 at 12:30

Last I knew, most scientific computing was still done in FORTRAN. It is still faster than anything else for linear algebra problems: faster than Java, C, C++, C#, or F#. LINPACK is very well optimized.

But the observation that "your mileage may vary" is true of all benchmarks. Blanket statements (other than mine) are rarely true.

+1
May 02 '10 at 2:21

Firstly, C is much faster than C++, so if you need that much speed, you should write your libs and so on in C.

As for F#, most benchmarks use Mono, which is up to 2x slower than the MS CLR, partly due to its use of the Boehm GC (they have a new GC and LLVM support, but these are still immature and don't support generics, etc.).

.NET languages are compiled to an IR (CIL), which compiles to native code about as efficiently as C++. But there is one problem most GC'd languages face, and that is a large volume of mutable writes (this includes C++/CLI, as mentioned above). A certain set of scientific problems requires this; where necessary, you should perhaps use a native library, or use the Flyweight pattern to reuse objects from a pool (which reduces the number of writes). The reason is that the .NET CLR has a write barrier: when you update a reference-typed field, it sets a bit in a card table recording that that region of memory has been modified. If your code performs many such writes, it will suffer.

That said, a .NET application in, say, C# that uses a lot of static code, structs, and ref/out on those structs can deliver C-like performance, but it is very difficult to write or maintain such code (as with C).
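A sketch of that style in F# terms: value types plus byref parameters keep a hot loop free of reference writes (so no write-barrier traffic) and free of per-iteration allocation. This is an illustrative minimum, not a recommended general style.

```fsharp
// A mutable value type used as an accumulator.
[<Struct>]
type Acc =
    val mutable Sum : float
    val mutable Count : int

// Pass the accumulator by reference: it is mutated in place,
// so the loop allocates nothing and writes no heap references.
let addSample (acc: byref<Acc>) (x: float) =
    acc.Sum <- acc.Sum + x
    acc.Count <- acc.Count + 1

let mutable a = Acc()
for x in [| 1.0; 2.0; 3.0 |] do
    addSample &a x

printfn "%f / %d" a.Sum a.Count  // 6.000000 / 3
```

As the paragraph above says, this buys performance at a real cost in readability and maintainability compared with idiomatic functional F#.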

Where F# shines, though, is in parallelism over immutable data, which goes hand in hand with read-heavy problems. It is worth noting that most benchmarks are much heavier on mutable writes than real applications are.

As for floating point, .NET is the better alternative to OCaml, whose floating point is slow. C/C++ also allow faster operation at lower precision, which OCaml does not offer by default.

Finally, I would argue that a high-level language such as C# or F#, plus proper profiling, will give you better performance than C and C++ for the same developer time. If you convert the bottlenecks into P/Invoke calls to a C lib, you also get C-level performance in the critical regions. However, if you have an unlimited budget and care more about speed than maintenance, then C is the way to go (not C++).

+1
Jun 18


