Python vs. C++ for an application that uses sparse linear algebra

I am writing an application in which a large share of the computing time will go to basic linear algebra operations (addition, matrix multiplication, multiplication by a vector, multiplication by a scalar, etc.) on sparse matrices and vectors. Up to this point, we have built a prototype using C++ and the Boost matrix library.

I am considering switching to Python to simplify coding the application itself, since it seems that the Boost library (the easy C++ linear algebra library) is not particularly fast. This is a research / proof-of-concept application, so some reduction in execution speed is acceptable (I assume C++ will almost always outperform Python) as long as coding time is also significantly reduced.

Basically, I'm looking for general advice from people who used these libraries before. But specifically:

1) I found scipy.sparse and pySparse. Are these (or other libraries) recommended?

2) What libraries aside from Boost are recommended for C++? I have seen many libraries with C interfaces, but again I am looking for something with low complexity, provided I can get relatively good performance.

3) Ultimately, will Python be roughly comparable to C++ in execution speed for linear algebra operations? I will need to perform many linear algebra operations, and if the slowdown is significant, I probably will not even attempt the switch.

Thank you in advance for any help and previous experience that you can relate.

+4
5 answers

My advice is to fully test the algorithm in Python before translating it into any other language (otherwise you run the risk of prematurely optimizing a bad algorithm). Once you have clearly defined the best interface for your problem, you can factor the performance-critical part out into external code.

Let me explain.

Suppose your final algorithm takes a bunch of numbers in (row, column, value) format and, say, computes the SVD of the corresponding sparse matrix. Then you can keep the whole interface in Python:

    class Problem(object):
        def __init__(self, values):
            self.values = values

        def solve(self):
            return external_svd(self.values)

where external_svd is a Python wrapper for a Fortran / C / C++ routine that efficiently computes the SVD, given the matrix in (row, column, value) format or whatever other format floats your boat.

Again, try using numpy and scipy first, as well as any other standard Python tools. Only after you have profiled your code should you write the actual external_svd wrapper.

If you go this route, you will have a module that is user-friendly (the user interacts with Python, not Fortran / C / C++) and, most importantly, you will be able to use different backends: external_svd_lapack, external_svd_paradiso, external_svd_gsl, etc. (one for each backend you choose).
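As a rough sketch of the swappable-backend idea above: the class below takes the backend as a parameter, so a compiled wrapper can later replace the slow reference implementation without touching the caller. Everything beyond `Problem` and `values` is an assumption made for illustration: the `shape` and `backend` parameters, the helper names, and the choice to compute only the largest singular value (by power iteration on A^T A) as a stand-in for a full SVD.

```python
import math


def matvec(triplets, x, n_rows):
    """Compute y = A x for A stored as (row, column, value) triplets."""
    y = [0.0] * n_rows
    for i, j, v in triplets:
        y[i] += v * x[j]
    return y


def rmatvec(triplets, y, n_cols):
    """Compute x = A^T y for the same triplet storage."""
    x = [0.0] * n_cols
    for i, j, v in triplets:
        x[j] += v * y[i]
    return x


def svd_backend_pure_python(triplets, shape, iterations=200):
    """Hypothetical reference backend: largest singular value only,
    found by power iteration on A^T A (slow, but easy to verify)."""
    n_rows, n_cols = shape
    x = [1.0] * n_cols
    for _ in range(iterations):
        x = rmatvec(triplets, matvec(triplets, x, n_rows), n_cols)
        norm = math.sqrt(sum(c * c for c in x))
        if norm == 0.0:
            return 0.0
        x = [c / norm for c in x]
    return math.sqrt(sum(c * c for c in matvec(triplets, x, n_rows)))


class Problem(object):
    def __init__(self, values, shape, backend=svd_backend_pure_python):
        self.values = values
        self.shape = shape
        self.backend = backend  # a compiled wrapper can be swapped in here

    def solve(self):
        return self.backend(self.values, self.shape)


# diag(3, 2, 1) given as (row, column, value) triplets
triplets = [(0, 0, 3.0), (1, 1, 2.0), (2, 2, 1.0)]
sigma_max = Problem(triplets, (3, 3)).solve()
```

Once profiling shows that the backend is the bottleneck, a wrapper around a compiled routine with the same call signature could be passed as `backend` instead.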

For sparse linear algebra libraries, check out the Intel Math Kernel Library, the PARDISO sparse solver, and the Harwell Subroutine Library (HSL) routine called "MA27". I have used them successfully to solve very sparse and very large problems (check the page of the IPOPT nonlinear optimization solver to see what I mean).

+7

As llasram says, many Python libraries are written in C / C++, so Python should run at an acceptable speed.

In C++, you can also check GSL (the GNU Scientific Library), but I believe its linear algebra routines will perform the same as Boost's (both libraries use BLAS). For sparse linear algebra, you should take a look at SBLAS, but I have never used it. Here is a brief list of general pros and cons as I see them:

  • C++:
    • Forces you to maintain a well-structured program
    • Can be wrapped fairly easily for high-level languages (like Python) to allow quick testing (look at the Python C API or at SWIG).
  • Python:
    • easy to debug, but can easily lead to poorly structured programs.
    • can import test data very easily
    • there are very reliable libraries like scipy / numpy (by the way, scipy also uses BLAS for linear algebra)
    • managed code

I use GSL to manipulate matrices, and I wrap my C++ libraries as Python libraries so that I can test data easily. In my opinion, this is a way of combining the advantages of both languages.

+4

2) It sounds like you are looking for Eigen.

3) I suspect that if you are doing sparse linear algebra, sooner rather than later you will want every bit of speedup you can get, so I would just stick with C++. I see no reason to use Python for this unless you were quickly testing a prototype, which you have already done in C++.

+4

I have no directly applicable experience, but scipy / numpy operations are almost all implemented in C. As long as most of what you need to do can be expressed in terms of scipy / numpy operations, your code should not be much slower than the equivalent C / C++.
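For illustration, here is a minimal sketch of the operations listed in the question (addition, multiplication by a scalar, matrix product, multiplication by a vector) written with scipy.sparse. The matrix entries are made-up sample data, and CSR is just one reasonable storage choice, not a recommendation taken from this thread.

```python
import numpy as np
from scipy import sparse

# (row, column, value) triplets for a small 3x3 sample matrix
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 1, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])

# Build in COO (triplet) form, then convert to CSR for arithmetic
A = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()
I = sparse.identity(3, format="csr")

B = A + I        # sparse + sparse
C = 2.0 * A      # scalar * sparse
D = A @ A        # sparse matrix product
x = np.array([1.0, 1.0, 1.0])
y = A @ x        # sparse matrix times dense vector

print(D.toarray())
print(y)
```

Each of these lines dispatches to compiled code, so the Python interpreter overhead is paid per operation rather than per matrix element.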

+2

Speed is no longer a problem for Python since ctypes and Cython appeared. What sets Cython apart is that you write Python code and it generates C code without requiring you to know a single line of C; it then compiles that code to a library, or you can even create a standalone executable. ctypes is similar, although a bit slower. In the tests I have run, Cython code is as fast as C code, which makes sense, since Cython code is translated into C code.

So, in the end, this is a matter of profiling: see what is slow in Python and move it to Cython, or wrap your existing C libraries for Python using Cython. It is quite easy to reach C speed this way.
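A minimal sketch of that profiling step, using the standard library's cProfile and pstats. The functions `slow_dot` and `run` are hypothetical stand-ins for your own code; the report shows where the time goes, which is the part worth moving to Cython or a compiled library.

```python
import cProfile
import io
import pstats


def slow_dot(a, b):
    """Deliberately naive dot product, standing in for a hot loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def run():
    a = [float(i) for i in range(10000)]
    b = [1.0] * 10000
    return sum(slow_dot(a, b) for _ in range(50))


# Collect timing data only for the call we care about
profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

# Render the ten most expensive entries, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Functions near the top of the report with a high call count and a high total time are the usual candidates for rewriting.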

Therefore, I recommend not throwing away the effort you have put into creating those C libraries: wrap them with Cython and do the rest in Python. Or you could do everything in Cython if you want, since Cython is Python, bar some limitations, and it even lets you mix in C code. That way you could do part of the work in C and part in Python / Cython, depending on what makes you feel more comfortable.

NumPy and SciPy can also be used to save time and provide ready-to-use solutions for your problems and needs. You should definitely check them out. SciPy even has a tool called weave that lets you embed C code inside your Python code, just as you can embed assembly code inside your C code. But I think you would prefer Cython. Remember that Cython is both C and Python, which lets you use C and Python libraries directly.

+1