I need to speed up the function. Should I use cython, ctypes or something else?

I am learning Python by writing a genetic programming application.

I had great tips from Torsten Marek, Paul Hankin and Alex Martelli on this site.

The program has 4 main functions:

  • generate an (arbitrary) expression tree
  • evaluate the fitness of the tree
  • crossbreed
  • mutate

Generation, crossbreeding and mutation all call "evaluate fitness". It is the busiest function and the main speed bottleneck.

As is the nature of genetic algorithms, the program has to search a huge solution space, so the faster the better. I want to speed up each of these functions, starting with the fitness evaluator. My question is what the best way to do this is. I have been looking into cython, ctypes and 'linking and embedding'. They are all new to me and quite beyond me at the moment, but I look forward to learning one and, ultimately, all of them.

The fitness function should compare the value of the expression tree with the value of the target expression. Thus, it will consist of a postfix evaluator that will read the tree in postfix order. I have all the code in python.
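For readers who want something concrete, here is a minimal sketch of what such a postfix evaluator and fitness function might look like in pure Python. The token format, the `BINARY_OPS` table, the variable name `x` and the sum-of-absolute-errors scoring are all assumptions made for illustration; the asker's actual code is not shown in the question.

```python
import operator

# Hypothetical operator table; the real program may support a different set.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def eval_postfix(tokens, variables):
    """Evaluate a sequence of postfix tokens with an explicit stack."""
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            # Operands come off the stack in reverse order.
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        elif tok in variables:
            stack.append(variables[tok])
        else:
            # Anything else is assumed to be a numeric literal.
            stack.append(float(tok))
    return stack.pop()

def fitness(tokens, target, samples):
    """Sum of absolute errors against the target (lower is better)."""
    return sum(abs(eval_postfix(tokens, {"x": x}) - target(x)) for x in samples)

# x*x + 1 written in postfix order
candidate = ["x", "x", "*", "1", "+"]
print(fitness(candidate, lambda x: x * x + 1, [0.0, 1.0, 2.0]))
```

A function of this shape is a plausible starting point for profiling, since every generated, crossbred or mutated tree passes through it.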

I need advice on which I should learn and use now: cython, ctypes or linking and embedding.

Thanks.

+7
python cython ctypes
4 answers

Ignore everyone else's answers for now. The first thing you should learn to use is a profiler. Python comes with the profile / cProfile modules; you should learn to read their results and analyze where the real bottlenecks are. Optimization has three goals: reduce the time spent on each call, reduce the number of calls made, and reduce memory usage to reduce disk thrashing.

The first goal is relatively simple. The profiler will show you the most time-consuming functions, and you can go straight to those functions to optimize them.
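As a rough sketch of that workflow, the snippet below profiles a small driver with `cProfile` and prints the most expensive entries with `pstats`. The functions `evaluate_fitness` and `run_generation` are hypothetical stand-ins, not the asker's code.

```python
import cProfile
import io
import pstats

def evaluate_fitness(n):
    # Hypothetical stand-in for the real fitness evaluator.
    return sum(i * i for i in range(n))

def run_generation():
    # Hypothetical driver that calls the evaluator many times.
    return [evaluate_fitness(1000) for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
run_generation()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the report, the `ncalls` column speaks to the second goal (how often a function is called), while `tottime` and `cumtime` speak to the first (how long each call takes).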

The second and third goals are more difficult, because they mean changing the algorithm so that fewer calls are needed. Find the functions that are called a large number of times and look for ways to call them less often. Use the built-in collections; they are very well optimized.

If you do a lot of number and array crunching, take a look at the third-party modules pandas, NumPy / SciPy and gmpy; they are well-optimized C libraries for processing arrays and tabular data.

Another thing you may want to try is PyPy. PyPy can JIT-compile your code and perform much more sophisticated optimizations than CPython, and it works without any changes to your Python code, although well-optimized code targeting CPython can look quite different from well-optimized code targeting PyPy.

The next thing to try is Cython. Cython is a slightly different language from Python; in fact, it is best described as C with typed, Python-like syntax.

For the parts of your code that sit in very tight loops and that you can no longer optimize by any other means, you can rewrite them as a C extension. Python has very good support for extending with C. In PyPy, the best way to extend is cffi.

+15

Cython is the fastest way to get this done, either by writing your algorithm directly in Cython or by writing it in C and binding it to Python with Cython.

My advice: learn Cython.

+3

Another great option is boost::python, which makes it easy to wrap C or C++.

Of these options, though, since you have already written the Python code, Cython is probably a good one to try first. You may not have to rewrite any code to get a speedup.

0

Try reworking your fitness function so that it supports memoization. This replaces every call that duplicates a previous call with a quick dict lookup.
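A minimal sketch of the idea, assuming each tree can be flattened to a hashable key such as a tuple of tokens. The names `expensive_fitness`, `_cache` and `counters` are hypothetical, and the placeholder computation stands in for the real evaluator.

```python
# Cache of previously computed fitness values, keyed by the tree's tokens.
_cache = {}

# Counts real evaluations, only to show the cache working.
counters = {"evaluations": 0}

def expensive_fitness(tree):
    # Hypothetical placeholder for the real, slow evaluation.
    counters["evaluations"] += 1
    return float(len(tree))

def fitness(tree):
    """Return the cached fitness if this tree was seen before."""
    key = tuple(tree)  # lists are unhashable, so convert to a tuple
    if key not in _cache:
        _cache[key] = expensive_fitness(key)
    return _cache[key]

tree = ["x", "x", "*", "1", "+"]
fitness(tree)
fitness(tree)  # second call is answered from the dict
print(counters["evaluations"])
```

The standard library's `functools.lru_cache` decorator gives the same effect with less code when the function's arguments are already hashable. Whether memoization pays off depends on how often identical trees actually recur in the population.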

0