Stack Based Expression Evaluation Efficiency for Mathematical Analysis

Question

Stack Based Expression Evaluation Efficiency for Mathematical Analysis

For academic purposes, I have to write an application that displays custom expressions like: f (x) = 1 - exp (3 ^ (5 * ln (cosx)) + x)

The approach I chose to write the parser is to convert the expression to an RPN with the Shunting Yard algorithm, considering primitive functions such as "cos" as unary operators. This means that the function recorded above will be converted into a series of tokens of the type:

1, x, cos, ln, 5, *,3, ^, exp, -

The problem is that in order to build a function, I have to evaluate it several times, so using the stack estimation algorithm for each input value will be very inefficient. How can i solve this? Should I forget the idea of RPN?

+7

performance math parsing shunting-yard rpn

Dave danuve Feb 10 '10 at 19:39

source share

9 answers

Aryabhatta · Answer 1 · 2010-02-10T21:07:10+0000

How much is "many times"? Million?

What features can be introduced? Can we assume that they are continuous?

Have you tried to measure how well your code works?

(Sorry, it started with questions!)

You can try one of two approaches (or both) described below (perhaps many more):

1) Separate the trees.

You can create a parse tree. Then do what most compilers do to optimize expressions, constantly bend, eliminate common subexpression (which you could achieve by linking common clipping expressions and caching the result), etc.

Then you can use lazy valuation methods to avoid whole subtrees. For example, if you have a tree

  * / \ AB

where A is rated at 0, you can completely avoid rating B, as you know, the result is 0. With RPN, you lose on a lazy rating.

2) Interpolation

Assuming your function is continuous, you can approach your function with a high degree of accuracy using Polynomial interpolation . Thus, you can perform a complex calculation of the function several times (depending on the degree of polynomial you have chosen), and then perform fast polynomial calculations for the remaining time.

To create an initial dataset, you can simply use approach 1 or just stick to your RPN, since you will only generate a few values.

So, if you use Interpolation, you can save your RPN ...

Hope this helps!

Michael anderson · Answer 2 · 2010-02-10T22:37:28+0000

Why reinvent the wheel? Use a fast scripting language instead. Integrating something like lua into your code will take very little time and will be very fast.

Usually you can compile your expression byte, and this should lead to the fact that the code will work very fast, of course, fast enough for simple 1D-graphs.

I recommend lua as fast and integrated with C / C ++ as easier than any other scripting language. Other good options would be python, but although it was better known, it was hard for me to integrate.

Jason s · Answer 3 · 2010-02-10T19:56:48+0000

Why not keep parsing around the tree (I use the "tree" freely, in your case it is a sequence of operations) and label input variables accordingly? (for example, for inputs x, y, z, etc. annotate "x" with 0 to denote the first input variable "y" with 1 to denote the second input variable, etc.)

Thus, you can parse the expression once, save the parse tree, take an array of input data and apply the analysis tree for evaluation.

If you are worried about the performance aspects of the evaluation step (versus the parsing step), I don’t think you would have done much better if you didn’t start vectorizing (applying your parsing tree on the input vector right away) or hard-coding operations into a fixed function.

anon · Answer 4 · 2010-02-10T22:45:36+0000

I use a bypass algorithm to create an RPN. Then I will “compile” the RPN into a tokenized form, which can be executed (interpreted) repeatedly without parsing the expression again.

lhf · Answer 5 · 2010-02-20T00:11:35+0000

Michael Anderson proposed Lua . If you want to try Lua just for this task, see My ae library.

Paul · Answer 6 · 2010-02-10T20:11:01+0000

Ineffective in what sense? There are machine time and programmer time. Is there a standard for how fast it should work with a certain level of difficulty? Is it more important to complete the task and move on to the next (perfectionists sometimes don’t finish)?

All of these steps must be performed for each input value. Yes, you may have a heuristic that scans the list of operations and clears it a bit. Yes, you can compile some of them for assembly instead of calling +, *, etc. Like high level features. You can compare vectorization (doing all +, and then all *, etc. With a vector of values) to complete the whole procedure one value at a time. But do you need?

I mean, what do you think if you plan a function in gnuplot or Mathematica?

Mike dunlavey · Answer 7 · 2010-02-12T02:20:29+0000

Your simple interpretation of RPN should work fine, especially since it contains

Library math functions like cos , exp and ^ (pow, including logs)
Character Table Search

Let's hope your character table (with variables like x in it) is short and simple.

The library functions are likely to be your biggest time users, so if your interpreter is poorly written, this will not be a problem.

If, however, you really need to go for speed, you can translate the expression into C code, compile and link it to the dll on the fly and load it (takes about a second). This, plus memoized versions of mathematical functions, can give you better performance.

PS For syntax, your syntax is pretty vanilla, so a simple parser with recursive descents (about the code page, O (n), the same as the shunting yard) should work fine. In fact, you can simply calculate the result of the parsing (if the math functions take up most of the time) and not worry about the parsing trees, RPN, any of these materials.

Kaouni · Answer 8 · 2010-10-29T18:37:10+0000

I think this RPN-based library might serve the purpose: http://expressionoasis.vedantatree.com/

I used it with one of my calculator projects, and it works well. It is small and simple, but expandable.

BCS · Answer 9 · 2010-10-30T02:03:59+0000

One optimization would be to replace the stack with an array of values and implement the evaluator as three address mechanics , where each operation is loaded from two (or one) locations and saved to the third. This can lead to very hard code:

 struct Op { enum { add, sub, mul, div, cos, sin, tan, //.... } op; int a, b, d; } void go(Op* ops, int n, float* v) { for(int i = 0; i < n; i++) { switch(ops[i].op) { case add: v[op[i].d] = v[op[i].a] + v[op[i].b]; break; case sub: v[op[i].d] = v[op[i].a] - v[op[i].b]; break; case mul: v[op[i].d] = v[op[i].a] * v[op[i].b]; break; case div: v[op[i].d] = v[op[i].a] / v[op[i].b]; break; //... } } }

Converting from RPN to 3-address should be simple, since 3-address is a generalization.

Stack Based Expression Evaluation Efficiency for Mathematical Analysis

1) Separate the trees.

2) Interpolation

More articles: