Math Simplification Strategies

I have a well-formed tree, which is a mathematical expression. For example, given the line: "1+2-3*4/5" , this is analyzed for:

 subtract(add(1,2),divide(multiply(3,4),5)) 

What is expressed as this tree:

"1 + 2-3 * 4/5"

What I would like to do is take this tree and reduce it as much as possible. In the example above, this is pretty simple because all numbers are constants. However, things get more complicated if I resolve the unknowns (indicated by the $ symbol followed by an identifier):

"3*$a/$a" becomes divide(multiply(3,$a), $a)

This should simplify to 3 , as the terms $a should cancel each other out. The question is, "how do I know about this in a general way?" How to find out that min(3, sin($x)) will always be sin($x) ? How to find out that sqrt(pow($a, 2)) is abs($a) ? How to find out that nthroot(pow(42, $a), $a) (the root of a th of 42 due to a th ) is 42 ?

I understand that this question is quite broad, but for some time I hit my head about it and did not come up with anything satisfactory.

+58
math algorithm simplify
Sep 24 '11 at 16:23
source share
6 answers

You probably want to implement a term rewriting system. For basic math, check out WikiPedia .

Structure of the term rewrite module

Since I recently implemented a solution ...

  • First prepare the CExpression class, which models the structure of your expression.

  • Inject a CRule that contains a template and a replacement. Use special characters as template variables that must be bound during pattern matching and replaced in the replacement expression.

  • Then we implement the CRule class. The main applyRule(CExpression, CRule) method applyRule(CExpression, CRule) tries to match the rule with any applicable subexpression of the expression. If it matches, return the result.

  • To complete, implement the CRuleSet class, which is just a collection of CRule objects. The basic reduce(CExpression) method applies a set of rules until more rules can be applied, and then the reduced expression is returned.

  • In addition, you need a CBindingEnvironment class that matches already matched characters with matched values.

Try rewriting the expression in normal form

Do not forget that this approach works until a certain point, but is likely to be incomplete. This is because all of the following rules perform local overwrites.

To make this local rewrite logic stronger, try to transform the expressions into what I would call a normal form. This is my approach:

  • If the term contains literal meanings, try translating the term as far to the right as possible.

  • In the end, this literal meaning can be displayed as right as possible and can be evaluated as part of a fully literal expression.

When to evaluate a fully literal expression

An interesting question is when to evaluate a fully literal expression. Suppose you have an expression

  x * ( 1 / 3 ) 

which would decrease to

  x * 0.333333333333333333 

Now suppose x is replaced by 3. This will give something like

  0.999999999999999999999999 

Thus, the expected score returns a slightly incorrect value.

On the other hand, if you hold (1/3) and first replace x with 3

  3 * ( 1 / 3 ) 

rewrite rule would give

  1 

Thus, it may be useful to evaluate a completely literal expression at the end.

Sample rewrite rules

This is how my rules appear inside the application: The characters _1, _2, ... match any subexpression:

 addRule( new TARuleFromString( '0+_1', // left hand side :: pattern '_1' // right hand side :: replacement ) ); 

or a little harder

 addRule( new TARuleFromString( '_1+_2*_1', '(1+_2)*_1' ) ); 

Some special characters correspond only to special subexpressions. For example. _Literal1, _Literal2, ... correspond only to literal values:

 addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )', 'exp( _Literal1 + _Literal2 )' ) ); 

This rule moves a non-literal expression to the left:

 addRule( new TARuleFromString( '_Literal*_NonLiteral', '_NonLiteral*_Literal' ) ); 

Any name starting with the character '_' is a template variable. Although the system complies with the rule, it stores a stack of assignments of already matched characters.

Finally, do not forget that rules can give endless replacement sequences. Thus, reducing the expression, remember the process, which intermediate expressions have already been achieved earlier.

In my implementation, I do not store intermediate expressions directly. I keep an array of MD5 () hashes of the intermediate expression.

A set of rules as a starting point

Here are the rules to get you started:

  addRule( new TARuleFromString( '0+_1', '_1' ) ); addRule( new TARuleFromString( '_Literal2=0-_1', '_1=0-_Literal2' ) ); addRule( new TARuleFromString( '_1+0', '_1' ) ); addRule( new TARuleFromString( '1*_1', '_1' ) ); addRule( new TARuleFromString( '_1*1', '_1' ) ); addRule( new TARuleFromString( '_1+_1', '2*_1' ) ); addRule( new TARuleFromString( '_1-_1', '0' ) ); addRule( new TARuleFromString( '_1/_1', '1' ) ); // Rate = (pow((EndValue / BeginValue), (1 / (EndYear - BeginYear)))-1) * 100 addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )', 'exp( _Literal1 + _Literal2 )' ) ); addRule( new TARuleFromString( 'exp( 0 )', '1' ) ); addRule( new TARuleFromString( 'pow(_Literal1,_1) * pow(_Literal2,_1)', 'pow(_Literal1 * _Literal2,_1)' ) ); addRule( new TARuleFromString( 'pow( _1, 0 )', '1' ) ); addRule( new TARuleFromString( 'pow( _1, 1 )', '_1' ) ); addRule( new TARuleFromString( 'pow( _1, -1 )', '1/_1' ) ); addRule( new TARuleFromString( 'pow( pow( _1, _Literal1 ), _Literal2 )', 'pow( _1, _Literal1 * _Literal2 )' ) ); // addRule( new TARuleFromString( 'pow( _Literal1, _1 )', 'ln(_1) / ln(_Literal1)' ) ); addRule( new TARuleFromString( '_literal1 = pow( _Literal2, _1 )', '_1 = ln(_literal1) / ln(_Literal2)' ) ); addRule( new TARuleFromString( 'pow( _Literal2, _1 ) = _literal1 ', '_1 = ln(_literal1) / ln(_Literal2)' ) ); addRule( new TARuleFromString( 'pow( _1, _Literal2 ) = _literal1 ', 'pow( _literal1, 1 / _Literal2 ) = _1' ) ); addRule( new TARuleFromString( 'pow( 1, _1 )', '1' ) ); addRule( new TARuleFromString( '_1 * _1 = _literal', '_1 = sqrt( _literal )' ) ); addRule( new TARuleFromString( 'sqrt( _literal * _1 )', 'sqrt( _literal ) * sqrt( _1 )' ) ); addRule( new TARuleFromString( 'ln( _Literal1 * _2 )', 'ln( _Literal1 ) + ln( _2 )' ) ); addRule( new TARuleFromString( 'ln( _1 * _Literal2 )', 'ln( _Literal2 ) + ln( _1 )' ) ); addRule( new TARuleFromString( 'log2( _Literal1 * _2 )', 'log2( _Literal1 ) + log2( _2 )' ) ); addRule( new TARuleFromString( 'log2( _1 * _Literal2 )', 'log2( _Literal2 ) + log2( _1 )' ) ); addRule( new TARuleFromString( 'log10( _Literal1 * _2 )', 'log10( _Literal1 ) + log10( _2 )' ) ); addRule( new TARuleFromString( 'log10( _1 * _Literal2 )', 'log10( _Literal2 ) + log10( _1 )' ) ); addRule( new TARuleFromString( 'ln( _Literal1 / _2 )', 'ln( _Literal1 ) - ln( _2 )' ) ); addRule( new TARuleFromString( 'ln( _1 / _Literal2 )', 'ln( _Literal2 ) - ln( _1 )' ) ); addRule( new TARuleFromString( 'log2( _Literal1 / _2 )', 'log2( _Literal1 ) - log2( _2 )' ) ); addRule( new TARuleFromString( 'log2( _1 / _Literal2 )', 'log2( _Literal2 ) - log2( _1 )' ) ); addRule( new TARuleFromString( 'log10( _Literal1 / _2 )', 'log10( _Literal1 ) - log10( _2 )' ) ); addRule( new TARuleFromString( 'log10( _1 / _Literal2 )', 'log10( _Literal2 ) - log10( _1 )' ) ); addRule( new TARuleFromString( '_Literal1 = _NonLiteral + _Literal2', '_Literal1 - _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = _NonLiteral * _Literal2', '_Literal1 / _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = _NonLiteral / _Literal2', '_Literal1 * _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 =_NonLiteral - _Literal2', '_Literal1 + _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_NonLiteral + _Literal2 = _Literal1 ', '_Literal1 - _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_NonLiteral * _Literal2 = _Literal1 ', '_Literal1 / _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_NonLiteral / _Literal2 = _Literal1 ', '_Literal1 * _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1', '_Literal1 + _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1 ', '_Literal1 + _Literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal2 - _NonLiteral = _Literal1 ', '_Literal2 - _Literal1 = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = sin( _NonLiteral )', 'asin( _Literal1 ) = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = cos( _NonLiteral )', 'acos( _Literal1 ) = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = tan( _NonLiteral )', 'atan( _Literal1 ) = _NonLiteral' ) ); addRule( new TARuleFromString( '_Literal1 = ln( _1 )', 'exp( _Literal1 ) = _1' ) ); addRule( new TARuleFromString( 'ln( _1 ) = _Literal1', 'exp( _Literal1 ) = _1' ) ); addRule( new TARuleFromString( '_Literal1 = _NonLiteral', '_NonLiteral = _Literal1' ) ); addRule( new TARuleFromString( '( _Literal1 / _2 ) = _Literal2', '_Literal1 / _Literal2 = _2 ' ) ); addRule( new TARuleFromString( '_Literal*_NonLiteral', '_NonLiteral*_Literal' ) ); addRule( new TARuleFromString( '_Literal+_NonLiteral', '_NonLiteral+_Literal' ) ); addRule( new TARuleFromString( '_Literal1+(_Literal2+_NonLiteral)', '_NonLiteral+(_Literal1+_Literal2)' ) ); addRule( new TARuleFromString( '_Literal1+(_Literal2+_1)', '_1+(_Literal1+_Literal2)' ) ); addRule( new TARuleFromString( '(_1*_2)+(_3*_2)', '(_1+_3)*_2' ) ); addRule( new TARuleFromString( '(_2*_1)+(_2*_3)', '(_1+_3)*_2' ) ); addRule( new TARuleFromString( '(_2*_1)+(_3*_2)', '(_1+_3)*_2' ) ); addRule( new TARuleFromString( '(_1*_2)+(_2*_3)', '(_1+_3)*_2' ) ); addRule( new TARuleFromString( '(_Literal * _1 ) / _Literal', '_1' ) ); addRule( new TARuleFromString( '(_Literal1 * _1 ) / _Literal2', '(_Literal1 * _Literal2 ) / _1' ) ); addRule( new TARuleFromString( '(_1+_2)+_3', '_1+(_2+_3)' ) ); addRule( new TARuleFromString( '(_1*_2)*_3', '_1*(_2*_3)' ) ); addRule( new TARuleFromString( '_1+(_1+_2)', '(2*_1)+_2' ) ); addRule( new TARuleFromString( '_1+_2*_1', '(1+_2)*_1' ) ); addRule( new TARuleFromString( '_literal1 * _NonLiteral = _literal2', '_literal2 / _literal1 = _NonLiteral' ) ); addRule( new TARuleFromString( '_literal1 + _NonLiteral = _literal2', '_literal2 - _literal1 = _NonLiteral' ) ); addRule( new TARuleFromString( '_literal1 - _NonLiteral = _literal2', '_literal1 - _literal2 = _NonLiteral' ) ); addRule( new TARuleFromString( '_literal1 / _NonLiteral = _literal2', '_literal1 * _literal2 = _NonLiteral' ) ); 

Make rules top-notch

An interesting point: since the above rules are a special expression that is correctly evaluated by the expression parser, users can even add new rules and thereby improve the rewriting capabilities of the application.

Parsing expressions (or more general ones: languages)

For Cocoa / OBjC applications , Dave DeLong DDMathParser is an ideal candidate for parsing mathematical expressions.

For other languages, our old friends Lex and Yacc or the newer GNU Bison can help.

A younger and richer set of ready-to-use syntax files , ANTLR is a modern Java-based parser generator. In addition to the pure command line, ANTLRWorks provides a GUI interface for creating and debugging ANTLR-based parsers. ANTLR generates grammars for various host languages , such as JAVA, C, Python, PHP, or C # . ActionScript runtime is currently disabled .

If you want to learn how to parse expressions (or languages ​​in general) from the bottom up, I would suggest this free text book from Niklaus Wirth (or the German book version ), the famous inventor of Pascal and Modula-2.

+76
Sep 24 2018-11-22T00:
source share
β€” -

This task can become quite difficult (in addition to the simplest transformation). Essentially, this is what algebra software does all the time.

You can find a readable introduction of how this is done (rule based assessment), for example. for Mathematica .

+10
Sep 24 '11 at 16:35
source share

You want to create a CAS (Computational Algebra System), and the topic is so broad that there is a whole area of ​​research devoted to this. This means that there are several books that are likely to answer your question better than SO.

I know that some systems build trees that first reduce constants, and then put the tree in a normalized form, and then use a large database of validated / known formulas to convert the problem to another form.

+7
Sep 24 '11 at 16:41
source share

I believe that you need to "sort out the strength" of such trees.

You will need to formulate a couple of rules that describe possible simplifications. Then you can walk around the tree and look for suitable rules. Since some simplifications can lead to simpler results than others, you will have to do the same as for the shortest route in the metro plan: try all the possibilities and sort the results by some quality criteria.

Since the number of such scenarios is limited, you can easily find simplification rules by trying all combinations of operators and variables and again getting a genetic algorithm that checks that the rule has not been found before and that it actually simplifies the input.

Multiplications can be represented as additions, so one rule can be that a - a cancels itself: 2a-a = a + aa

Another rule would be to first multiply all divisions, because these are fractions. Example:

1/2 + 3/4 Open all divisions, and then multiply each fraction by a divisor on both sides of all other fractions

4/8 + 6/8 Then all the elements have the same divisor, and therefore unified (4 + 6) / 8 = 10/8

Finally, you will find the highest common divisor between upper and lower 5/4

In relation to your tree, the strategy will be to work from the bottom leaves up, simplifying each multiplication, first converting it to additions. Then, simplifying each addition as fractions

All this time, you will check your shortcut rules again to find such simplifications. To know that the rule applies, you probably have to either try all permutations of the subtree. For example. Rule aa will also apply for -a + a. Maybe a + ba.

Just some thoughts, hope you get some ideas ...

+1
Sep 24 '11 at 17:44
source share

In fact, you cannot do this at all, because although they are mathematically the same, the computer arithmetic may not match. For example, -MAX_INT is undefined, therefore -% a = / =% a. Similarly for floats, you need to deal with NaN and Inf accordingly.

0
Sep 24 '11 at 16:35
source share

My naive approach would be to have some data structure with inversions of each function (i.e. divide and multiply ). You obviously need additional logic to make sure that they are actually inverted, since multiplying by 3 and then dividing by 4 is not really the opposite.

Although this is primitive, I think this is a worthy first pass in the problem and will solve many of the cases that you noted in your question.

I look forward to your complete decision and fear in fear - this is a mathematical brilliance :)

0
Sep 24 '11 at 17:29
source share



All Articles