Factoring policies in pretty

I do very simple calculations of the probability of getting a subset of X, Y, Z from the set AZ (with the corresponding probabilities x, y, z).

And because of the very heavy formulas to process them, I'm trying to simplify (or collect or multiply - I don't know the exact definition) these polynomial expressions using sympy .

So, having this (a very simple expression for calculating the probability of obtaining a subset of X, Y, Z from the set AZ with the corresponding probabilities x, y, z)

import sympy as sp x, y, z = sp.symbols('xy z') expression = ( x * (1 - x) * y * (1 - x - y) * z + x * (1 - x) * z * (1 - x - z) * y + y * (1 - y) * x * (1 - y - x) * z + y * (1 - y) * z * (1 - y - z) * x + z * (1 - z) * y * (1 - z - y) * x + z * (1 - z) * x * (1 - z - x) * y ) 

I want to get something like this

 x * y * z * (6 * (1 - x - y - z) + (x + y) ** 2 + (y + z) ** 2 + (x + z) ** 2) 

poly rewritten so as to have as few operations as possible ( + , - , * , ** , ...)


I tried factor() , collect() , simplify() . But the result is different from my expectations. I mostly get

 2*x*y*z*(x**2 + x*y + x*z - 3*x + y**2 + y*z - 3*y + z**2 - 3*z + 3) 

I know that sympy can combine polynomials into simple forms:

 sp.factor(x**2 + 2*x*y + y**2) # gives (x + y)**2 

But how to make sympy to combine polynomials from the expressions above?


If this is not possible in sympy, maybe there are other options?

+7
source share
3 answers

Putting together some of these methods will give you a good answer this time. It would be interesting to know if this strategy works more often than on the equations that you generate, or if, as the name suggests, this is just a good result this time.

 def iflfactor(eq): """Return the "I'm feeling lucky" factored form of eq.""" e = Mul(*[horner(e) if e.is_Add else e for e in Mul.make_args(factor_terms(expand(eq)))]) r, e = cse(e) s = [ri[0] for ri in r] e = Mul(*[collect(ei.expand(), s) if ei.is_Add else ei for ei in Mul.make_args(e[0])]).subs(r) return e >>> iflfactor(eq) # using your equation as eq 2*x*y*z*(x**2 + x*y + y**2 + (z - 3)*(x + y + z) + 3) >>> _.count_ops() 15 

By the way, the difference between factor_terms and gcd_terms is that factor_terms will work harder out of general terms, while preserving the original structure of the expression, very similar to what you would do manually (i.e., looking for general terms in additions that might to be pulled out).

 >>> factor_terms(x/(z+z*y)+x/z) x*(1 + 1/(y + 1))/z >>> gcd_terms(x/(z+z*y)+x/z) x*(y*z + 2*z)/(z*(y*z + z)) 

What is it worth

Chris

+4
source

As far as I know, there is no function that does just that. I think this is actually a very difficult problem. See Reduce the number of operations on a simple expression for a discussion on it.

However, there are many simplification features in SymPy that you can try. The one you didn't mention gives a different result: gcd_terms , which decomposes the symbolic gcd without doing decompositions. He gives

 >>> gcd_terms(expression) x*y*z*((-x + 1)*(-x - y + 1) + (-x + 1)*(-x - z + 1) + (-y + 1)*(-x - y + 1) + (-y + 1)*(-y - z + 1) + (-z + 1)*(-x - z + 1) + (-z + 1)*(-y - z + 1)) 

Another useful feature is .count_ops , which counts the number of operations in an expression. for instance

 >>> expression.count_ops() 47 >>> factor(expression).count_ops() 22 >>> e = x * y * z * (6 * (1 - x - y - z) + (x + y) ** 2 + (y + z) ** 2 + (x + z) ** 2) >>> e.count_ops() 18 

(note that e.count_ops() is not the same as you thought yourself, because SymPy automatically allocates 6*(1 - x - y - z) to 6 - 6*x - 6*y - 6*z ).

Other useful features:

  • cse : Performs a general subexpression exception in an expression. Sometimes you can simplify individual parts and then bring them back. It also helps to avoid duplication of calculations.

  • horner : Applies a Horner scheme to a polynomial. This minimizes the number of operations if the polynomial is in the same variable.

  • factor_terms : Similar to gcd_terms . In fact, I do not quite understand what the difference is.

Note that by default simplify will try a few simplifications and return the one that is minimized with count_ops .

+2
source

I had a similar problem, and in the end I implemented my own solution before I came across this. Mine seems to be doing much better by reducing the number of operations. However, I also have a set of brute force styles for all combinations of variables. Thus, runtime grows overexponentially in the number of variables. OTOH, I managed to run it on equations with 7 variables in a very reasonable (but far from real time) amount of time.

It is possible that there are several ways to trim some of the search branches, but I was not worried about this. Further optimization is welcome.

 def collect_best(expr, measure=sympy.count_ops): # This method performs sympy.collect over all permutations of the free variables, and returns the best collection best = expr best_score = measure(expr) perms = itertools.permutations(expr.free_symbols) permlen = np.math.factorial(len(expr.free_symbols)) print(permlen) for i, perm in enumerate(perms): if (permlen > 1000) and not (i%int(permlen/100)): print(i) collected = sympy.collect(expr, perm) if measure(collected) < best_score: best_score = measure(collected) best = collected return best def product(args): arg = next(args) try: return arg*product(args) except: return arg def rcollect_best(expr, measure=sympy.count_ops): # This method performs collect_best recursively on the collected terms best = collect_best(expr, measure) best_score = measure(best) if expr == best: return best if isinstance(best, sympy.Mul): return product(map(rcollect_best, best.args)) if isinstance(best, sympy.Add): return sum(map(rcollect_best, best.args)) 

To illustrate performance, this document (paywalled, sorry) has 7 formulas that are 5th degree polynomials in 7 variables with a maximum of 29 terms and 158 operations in extended forms. After applying rcollect_best and @smichr iflfactor number of operations in 7 formulas will be as follows:

 [6, 15, 100, 68, 39, 13, 2] 

and

 [32, 37, 113, 73, 40, 15, 2] 

respectively. iflfactor has 433% more operations than rcollect_best for one of the formulas. In addition, the number of operations in advanced formulas:

 [39, 49, 158, 136, 79, 27, 2] 
0
source

All Articles