Java regex performance issue

I am trying to make a program to graphically display functions in Java, and it includes user input for a function that will be displayed, parsed, and graphically displayed. For example, a user can enter x ^ 2 - y ^ 2, cos (x + y), log (x) - sqrt (y), etc. The program uses both infix binary operations (+, -, etc.) and unary operations (cos, sqrt, etc.).

In short, to evaluate unary operations, I have to make sure that this expression follows the format of a single unary operation. For example, cos (x), sqrt (x + y) and log (exp (y) - x) correspond to this format, since they are unary operations with some expression as their operand; however, strings such as sin (x) * cos (y) and 1 + log (x) do not conform to this format. To check, I made a regex for this format:

String unaryName = "((productlog)|(zeta)|(log)|(sqrt)|(cos)|(sin)|(tan)|(sec)|(csc)|(csc)|(abs)|(arccos)|(arcsin)|(arctan)|(arcsec)|(arccsc)|(arccot)|(gamma)|(exp))"; 

(this is just a regular expression to check if a given string is a name for a predefined unary operation)

 String unaryOperation = unaryName + "\\(([^\\(\\)]*(\\(.*\\))*[^\\(\\)]*)+\\)" 

I will give an explanation. This regular expression searches for the name of one of the unary operations. After that, he searches for the left bracket. After that, he searches for a sequence of characters that are not brackets, and then a sequence that begins with the left parenthesis and ends with the right bracket. The latter prevents a string from matching, such as "sin (x) + cos (y)").

This regular expression always gives the desired results, as far as I can tell. However, using it raises one problem. Consider this situation:

 String s = "cos(3) + sin(4)"; System.out.println(s.matches(unaryOperation)); 

Obviously, if the regex works, it should return false, which does. The same can be said in this example:

 String s = "cos(3.000) + sin(4)"; System.out.println(s.matches(unaryOperation)); 

Nothing has changed, in different ways. However, adding zeros to 3 sequentially, the match seems to take an exponentially longer time to evaluate. For me, 12 zeros take about 13 seconds. Since my program will display many points on the graph, it will have to calculate thousands of expressions every time it draws something, so this is a fatal flaw.

I already found a way to use this regular expression, and my program works very well, but I still would like to know: why does this regular expression take so long to work on large inputs, and is there any way to change the regular expression, to fix this problem?

+6
source share
2 answers

You can use this regex

 unaryName+"\\([^)]*(\\([^()]*\\))?[^(]*\\)" ------------ |->starting from center. 

Here I check the correctness of <parentheses . This should solve your problem!

+1
source

I suspect the problem is that your expression makes a lot of digressions due to .* In the middle of the pattern. Try replacing it with a reluctant quantifier:. .*? or, even better (if I understand the logic), [^\\)]* .

Actually, isn't that a trick:

 String unaryOperation = unaryName + "\\([^\\)]*\\)"; 

It searches for a name, a left parenthesis, any number of characters with an incorrect bracket, and then a right parenthesis. This suggests that you do not want to match things like

 "cos(3 * (4 + x))" 

(which will also not match your pattern).

0
source

All Articles