Using Java String Analysis to derive a regex for a full string

I am looking for a tool such as Java String Analysis (JSA) that can summarize all the values a string may take as a regular expression. I tried to do this with JSA, but there I have to name a specific method, such as StringBuffer.append or another string operation, whose arguments are analyzed.

I have lines like this:

    StringBuilder test = new StringBuilder("hello ");
    boolean condition = false;
    if (condition) {
        test.append("world");
    } else {
        test.append("other world");
    }
    test.append(" so far");
    for (int i = 0; i < args.length; i++) {
        test.append(" again hello");
    }
    // regularExpression = "hello (world|other world) so far( again hello)*"

And my JSA implementation looks like this:

    public static void main(String[] args) {
        StringAnalysis.addDirectoryToClassPath("bootstrap.jar");
        StringAnalysis.loadClass("org.apache.catalina.loader.Extension");
        List<ValueBox> list = StringAnalysis.getArgumentExpressions(
                "<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.String)>", 0);
        StringAnalysis sa = new StringAnalysis(list);
        for (ValueBox e : list) {
            Automaton a = sa.getAutomaton(e);
            if (a.isFinite()) {
                Iterator<String> si = a.getFiniteStrings().iterator();
                StringBuilder sb = new StringBuilder();
                while (si.hasNext()) {
                    sb.append((String) si.next());
                }
                System.out.println(sb.toString());
            } else if (a.complement().isEmpty()) {
                System.out.println(e.getValue());
            } else {
                System.out.println("common prefix:" + a.getCommonPrefix());
            }
        }
    }

I would really appreciate any help with JSA or a hint about another tool. My biggest problem is handling the control flow around the constant strings.

2 answers

I do not know of a tool that gives you a regular expression out of the box.

But since your problem is with the control flow, I would recommend writing a static analysis tailored to your problem. You can use a static analysis / bytecode framework such as OPAL (Scala) or Soot (Java). You will find tutorials on each project's page.

Once it is set up, you can load the target jar. You should then be able to traverse the program's control flow, as in the following example:

    1  public static void example(String unknown) {
    2      String source = "hello";
    3      if (Math.random() * 20 > 5) {
    4          source += "world";
    5      } else {
    6          source += "unknown";
    7      }
    8      source += unknown;
    9  }

If your analysis detects a String or StringBuilder being initialized, you can start building your regex. For example, line 2 makes your regular expression start with hello. If you encounter a conditional in the program's control flow, you can analyze each path separately and combine the results with "|" afterwards.

Then branch: "world" (line 4)
Else branch: "unknown" (line 6)

At line 7, where the branches join, these can be merged into (world)|(unknown) and appended to the regular expression built before the conditional.

If you encounter a variable, you can track its value if you are doing an interprocedural analysis; otherwise you have to fall back to the wildcard ".*".

Final regex: "hello((world)|(unknown)).*"

I hope this leads you to the solution you are looking for.
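
The combination rules described above (concatenate along a path, join branches with "|", fall back to ".*" for unknown values) can be sketched with a few string helpers. This is only an illustration and is not tied to Soot or OPAL: the class RegexSketch and the helpers literal, concat, union, star and ANY are hypothetical names, and star (for loops) is an assumption added to cover the loop in the question. The result for the answer's example is equivalent to the final regex above, with fewer parentheses.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: composes a regex from fragments found along control-flow paths. */
class RegexSketch {

    /** Wildcard used when a value cannot be tracked by the analysis. */
    static final String ANY = ".*";

    /** A constant string from the code, with regex metacharacters escaped. */
    static String literal(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\.[]{}()*+?^$|".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Sequential statements: fragments follow one another. */
    static String concat(String... parts) {
        return String.join("", parts);
    }

    /** Conditional: either branch may have executed. */
    static String union(String... branches) {
        return "(" + String.join("|", branches) + ")";
    }

    /** Loop: the body may run zero or more times. */
    static String star(String body) {
        return "(" + body + ")*";
    }

    public static void main(String[] args) {
        // The snippet from the question.
        String question = concat(
                literal("hello "),
                union(literal("world"), literal("other world")),
                literal(" so far"),
                star(literal(" again hello")));
        System.out.println(question); // hello (world|other world) so far( again hello)*

        // The example from this answer; the parameter is untracked, hence ANY.
        String answer = concat(literal("hello"), union(literal("world"), literal("unknown")), ANY);
        System.out.println(answer); // hello(world|unknown).*

        System.out.println(Pattern.matches(question, "hello world so far again hello")); // true
    }
}
```

In a real analysis these helpers would be called while walking the control-flow graph: concat for straight-line code, union where branches join, and star for a loop body.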


Apache Lucene has some tools for finite state machines and regular expressions. In particular, it lets you take the union of automata, so you should be able to build an automaton that accepts a finite set of words fairly easily.
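
To illustrate what "an automaton that accepts a finite set of words" means, here is a minimal hand-rolled sketch in plain Java. It deliberately does not use Lucene's classes: WordAutomaton, unionOf and accepts are hypothetical names, and the structure is a simple trie in which each word contributes one path and shared prefixes are merged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hand-rolled automaton (a trie); this is not Lucene's API. */
class WordAutomaton {
    private final Map<Character, WordAutomaton> next = new HashMap<>();
    private boolean accepting;

    /** Builds the union of the given words: one path per word, shared prefixes merged. */
    static WordAutomaton unionOf(List<String> words) {
        WordAutomaton start = new WordAutomaton();
        for (String word : words) {
            WordAutomaton state = start;
            for (char c : word.toCharArray()) {
                state = state.next.computeIfAbsent(c, k -> new WordAutomaton());
            }
            state.accepting = true; // the end of a word is an accepting state
        }
        return start;
    }

    /** Follows the transitions for the input; accepts only if it ends in an accepting state. */
    boolean accepts(String input) {
        WordAutomaton state = this;
        for (char c : input.toCharArray()) {
            state = state.next.get(c);
            if (state == null) {
                return false;
            }
        }
        return state.accepting;
    }

    public static void main(String[] args) {
        WordAutomaton a = unionOf(Arrays.asList("hello world", "hello other world"));
        System.out.println(a.accepts("hello world"));       // true
        System.out.println(a.accepts("hello other world")); // true
        System.out.println(a.accepts("hello"));             // false
    }
}
```

An automaton library would provide the same union operation plus concatenation and repetition, which are needed once loops make the set of possible strings infinite.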
