C# static analysis: possible values for a variable/parameter

In code similar to each of the following examples, I would like to statically analyze the code to determine the list of possible values that are passed to SpecialFunction().

    SpecialFunction(5); // A

    int x = 5;
    SpecialFunction(x); // B

    int x = 5;
    x = condition ? 3 : 19;
    SpecialFunction(x); // C

I can already parse C# into an abstract syntax tree, and I can already handle cases like A. I think I could track initial assignments of values to work out case B, but even cases as simple as C seem to get complicated fast.

I am almost certain that we will not be able to solve for x statically in all cases, and that is fine. I would like to know strategies for attempting it, and ways of recognizing when it cannot be done. What if we need to include class-level fields and multithreading? Closures? Would it help if we knew that, for the set X of all possible values of x, |X| < 50?

Following Vladimir Perevalov's suggestion: how could the concepts in Pex be applied to finding the possible values at targeted code points (rather than what Pex seems to do, which is to discover code paths and values that lead to unchecked (?) exceptional cases)?

+8
c# static-analysis roslyn pex
2 answers

There is a project that does what you want (or at least something very close): Pex. Have a look at its documentation; you could also decompile the binaries and see what they do.

+3

What you want is global data flow analysis ("which assignments/side effects to a value reach the points of use") [which requires control flow analysis as a precursor] and some kind of range analysis ("summarizing the set of values that can reach a point").

Computing data flow requires a complete C# front end, local control and data flow analysis, and then stitching those local answers together into a global data flow analysis.
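As a rough illustration of the local part only, here is a minimal sketch that propagates sets of possible values through straight-line assignments, treating the condition of `? :` as unknown. The mini-AST (`Const`, `Var`, `Cond`, `Assign`, `Call`) and `ValueSetAnalysis` are hypothetical names made up for this example; a real tool would work on the compiler's syntax and semantic model, build a control flow graph, and iterate to a fixed point over branches and loops. It assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-AST, standing in for real C# syntax nodes.
abstract record Expr;
record Const(int Value) : Expr;
record Var(string Name) : Expr;
record Cond(Expr WhenTrue, Expr WhenFalse) : Expr; // condition ? a : b, condition unknown

abstract record Stmt;
record Assign(string Target, Expr Value) : Stmt;
record Call(string Function, Expr Argument) : Stmt;

static class ValueSetAnalysis
{
    // Possible values of an expression; null means "unknown, could be anything".
    static SortedSet<int> Eval(Expr e, Dictionary<string, SortedSet<int>> env) => e switch
    {
        Const c => new SortedSet<int> { c.Value },
        Var v => env.TryGetValue(v.Name, out var known) ? known : null,
        Cond c => Join(Eval(c.WhenTrue, env), Eval(c.WhenFalse, env)),
        _ => null
    };

    // Union of two value sets; unknown absorbs everything.
    static SortedSet<int> Join(SortedSet<int> a, SortedSet<int> b)
    {
        if (a == null || b == null) return null;
        var union = new SortedSet<int>(a);
        union.UnionWith(b);
        return union;
    }

    // Walks the statements in order and collects every value that may be
    // passed to the named function. Returns null if any argument is unknown.
    public static SortedSet<int> ArgumentsOf(string function, IEnumerable<Stmt> body)
    {
        var env = new Dictionary<string, SortedSet<int>>();
        var seen = new SortedSet<int>();
        foreach (var stmt in body)
        {
            switch (stmt)
            {
                case Assign a:
                    env[a.Target] = Eval(a.Value, env); // strong update: the old value is overwritten
                    break;
                case Call c when c.Function == function:
                    seen = Join(seen, Eval(c.Argument, env));
                    break;
            }
        }
        return seen;
    }
}
```

Feeding it case C from the question (assign 5 to x, assign `condition ? 3 : 19` to x, then call SpecialFunction(x)) gives the set containing 3 and 19: the initial 5 is overwritten before the call, and both arms of the conditional are kept because the condition is not evaluated.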

Range analysis requires that you first decide how you intend to encode the set of possible values: what specification scheme is allowed? The simplest, a plain set of values, generally explodes. An intermediate specification scheme would be something like the OP's single relational constraint, e.g. "x < 50". The problem with such a limited scheme is that the richness of the set of values can leave you with useless answers, especially if there are other predicates of interest (if x is always odd, a single relational constraint can only model this as "x < infinity", which is clearly not helpful). Thus, you want to choose a specification scheme that is rich enough to model the kinds of values you are interested in. However, as your specification scheme becomes more sophisticated, the machinery for combining facts gets more complicated, so you cannot make it too rich.
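To make the trade-off concrete, here is a small sketch of one possible encoding: keep an exact set while it has fewer than 50 members (echoing the |X| < 50 idea from the question), and fall back to lower and upper bounds once it grows past that. `AbstractValue`, `Limit`, and `MayBe` are hypothetical names for this illustration, not part of any existing tool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstract value: an exact set while small, otherwise only bounds.
sealed class AbstractValue
{
    public const int Limit = 50;          // assumed cap on exact sets
    public SortedSet<int> Exact { get; }  // null once only the bounds are tracked
    public int Lo { get; }
    public int Hi { get; }

    AbstractValue(SortedSet<int> exact, int lo, int hi)
    {
        Exact = exact;
        Lo = lo;
        Hi = hi;
    }

    public static AbstractValue Of(params int[] values)
    {
        if (values.Length == 0) throw new ArgumentException("at least one value is required");
        return FromSet(new SortedSet<int>(values));
    }

    // Keeps the exact set only while it stays under the limit.
    static AbstractValue FromSet(SortedSet<int> set) =>
        set.Count < Limit
            ? new AbstractValue(set, set.Min, set.Max)
            : new AbstractValue(null, set.Min, set.Max);

    // Combines the facts from two paths that reach the same program point.
    public AbstractValue Join(AbstractValue other)
    {
        if (Exact != null && other.Exact != null)
        {
            var union = new SortedSet<int>(Exact);
            union.UnionWith(other.Exact);
            return FromSet(union);
        }
        return new AbstractValue(null, Math.Min(Lo, other.Lo), Math.Max(Hi, other.Hi));
    }

    // True if the analysis cannot rule out v (it may still be a false positive).
    public bool MayBe(int v) => Exact != null ? Exact.Contains(v) : Lo <= v && v <= Hi;
}
```

Joining `AbstractValue.Of(3)` with `AbstractValue.Of(19)` stays exact, so `MayBe(5)` is false. Building a value from the 50 odd numbers 1 through 99 crosses the limit and keeps only the bounds, so `MayBe(2)` becomes true even though x is always odd. That is the loss of precision described above: capturing parity would need a richer scheme, at the cost of a more complicated `Join`.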

For the most part, the available analysis tools do not have such analyses, let alone expose them to you. Pex may indeed have such machinery; if you are lucky, it is also exposed.

Our DMS Software Reengineering Toolkit has generic parsing, symbol table building, control and data flow analysis, and even range analysis machinery (specification: x < k1 * a + k2 * b, where k1 and k2 are constants, and a and b are other program variables visible where x is consumed). DMS has C#, Java, GNU C, and COBOL front ends, and we have actually instantiated this machinery for GNU C and IBM Enterprise COBOL (and partially for Java 7) by collecting (static analysis!) facts specific to those languages and presenting those facts to the generic machinery. We have not instantiated this machinery for C# yet. But if you cannot get a good answer from another source, this is probably pretty close.

+3