Expression Tree Dependency Analyzer

Question

Expression Tree Dependency Analyzer

I am creating an expression tree dependency analyzer for a cross-data source IQueryProvider.

That is, I have IQueryable with some elements that can be executed locally in memory against any arbitrary provider (e.g. Entity Framework). Some other elements in IQueryable go against the object I need to make a remote WCF call. A WCF operation takes a serialized expression tree, deserializes it, executes a LINQ query in its own local data store (also lets say Entity Framework), and then returns the results to me (although this mechanism can also be easily provided by WCF Data Services DataServiceQuery ... but I I don’t use it because the level of functional support is limited ... at best). As soon as I get the results from the WCF service, I will execute the LINQ query result in memory on the locally executed LINQ query.

So what's hard with that? Well, I need to define expression tree dependencies so that my local base query provider does not explode, trying to execute my LINQ query, which has components that can only be run on a remote WCF service ... and vice versa.

Take a simple scenario:

var result = (from entityX in new Query<MyEntityX>() from entityY in new Query<MyEntityY>() where entityX.SomeProperty == "Hello" && entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.XId).ToList();

Query<T> is a simple requested shell with my own provider, which has the ability to parse the tree to figure out what to do before changing roots using another query provider. So, in the above case, I need:

Run the query against MyEntityA using the context of the local object, and apply only the criteria myEntityX.SomeProperty == "Hello" . That is, follow these local steps:
// assume the functionality for replacing new Query<MyEntityA> with new // ObjectContext<MyEntityA>() is already there... var resultX = (from entityX in new Query<MyEntityX>() where entityX.SomeProperty == "Hello").ToList().AsQueryable();
Submit the following serialization and run it in my remote WCF service, then return the results.
// Send the preceeding expression over the over the wire // and get the results back (just take my word this already works) var resultY = (from entityY in new Query<MyEntityY>() where entityY.SomeOtherProperty == "Hello 2").ToList().AsQueryable();
Perform the following in memory:
var finalResult = (from entityX in resultX from entityY in resultY where entityX.SomeProperty == "Hello" && entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.XId).ToList();

Please note that the solution should include a way to accumulate criteria, which are also indicated for forecasts ... for example

 var result = (from i in (from entityX in new Query<MyEntityX>() from entityY in new Query<MyEntityY>() select new { PropX = entityX, PropY = entityY }) where i.PropX.SomeProperty == "Hello" && i.PropY.SomeOtherProperty == "Hello 2" && i.PropX.Id == i.PropY.XId select i) .ToList();

This should result in the same two separate requests that were above being issued before the others are evaluated in memory. On an unrelated note, I think I probably used PLINQ and DRYAD to work in memory operations with improved performance.

So, I have some ideas (for example, to make some passes over a tree with a visitor and accumulate candidates for a given type of entity), but I am looking for suggestions from some other people on how to accumulate parts of my expression tree that can be executed against a given context. ..that is, knowing that the one where the criteria apply to one basic new Query<T> , and the other criterion applies to another ... so that I can figure out what I can do against the data save 1, that I can to do against data warehouse 2 and what I need o make in memory, and accordingly execute different parts of the tree. This sounds like a Funcletizer, but a bit more complicated ...

Thanks for any help.

+6

c # lambda linq expression-trees

Jeff Sep 26 '10 at 6:25

source share

1 answer

Ondrej Tucny · Accepted Answer · 2010-10-11T22:07:03+0000

This is an interesting problem. I would consider a multi-step approach:

analysis of the expression (probably from the bottom up) of the expression tree and marking the nodes as "remote", "local" and "neutral"
top remote sub-expression identification
remote query execution (subexpression exception)
Local query execution

The following is more detailed information for each phase. The “Notes” section at the end of my answer contains some important points to keep in mind.

Disclaimer: My answer is rather sketchy, and I am sure that it does not cover many aspects and cases that may occur regarding the semantics of the individual operations allowed in the expression tree. I think some compromises should be made to make the implementation simple enough.

Stage 1: Expression Analysis and Labeling

Each node in the expression tree can be considered to belong to the following categories:

"remote" nodes correspond to operations that must be performed remotely;
"local" nodes correspond to operations that must be performed locally;
"neutral" nodes correspond to operations that can be performed by any query processor.

The bottom-up approach for processing and processing the expression tree seems appropriate for this case. The reason is that when processing a given node X that has subnodes from Y_1 to Y_n, the category of node X largely depends on the categories of its trays Y_1 to T_n.

Rewrite the sample you provided:

 entityX.SomeProperty == "Hello" && entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.Id

to the outline of the corresponding expression tree:

 &&(&&(==(Member(SomeProperty, Var(entityX)), "Hello"), ==(Member(SomeOtherProperty, Var(entityY)), "Hello 2")), ==(Member(Id, Var(entityX)), Member(Id, Var(entityY)))

This expression tree will be marked from bottom to top. R for "remote", L for "local", N for "neutral". If entityX removed and entityY is local, the result will look like this:

 L:&&(L:&&(R:==(R:Member(SomeProperty, R:Var(entityX)), N:"Hello"), L:==(L:Member(SomeOtherProperty, L:Var(entityY)), N:"Hello 2")), L:==(R:Member(Id, R:Var(entityX)), L:Member(Id, L:Var(entityY)))

As you can see, for each operator, your analyzer will have to determine a category based on the categories of its subnode. In the above example:

accessing the object of the object will lead to the same category as the object;
the string literal will be neutral;
comparing the equality of local and remote subexpressions will have a local category;
the operator will again support local remoteness.

However, you might consider combining a bottom-up approach with top-down optimization to get better results. Consider this (symbolic): R == R + L How do you want to perform an equality comparison? With a clean upstream approach, you execute it locally. However, in some situations, it would be better to pre-compute L locally, replace the subexpression with the actual value (i.e., Neutral) and remotely perform an equality comparison. In other words, you can complete the implementation of the query plan optimizer.

Step 2: Identify Deleted Subexpressions

In the next phase, the tagged expression tree will be processed from top to bottom and each subexpression marked as deleted, deduced from the tree and enrolled in the set of expressions remotely processed for each element in the remote dataset.

From the foregoing, it is clear that some deleted subexpressions will encapsulate a local subexpression. And therefore, local subexpressions may contain remote subexpressions. Only neutral nodes are subexpressions that are uniform in terms of category.

Therefore, it may be necessary to complete a given request with several rounds in a remote query processor. An alternative approach would be to provide bi-directional communication between query processors, so that the “remote” processor can identify the sub-expression of “local” (actually “remote” in terms of point of view) and call the “local” processor to execute it.

Stage 3: Making a remote request

In the third phase, the list of remote subexpressions will be sent to the remote query processor for execution. (See also the discussion in the previous step.)

The question also arises of how to determine the subexpressions that can be used to effectively limit the resulting dataset returned by the remote query processor. To do this, consider the semantics of the top-level operators in the expression tree (usually && and || ). Short circuit rating && and || complicates the situation, because the request preprocessor cannot change the order of the operands.

Step 4: Performing a Local Query

When all deleted subexpressions are executed, their occurrences in the original expression tree are replaced by the collected results.

Notes

Ultimately, you may need to limit only certain operations to the "remote" subtrees in order to reduce processing complexity - this will be a compromise between the capabilities and the time taken to implement the preliminary request processor.
In order to handle data anti-aliasing (for example, in the example PropX = entityX … i.PropX.SomeProperty == "Hello" in the example below), you will have to perform data flow analysis. Here you will most likely find yourself bundled with many cases that will be complex so that they can be handled.

Expression Tree Dependency Analyzer

Stage 1: Expression Analysis and Labeling

Step 2: Identify Deleted Subexpressions

Stage 3: Making a remote request

Step 4: Performing a Local Query

Notes

More articles: