This is an interesting problem. I would consider a multi-step approach:
- analysis of the expression (probably from the bottom up) of the expression tree and marking the nodes as "remote", "local" and "neutral"
- top remote sub-expression identification
- remote query execution (subexpression exception)
- Local query execution
The following is more detailed information for each phase. The “Notes” section at the end of my answer contains some important points to keep in mind.
Disclaimer: My answer is rather sketchy, and I am sure that it does not cover many aspects and cases that may occur regarding the semantics of the individual operations allowed in the expression tree. I think some compromises should be made to make the implementation simple enough.
Stage 1: Expression Analysis and Labeling
Each node in the expression tree can be considered to belong to the following categories:
- "remote" nodes correspond to operations that must be performed remotely;
- "local" nodes correspond to operations that must be performed locally;
- "neutral" nodes correspond to operations that can be performed by any query processor.
The bottom-up approach for processing and processing the expression tree seems appropriate for this case. The reason is that when processing a given node X that has subnodes from Y_1 to Y_n, the category of node X largely depends on the categories of its trays Y_1 to T_n.
Rewrite the sample you provided:
entityX.SomeProperty == "Hello" && entityY.SomeOtherProperty == "Hello 2" && entityX.Id == entityY.Id
to the outline of the corresponding expression tree:
&&(&&(==(Member(SomeProperty, Var(entityX)), "Hello"), ==(Member(SomeOtherProperty, Var(entityY)), "Hello 2")), ==(Member(Id, Var(entityX)), Member(Id, Var(entityY)))
This expression tree will be marked from bottom to top. R for "remote", L for "local", N for "neutral". If entityX removed and entityY is local, the result will look like this:
L:&&(L:&&(R:==(R:Member(SomeProperty, R:Var(entityX)), N:"Hello"), L:==(L:Member(SomeOtherProperty, L:Var(entityY)), N:"Hello 2")), L:==(R:Member(Id, R:Var(entityX)), L:Member(Id, L:Var(entityY)))
As you can see, for each operator, your analyzer will have to determine a category based on the categories of its subnode. In the above example:
- accessing the object of the object will lead to the same category as the object;
- the string literal will be neutral;
- comparing the equality of local and remote subexpressions will have a local category;
- the operator will again support local remoteness.
However, you might consider combining a bottom-up approach with top-down optimization to get better results. Consider this (symbolic): R == R + L How do you want to perform an equality comparison? With a clean upstream approach, you execute it locally. However, in some situations, it would be better to pre-compute L locally, replace the subexpression with the actual value (i.e., Neutral) and remotely perform an equality comparison. In other words, you can complete the implementation of the query plan optimizer.
Step 2: Identify Deleted Subexpressions
In the next phase, the tagged expression tree will be processed from top to bottom and each subexpression marked as deleted, deduced from the tree and enrolled in the set of expressions remotely processed for each element in the remote dataset.
From the foregoing, it is clear that some deleted subexpressions will encapsulate a local subexpression. And therefore, local subexpressions may contain remote subexpressions. Only neutral nodes are subexpressions that are uniform in terms of category.
Therefore, it may be necessary to complete a given request with several rounds in a remote query processor. An alternative approach would be to provide bi-directional communication between query processors, so that the “remote” processor can identify the sub-expression of “local” (actually “remote” in terms of point of view) and call the “local” processor to execute it.
Stage 3: Making a remote request
In the third phase, the list of remote subexpressions will be sent to the remote query processor for execution. (See also the discussion in the previous step.)
The question also arises of how to determine the subexpressions that can be used to effectively limit the resulting dataset returned by the remote query processor. To do this, consider the semantics of the top-level operators in the expression tree (usually && and || ). Short circuit rating && and || complicates the situation, because the request preprocessor cannot change the order of the operands.
Step 4: Performing a Local Query
When all deleted subexpressions are executed, their occurrences in the original expression tree are replaced by the collected results.
Notes
Ultimately, you may need to limit only certain operations to the "remote" subtrees in order to reduce processing complexity - this will be a compromise between the capabilities and the time taken to implement the preliminary request processor.
In order to handle data anti-aliasing (for example, in the example PropX = entityX … i.PropX.SomeProperty == "Hello" in the example below), you will have to perform data flow analysis. Here you will most likely find yourself bundled with many cases that will be complex so that they can be handled.