In a C# 4.0 application, I have a dictionary of strongly typed ILists, all of the same length: in effect a dynamically, strongly typed column-based table. I want the user to provide one or more (Python-based) expressions over the available columns that will be aggregated across all rows. In a static context, this would be:
    IDictionary<string, IList> table;
    // ...
    IList<int> a = table["a"] as IList<int>;
    IList<int> b = table["b"] as IList<int>;
    double sum = 0;
    for (int i = 0; i < n; i++)
        sum += (double)a[i] / b[i]; // Expression to sum up
With n = 10^7, this runs in 0.270 seconds on my laptop (Win7 x64). Replacing the expression with a delegate taking two int arguments takes 0.580 seconds, and an untyped delegate takes 1.19 seconds. Creating a delegate from IronPython using
    IDictionary<string, IList> table;
    // ...
    var options = new Dictionary<string, object>();
    options["DivisionOptions"] = PythonDivisionOptions.New;
    var engine = Python.CreateEngine(options);
    string expr = "a / b";
    Func<int, int, double> f = engine.Execute("lambda a, b : " + expr);
    IList<int> a = table["a"] as IList<int>;
    IList<int> b = table["b"] as IList<int>;
    double sum = 0;
    for (int i = 0; i < n; i++)
        sum += f(a[i], b[i]);
takes 3.2 s (and 5.1 s with Func<object, object, object>), a factor of about 4 to 5.5 compared with the corresponding C# delegates. Is this the expected overhead for what I'm doing? What can be improved?
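For reference, the pure C# baselines quoted above can be reproduced with something like the following. This is only a minimal sketch: the class name DelegateBaseline and the test data are made up for illustration, and the timings will differ by machine and build settings.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper class, not part of the application itself.
public static class DelegateBaseline
{
    // Expression written inline, as in the static loop.
    public static double SumInline(IList<int> a, IList<int> b)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += (double)a[i] / b[i];
        return sum;
    }

    // Same expression routed through a strongly typed delegate.
    public static double SumTyped(IList<int> a, IList<int> b, Func<int, int, double> f)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += f(a[i], b[i]);
        return sum;
    }

    // Same expression through an untyped delegate: each call boxes both
    // arguments and the result.
    public static double SumUntyped(IList<int> a, IList<int> b, Func<object, object, object> f)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += (double)f(a[i], b[i]);
        return sum;
    }

    public static void Main()
    {
        const int n = 10000000;
        var a = new int[n];
        var b = new int[n];
        for (int i = 0; i < n; i++) { a[i] = i; b[i] = i + 1; } // made-up data

        var sw = Stopwatch.StartNew();
        SumInline(a, b);
        Console.WriteLine("inline:  {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        SumTyped(a, b, (x, y) => (double)x / y);
        Console.WriteLine("typed:   {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        SumUntyped(a, b, (x, y) => (double)(int)x / (int)y);
        Console.WriteLine("untyped: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

The IronPython measurements would slot into the same harness by passing the delegate obtained from engine.Execute to SumTyped or SumUntyped.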
If I have many columns, the approach above will no longer be sufficient. One solution might be to determine which columns each expression needs and pass only those as arguments. Another solution, which I tried unsuccessfully, was to use a ScriptScope and resolve the columns dynamically. For this, I defined a RowIterator that has a RowIndex for the active row and a property for each column.
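The first idea (passing only the columns an expression actually uses) could look roughly like this. It is a naive sketch: ExpressionColumns, UsedColumns and BuildLambda are hypothetical names, and the regex-based scan simply matches identifiers, so it would also pick up a column name that appears inside a string literal or after a dot.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ExpressionColumns
{
    // Returns the known column names that occur as identifiers in expr,
    // in order of first use.
    public static List<string> UsedColumns(string expr, ICollection<string> columns)
    {
        var used = new List<string>();
        foreach (Match m in Regex.Matches(expr, @"\b[A-Za-z_][A-Za-z0-9_]*\b"))
        {
            if (columns.Contains(m.Value) && !used.Contains(m.Value))
                used.Add(m.Value);
        }
        return used;
    }

    // Builds the Python lambda source whose parameters are exactly the used columns.
    public static string BuildLambda(string expr, ICollection<string> columns)
    {
        return "lambda " + string.Join(", ", UsedColumns(expr, columns)) + ": " + expr;
    }

    public static void Main()
    {
        var columns = new List<string> { "a", "b", "c" };
        Console.WriteLine(BuildLambda("a / b + a", columns)); // lambda a, b: a / b + a
    }
}
```

The resulting string would then be handed to engine.Execute as before, and the loop would fetch only the listed columns from the table.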
    class RowIterator
    {
        IList<int> la;
        IList<int> lb;

        public RowIterator(IList<int> a, IList<int> b)
        {
            this.la = a;
            this.lb = b;
        }

        public int RowIndex { get; set; }
        public int a { get { return la[RowIndex]; } }
        public int b { get { return lb[RowIndex]; } }
    }
A ScriptScope can be created from an IDynamicMetaObjectProvider, which I expected C# dynamic to implement, but at runtime the call is bound to engine.CreateScope(IDictionary) instead, which fails.
    dynamic iterator = new RowIterator(a, b) as dynamic;
    var scope = engine.CreateScope(iterator);
    var expr = engine.CreateScriptSourceFromString("a / b").Compile();
    double sum = 0;
    for (int i = 0; i < n; i++)
    {
        iterator.RowIndex = i; // the property is named RowIndex
        sum += expr.Execute<double>(scope);
    }
Next, I tried deriving RowIterator from DynamicObject and got a running example, but with terrible performance: 158 seconds.
    class DynamicRowIterator : DynamicObject
    {
        Dictionary<string, object> members = new Dictionary<string, object>();
        IList<int> la;
        IList<int> lb;

        public DynamicRowIterator(IList<int> a, IList<int> b)
        {
            this.la = a;
            this.lb = b;
        }

        public int RowIndex { get; set; }
        public int a { get { return la[RowIndex]; } }
        public int b { get { return lb[RowIndex]; } }

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            if (binder.Name == "a")
I was surprised that TryGetMember is called with the name of the properties. From the documentation, I would expect TryGetMember to be called only for undefined properties.
Perhaps, for reasonable performance, I will need to implement IDynamicMetaObjectProvider for my RowIterator in order to use dynamic CallSites, but I could not find a suitable example. In my experiments, I did not know how to handle __builtins__ in BindGetMember:
    class Iterator : IDynamicMetaObjectProvider
    {
        IList<int> la;
        IList<int> lb;

        public Iterator(IList<int> a, IList<int> b)
        {
            this.la = a;
            this.lb = b;
        }

        public int RowIndex { get; set; }
        public int a { get { return la[RowIndex]; } }
        public int b { get { return lb[RowIndex]; } }

        public DynamicMetaObject GetMetaObject(Expression parameter)
        {
            return new MetaObject(parameter, this);
        }

        private class MetaObject : DynamicMetaObject
        {
            internal MetaObject(Expression parameter, Iterator self)
                : base(parameter, BindingRestrictions.Empty, self) { }

            public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
            {
                switch (binder.Name)
                {
                    case "a":
                    case "b":
                        Type type = typeof(Iterator);
                        string methodName = binder.Name;
                        Expression[] parameters = new Expression[] { Expression.Constant(binder.Name) };
                        return new DynamicMetaObject(
                            Expression.Call(
                                Expression.Convert(Expression, LimitType),
                                type.GetMethod(methodName),
                                parameters),
                            BindingRestrictions.GetTypeRestriction(Expression, LimitType));
                    default:
                        return base.BindGetMember(binder);
                }
            }
        }
    }
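One thing I noticed while writing this up: a and b are properties, so type.GetMethod(methodName) in the listing above returns null. A variant that binds to the property with Expression.Property might look like the sketch below. The class name PropertyRow is made up, I have only exercised this through C# dynamic rather than through an IronPython scope, and it does nothing about the __builtins__ lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq.Expressions;

// Hypothetical variant of the Iterator class above.
public class PropertyRow : IDynamicMetaObjectProvider
{
    readonly IList<int> la;
    readonly IList<int> lb;

    public PropertyRow(IList<int> a, IList<int> b) { la = a; lb = b; }

    public int RowIndex { get; set; }
    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }

    public DynamicMetaObject GetMetaObject(Expression parameter)
    {
        return new MetaObject(parameter, this);
    }

    private class MetaObject : DynamicMetaObject
    {
        internal MetaObject(Expression parameter, PropertyRow self)
            : base(parameter, BindingRestrictions.Empty, self) { }

        public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
        {
            switch (binder.Name)
            {
                case "a":
                case "b":
                    // Read the property directly on the strongly typed instance.
                    Expression self = Expression.Convert(Expression, LimitType);
                    Expression value = Expression.Property(self, binder.Name);
                    // The call site expects binder.ReturnType (object), so box the int.
                    return new DynamicMetaObject(
                        Expression.Convert(value, binder.ReturnType),
                        BindingRestrictions.GetTypeRestriction(Expression, LimitType));
                default:
                    // Everything else falls back to the calling language's binder.
                    return base.BindGetMember(binder);
            }
        }
    }
}

public static class PropertyRowDemo
{
    public static double Quotient(PropertyRow typed, int rowIndex)
    {
        typed.RowIndex = rowIndex;
        dynamic row = typed;             // member access now goes through BindGetMember
        return (double)row.a / row.b;
    }

    public static void Main()
    {
        var row = new PropertyRow(new[] { 1, 2 }, new[] { 4, 8 });
        Console.WriteLine(Quotient(row, 1));
    }
}
```

Because the rule is restricted only by type, the call site should be able to reuse it for every row, which is what I would hope makes this faster than the DynamicObject version.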
I am sure my code above is suboptimal; at the very least, it does not yet handle the IDictionary of columns. I would appreciate any advice on how to improve the design and/or performance.