Design suggestions for real-time data aggregation?

Question

I want to build some data aggregation components in C#. I would like something similar to a real-time pivot table, or a continuously updating SQL query, with support for select, sum, average, first, where and group-by (where first is in the LINQ sense of "give me the first value").

For example, I might have a table object called Trans with the columns Name, Date and Total, and another table called Price with the columns Name and Price. I want to create a Query instance that executes (in pseudo-SQL)

 select Trans.Name, sum(Total), first(Price) from Trans join Price on Trans.Name = Price.Name group by Trans.Name

and pass it to an Aggregator instance, which holds references to the data sources. Along with this, I want to register a callback that fires whenever a row in the query result changes. So if the price for an item named "XYZ" changes, the callback is invoked with an object containing the new values for that aggregated row. I would also like the Aggregator to be as efficient as possible, so it would need some kind of indexing scheme so that a changed value does not require a table scan.

I'm not quite sure what to call this, and I hope I can implement it entirely in C#, assuming it is not an order of magnitude more complicated than I think. I have read about Continuous LINQ and Bindable LINQ, but I could not work out whether they are suitable for this problem, or whether there would be performance issues (for example, LINQ aggregations enumerating the entire table whenever a value changes).

Does anyone know of a project doing something like this that I could look at, or have suggestions on how to build it myself?

Edit: I should note that the data would not actually be in a database; it would be in memory.

c# aggregation real-time

toasteroven Jul 14 '10 at 15:55

3 answers

Adam houldsworth · Answer 1 · 2010-07-14T16:01:42+0000

One alternative is to maintain the aggregate as the underlying data changes, i.e. when a Total record is updated, go and update the aggregated total as well. To do it this way you need the old value, so that only the difference is applied, and it adds overhead to every change made to a value that is aggregated. But if the aggregated values are needed often, this may be a viable option.

I do this in my bank balancing application: whenever I insert, modify, or delete a transaction, the logic also updates the account balance, because the balance is read many times and quickly becomes expensive to calculate when there are many transactions.
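
A minimal sketch of this delta-based approach might look like the following. The `RunningTotals` class and its members are hypothetical names invented for illustration, not taken from the application mentioned above. It assumes a single group key (a name) and a single `sum` aggregate, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps one running sum per group key and adjusts it
// by the delta (new - old) instead of rescanning the underlying rows.
public sealed class RunningTotals
{
    // Index from group key to current sum, so a change is a dictionary lookup.
    private readonly Dictionary<string, decimal> _totals =
        new Dictionary<string, decimal>();

    // Raised with (group key, new total) after every change.
    public event Action<string, decimal> TotalChanged;

    public void Insert(string name, decimal amount) => Apply(name, amount);

    // The old value is required so that only the difference is applied.
    public void Update(string name, decimal oldAmount, decimal newAmount) =>
        Apply(name, newAmount - oldAmount);

    public void Delete(string name, decimal oldAmount) => Apply(name, -oldAmount);

    public decimal GetTotal(string name) =>
        _totals.TryGetValue(name, out var total) ? total : 0m;

    private void Apply(string name, decimal delta)
    {
        _totals.TryGetValue(name, out var current); // current is 0 if the key is missing
        var updated = current + delta;
        _totals[name] = updated;
        TotalChanged?.Invoke(name, updated);
    }
}
```

A caller would subscribe to `TotalChanged` and then call `Insert`, `Update`, or `Delete` for each change to the underlying rows. Aggregates such as average can be kept the same way by tracking a sum and a count per key.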

I think structural problems can also arise if the aggregated amounts are stored in the database, locking issues for example. I have always kept these values in memory.

Update: another possible solution is to route your data access code through a service layer that holds the aggregated values in memory. Reads will be fast, and there is almost zero overhead when inserting, updating, or deleting the underlying data. You can also get clever and make this layer transactional, so that if the data access action fails, you can undo the change to the aggregate.

The only drawback is that all database changes must go through this layer, otherwise the aggregates become invalid, and the aggregates have to be initialized from the database on first start or after a restart.
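
As a rough sketch of such a layer, under stated assumptions: `TransactionService`, the `persist` delegate (a stand-in for the real data access code), and `existingRows` (whatever was loaded from the store at startup) are all hypothetical names. It handles inserts only, is not thread-safe, and the "transactional" part is simply undoing the in-memory change if the write throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical service layer: every write goes through it, so the
// in-memory aggregate stays in step with the underlying store.
public sealed class TransactionService
{
    private readonly Action<string, decimal> _persist; // stand-in for real data access
    private readonly Dictionary<string, decimal> _balances =
        new Dictionary<string, decimal>();

    // existingRows: rows loaded from the store on first start or after a restart.
    public TransactionService(Action<string, decimal> persist,
                              IEnumerable<KeyValuePair<string, decimal>> existingRows)
    {
        _persist = persist;
        foreach (var row in existingRows)
            Add(row.Key, row.Value);
    }

    public decimal GetBalance(string account) =>
        _balances.TryGetValue(account, out var balance) ? balance : 0m;

    public void Insert(string account, decimal amount)
    {
        Add(account, amount);          // update the aggregate first
        try
        {
            _persist(account, amount); // then write the underlying row
        }
        catch
        {
            Add(account, -amount);     // undo the aggregate if the write failed
            throw;
        }
    }

    private void Add(string account, decimal amount)
    {
        _balances.TryGetValue(account, out var current); // 0 if the key is missing
        _balances[account] = current + amount;
    }
}
```

Any code path that writes to the store without going through `Insert` would leave the balances stale, which is the drawback described above.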