How to write a query of choice or a server function that will generate a neat schedule of the flow of time from many data points?

NOTE. I am using a graph database (for a specific OrientDB). This gives me the freedom to write a server function in javascript or groovy, rather than limiting myself to SQL for this problem. *

NOTE 2: Since this is a graph database, the arrows below simply describe the data flow. I literally don't need the arrows that will be returned in the request. Arrows represent relationships. *

I have data that is presented in streaming mode; that is, EventC occurs after EventB, which occurs after EventA, etc. These data come from several sources, so they are not completely linear. It needs to be put together, and it was there that I had a problem.

Currently, the data looks something like this:

# | event | next -------------------------- 12:0 | EventA | 12:1 12:1 | EventB | 12:2 12:2 | EventC | 12:3 | EventA | 12:4 12:4 | EventD | 

Where "next" is the edge out () for the event following in the time stream. On the chart, it looks like this:

 EventA-->EventB-->EventC EventA-->EventD 

Since this data needs to be collected together, I need to combine repeating events, but keep their edges . In other words, I need a select query that will result in:

  -->EventB-->EventC EventA--| -->EventD 

In this example, since EventB and EventD occurred after EventA (only at different times), the select query will display two branches with EventA, rather than two separate time streams.


EDIT No. 2

If an additional data set was to be added to the data above, with EventB-> EventE, the resulting data / graph will look like this:

 # | event | next -------------------------- 12:0 | EventA | 12:1 12:1 | EventB | 12:2 12:2 | EventC | 12:3 | EventA | 12:4 12:4 | EventD | 12:5 | EventB | 12:6 12:6 | EventE | EventA-->EventB-->EventC EventA-->EventD EventB-->EventE 

I need a query to create a tree like:

  -->EventC -->EventB--| | -->EventE EventA--| -->EventD 

EDIT No. 3 and No. 4

Here is the data with the edges shown in contrast to the next column above. I also added a few extra columns to hopefully clarify any confusion regarding the data:

 # | event | ip_address | timestamp | in | out | ---------------------------------------------------------------------------- 12:0 | EventA | 123.156.189.18 | 2015-04-17 12:48:01 | | 13:0 | 12:1 | EventB | 123.156.189.18 | 2015-04-17 12:48:32 | 13:0 | 13:1 | 12:2 | EventC | 123.156.189.18 | 2015-04-17 12:48:49 | 13:1 | | 12:3 | EventA | 103.145.187.22 | 2015-04-17 14:03:08 | | 13:2 | 12:4 | EventD | 103.145.187.22 | 2015-04-17 14:05:23 | 13:2 | | 12:5 | EventB | 96.109.199.184 | 2015-04-17 21:53:00 | | 13:3 | 12:6 | EventE | 96.109.199.184 | 2015-04-17 21:53:07 | 13:3 | | 

The data is saved in such a way as to save every single event and session stream (marked with an IP address).

TL; DR

A lot of events have been received, some duplicates and all of them are organized in one neat time flow schedule.

+7
sql graph-databases orientdb
source share
1 answer

Holy cow.

After struggling with this for more than a week, I think I RELIABLE have a working function. This is not optimized for performance (oh loop!), But work is currently being done while I can work on performance. Obtained server-side function OrientDB (written in javascript):

Function:

 // Clear previous runs db.command("truncate class tmp_Then"); db.command("truncate class tmp_Events"); // Get all distinct events var distinctEvents = db.query("select from Events group by event"); // Send 404 if null, otherwise proceed if (distinctEvents == null) { response.send(404, "Events not found", "text/plain", "Error: events not found" ); } else { var edges = []; // Loop through all distinct events distinctEvents.forEach(function(distinctEvent) { var newEvent = []; var rid = distinctEvent.field("@rid"); var eventType = distinctEvent.field("event"); // The main query that finds all *direct* descendents of the distinct event var result = db.query("select from (traverse * from (select from Events where event = ?) where $depth <= 2) where @class = 'Events' and $depth > 1 and @rid in (select from Events group by event)", [eventType]); // Save the distinct event in a temp table to create temp edges db.command("create vertex tmp_Events set rid = ?, event = ?", [rid, event]); edges.push(result); }); // The edges array defines which edges should exist for a given event edges.forEach(function(edge, index) { edge.forEach(function(e) { // Create the temp edge that corresponds to its distinct event db.command("create edge tmp_Then from (select from tmp_Events where rid = " + distinctEvents[index].field("@rid") + ") to (select from tmp_Events where rid = " + e.field("@rid") + ")"); }); }); var result = db.query("select from tmp_Events"); return result; } 

Takeaways:

  • Temp tables have proven necessary. I tried to do this without temporary tables (classes), but I'm not sure if this can be done. I needed to mock edges that were not in the source data.
  • Traverse was very useful when writing the main request. Passing through the event to find its direct, unique descendants was quite simple.
  • Having the ability to write stored procs in Javascript is awesome. That would be a nightmare in SQL.
  • omfg. I plan to optimize this and continue to make it better, so I hope other people can find some use for this.
0
source share

All Articles