NOTE: I am using a graph database (specifically OrientDB). This gives me the freedom to write a server-side function in JavaScript or Groovy rather than limiting myself to SQL for this problem.
NOTE 2: Since this is a graph database, the arrows below simply describe the flow of the data. I don't literally need the arrows to be returned by the query. The arrows represent relationships.
I have data that is represented as a time stream; that is, EventC occurs after EventB, which occurs after EventA, and so on. This data comes from several sources, so it is not completely linear. It needs to be merged together, and that is where I'm having trouble.
Currently, the data looks something like this:
Where "next" is the out() edge to the event that follows in the time stream. As a graph, it looks like this:
EventA-->EventB-->EventC
EventA-->EventD
Since this data needs to be merged, I need to combine the repeated events but keep their edges. In other words, I need a select query that results in:
         -->EventB-->EventC
EventA--|
         -->EventD
In this example, since EventB and EventD both occurred after EventA (just at different times), the select query would show two branches off a single EventA rather than two separate time streams.
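To make the intended merge concrete, here is a minimal sketch in plain Python (not OrientDB SQL or a server-side function). It assumes each time stream is available as an ordered list of event names; the function name `merge_streams` and the input shape are hypothetical, chosen only for illustration. Events with the same name collapse into one node, and every distinct "next" edge is kept.

```python
from collections import defaultdict


def merge_streams(streams):
    """Merge time streams into one graph keyed by event name.

    Each stream is a list of event names in time order (an assumed input
    shape). Returns a dict mapping an event to the events that follow it.
    """
    children = defaultdict(list)
    for stream in streams:
        # Pair each event with the one that follows it in the same stream.
        for current, following in zip(stream, stream[1:]):
            if following not in children[current]:
                children[current].append(following)
    return dict(children)


streams = [
    ["EventA", "EventB", "EventC"],
    ["EventA", "EventD"],
]
merged = merge_streams(streams)
print(merged)
# {'EventA': ['EventB', 'EventD'], 'EventB': ['EventC']}
```

The single `EventA` key with two children corresponds to the two branches in the diagram above. The same logic could presumably be ported to a JavaScript or Groovy function on the server.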
EDIT No. 2
If an additional data set were added to the data above, with EventB-->EventE, the resulting data/graph would look like this:
I need a query to create a tree like:
                     -->EventC
         -->EventB--|
        |            -->EventE
EventA--|
         -->EventD
EDIT No. 3 and No. 4
Here is the data with the edges shown, in place of the "next" column used above. I also added a few extra columns to hopefully clear up any confusion about the data:
#    | event  | ip_address     | timestamp           | in   | out  |
--------------------------------------------------------------------
12:0 | EventA | 123.156.189.18 | 2015-04-17 12:48:01 |      | 13:0 |
12:1 | EventB | 123.156.189.18 | 2015-04-17 12:48:32 | 13:0 | 13:1 |
12:2 | EventC | 123.156.189.18 | 2015-04-17 12:48:49 | 13:1 |      |
12:3 | EventA | 103.145.187.22 | 2015-04-17 14:03:08 |      | 13:2 |
12:4 | EventD | 103.145.187.22 | 2015-04-17 14:05:23 | 13:2 |      |
12:5 | EventB | 96.109.199.184 | 2015-04-17 21:53:00 |      | 13:3 |
12:6 | EventE | 96.109.199.184 | 2015-04-17 21:53:07 | 13:3 |      |
The data is stored this way in order to preserve every individual event and each session's stream (identified by IP address).
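As a sketch of how the stored records could be folded into the desired tree, the following plain Python works directly on the rows above. It is not an OrientDB query. It assumes each vertex has at most one `in` and one `out` edge (as in the sample), that an edge ID appearing in one row's `out` column and another row's `in` column connects those two rows, and that the merged graph has no cycles. The function name `merged_tree` and the tuple layout are hypothetical.

```python
# (record id, event, in edge, out edge), copied from the table above.
rows = [
    ("12:0", "EventA", None, "13:0"),
    ("12:1", "EventB", "13:0", "13:1"),
    ("12:2", "EventC", "13:1", None),
    ("12:3", "EventA", None, "13:2"),
    ("12:4", "EventD", "13:2", None),
    ("12:5", "EventB", None, "13:3"),
    ("12:6", "EventE", "13:3", None),
]


def merged_tree(rows):
    """Collapse vertices that share an event name and return a nested dict."""
    # Resolve each edge ID to the event it leaves and the event it enters.
    source_of = {out_e: event for _, event, _, out_e in rows if out_e}
    target_of = {in_e: event for _, event, in_e, _ in rows if in_e}

    # Build event -> following events, dropping duplicate edges.
    children = {}
    for edge_id, source in source_of.items():
        target = target_of.get(edge_id)
        if target is None:
            continue  # dangling edge: no vertex lists it as incoming
        kids = children.setdefault(source, [])
        if target not in kids:
            kids.append(target)

    # Roots are events that never appear as the target of an edge.
    targets = {t for kids in children.values() for t in kids}
    roots = [event for event in children if event not in targets]

    def nest(event):
        # Recursion terminates only if the merged graph is acyclic.
        return {child: nest(child) for child in children.get(event, [])}

    return {root: nest(root) for root in roots}


tree = merged_tree(rows)
print(tree)
# {'EventA': {'EventB': {'EventC': {}, 'EventE': {}}, 'EventD': {}}}
```

Merging purely on the event name is a design choice made for this sketch; if two events with the same name should stay separate in some contexts, the merge key would need to include more than the name.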
TL;DR
I have a lot of events, some of them duplicates, and I need all of them organized into one neat time-flow graph.