Adjustable, versioned graph database

I am currently working on a project in which I use natural language processing to extract emotions from text in order to correlate them with contextual information.

Definition of contextual information: any information that is relevant to describing the essence of a situation in space and time.

Description of the data structure I'm looking for:

There is an arbitrary number of entities (an entity can be a person or a group, identified e.g. by hashtags) for which I want to track contextual information and their conversations with other entities. Conversations between entities are processed to classify their emotional characteristics. The main emotional features are a vector of scores, e.g. {fear: 0.1, happiness: 0.4, joy: 0.1, surprise: 0.9, anger: 0}. Entities can also provide any contextual information they would like to share, for example location, room temperature, blood pressure, etc. (I will call these contextual variables). Since neither the number of conversations an entity has, nor the number of contextual variables it wants to share, is known at any given time, the data structure has to adjust accordingly.

Importantly, each change in the data should also be represented as its own state, since I want to correlate certain state changes with each other.

An example: Bob and Alice have a conversation that shows a high degree of fear. A couple of hours later, they have another conversation in which there is no longer fear, but happiness. One could then argue that intense fear followed by happiness can be interpreted as emotional relief.

However, in order to extract exactly this kind of information, I need to be able to correlate different states with each other. The same goes for correlating contextual information with the emotions tracked in conversations. That is why every state change must be recorded and remain available.
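To illustrate what I mean by states, here is a minimal Python sketch of the structure I have in mind (all names here are purely illustrative, not an existing library):

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Mapping

@dataclass(frozen=True)
class EntityState:
    """One immutable snapshot of an entity; every change yields a new state."""
    entity: str
    timestamp: datetime
    # Emotion vector from the last processed conversation, e.g. {"fear": 0.1, ...}
    emotions: Mapping[str, float] = field(default_factory=dict)
    # Arbitrary contextual variables the entity chose to share.
    context: Mapping[str, object] = field(default_factory=dict)

def with_update(state: EntityState, **changes) -> EntityState:
    """Create the next state instead of mutating the old one."""
    return replace(state, timestamp=datetime.now(timezone.utc), **changes)

# Every state is kept, so later correlations can walk the full history.
history = [EntityState("bob", datetime.now(timezone.utc))]
history.append(with_update(history[-1], emotions={"fear": 0.8}))
history.append(with_update(history[-1], emotions={"happiness": 0.7}))
```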

To make this clearer for you, I created a graphic and attached it to the question.

Now, the actual question: what database / data structure can I use to solve this problem? I looked at event store databases, but was not sure whether I could easily recreate the graph structure with them. I also looked at graph databases, but did not find what I was looking for.

Therefore, it would be nice if someone could at least point me in the right direction, or help me adjust my structure to solve the problem. If there are data structures that support what I would call a graph database with snapshots, then ease of querying and filtering over those states is probably the most important feature.

+5
3 answers

There is the Datomic database from Rich Hickey (of Clojure fame) that stores facts over time. Each entry in the database is a timestamped fact, and facts are only ever appended, as in Event Sourcing.

These facts can be queried with a relational/logical language à la Datalog (which goes back to Prolog). See this post by kisai for an overview. It has been used to query graphs with some success in the past: Using Datomic as a graph database.

While I have no experience with Datomic myself, it seems quite suitable for your specific problem.
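To illustrate the core idea, here is a conceptual Python sketch of the append-only, timestamped-fact model (this is not Datomic's actual API, just the shape of the idea):

```python
from datetime import datetime, timezone
from typing import NamedTuple

class Fact(NamedTuple):
    entity: str        # e.g. "conversation-42"
    attribute: str     # e.g. "emotion/fear"
    value: object      # e.g. 0.8
    tx_time: datetime  # when the fact was asserted
    added: bool        # True = assertion, False = retraction

log: list[Fact] = []  # append-only; nothing is ever updated in place

def assert_fact(entity, attribute, value):
    log.append(Fact(entity, attribute, value, datetime.now(timezone.utc), True))

def as_of(t: datetime) -> dict:
    """Rebuild the state visible at time t by replaying the log."""
    state: dict = {}
    for f in log:
        if f.tx_time <= t:
            if f.added:
                state[(f.entity, f.attribute)] = f.value
            else:
                state.pop((f.entity, f.attribute), None)
    return state
```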

+5

You have an interesting project. I do not work on such things directly, but here are my 2 cents:

It seems to me that your diagram is a bit flawed. You are trying to depict a graph database over time, but there is really no way to represent time in that picture. Looking at the image, conversations and context change over time, but the facts "Bob", "Alice" and "Mallory" do not actually change over time. So let's remove them from the equation.

Instead, focus on the things you can model over time: conversations, context, location. These change as new data becomes available, and they are great candidates for an event-based model. In your application, a conversation would be modeled as a series of separate events that your aggregate consumes, combines and folds to produce the final state, which would be your "relief".

For example, you could write logic where, if a conversation was very fearful and then a very happy event occurred, the subject now feels relief.
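A rough Python sketch of that kind of logic, assuming emotion events arrive as ordered dictionaries of scores (the threshold values are made up):

```python
from typing import Iterable, Mapping

def detect_relief(events: Iterable[Mapping[str, float]],
                  fear_min: float = 0.7,
                  happy_min: float = 0.5) -> bool:
    """Fold over a conversation's emotion events in order; flag 'relief'
    when strong fear is later followed by strong happiness."""
    fear_seen = False
    for emotions in events:
        if emotions.get("fear", 0.0) >= fear_min:
            fear_seen = True
        elif fear_seen and emotions.get("happiness", 0.0) >= happy_min:
            return True
    return False

# Bob and Alice: a fearful conversation, then a happy one -> relief.
print(detect_relief([{"fear": 0.8}, {"happiness": 0.7}]))  # True
```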

What I would do is model these conversation states in your graph db, connected to your fact objects (Bob, Alice, etc.). A query such as "How does Alice feel right now?" would then be a graph traversal through your conversation states, factoring in Alice's related context data.

To answer a question like "How did Alice feel 5 minutes ago?", you would take all the event streams for her conversations, rewind them to the appropriate point in time, and then inspect the state of the chains.
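In code, "rewinding" is just replaying the stream up to a cutoff time. A minimal Python sketch with made-up event shapes:

```python
from datetime import datetime, timedelta, timezone

def feeling_at(events, t: datetime):
    """events: iterable of (timestamp, emotion_vector), ordered by time.
    Replay everything up to t and return the most recent emotion vector."""
    state = None
    for ts, emotions in events:
        if ts > t:
            break  # ignore everything after the cutoff
        state = emotions
    return state

now = datetime.now(timezone.utc)
alice_events = [
    (now - timedelta(minutes=10), {"fear": 0.8}),
    (now - timedelta(minutes=2), {"happiness": 0.7}),
]
# "How did Alice feel 5 minutes ago?" -> the fearful state still applies.
print(feeling_at(alice_events, now - timedelta(minutes=5)))  # {'fear': 0.8}
```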

TL;DR: Separate time-dependent variables from time-independent variables, and use event sourcing to model time.

+1

There is an obvious 1:1 correspondence between your states at a given time and a relational database with an appropriate schema. So there is also an obvious 1:1 correspondence between your set of states over time and a database with a changing schema, i.e. a variable whose value is a database plus metadata, manipulated via DDL and DML update commands. Thus there is no evidence that you shouldn't just use a relational DBMS.

Relational DBMSs support typical queries with automated implementations of a certain computational complexity and certain opportunities for optimization. Any application can have specialized queries that make a specialized data and operator structure the better choice, but you must examine your application and be aware of such special aspects to justify that. Given the obvious correspondence between your states and relational states, it is not justified here.

EAV is often used instead of DDL and a changing schema. But with EAV, the DBMS does not know about the actual application tables, whose columns are the EAV attributes, which are explicit under the schema-changing DDL/DML approach. EAV thus forgoes simplicity, clarity, optimization and, above all, integrity and ACID. Using it could be justified (compared to DDL/DML, assuming the relational view is otherwise suitable) only by demonstrating that DDL with schema updates (adding, deleting and modifying columns and tables) is worse or more costly than EAV in your specific application.
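For concreteness, here is the EAV shape being criticized, sketched with Python's built-in sqlite3 (table and column names are illustrative): every contextual variable becomes a generic row, so the DBMS can no longer type or constrain it the way explicit columns would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE context_eav (
        entity    TEXT NOT NULL,   -- e.g. 'alice'
        attribute TEXT NOT NULL,   -- e.g. 'blood_pressure'
        value     TEXT,            -- everything is stringly typed
        tx_time   TEXT NOT NULL    -- when this fact was recorded
    )
""")
conn.executemany(
    "INSERT INTO context_eav VALUES (?, ?, ?, ?)",
    [("alice", "location", "Berlin", "2019-01-01T10:00:00"),
     ("alice", "blood_pressure", "120/80", "2019-01-01T10:05:00")],
)
# The DBMS sees one generic table; it cannot enforce that 'blood_pressure'
# values are well-formed, which is the integrity loss described above.
rows = conn.execute(
    "SELECT attribute, value FROM context_eav WHERE entity = ?", ("alice",)
).fetchall()
print(rows)
```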

Just because you can draw a picture of your application state at some point in time as a graph does not mean you need a graph database. What matters is which specialized queries/expressions you will be evaluating. Figure out what these are for your problem domain, how easily they are expressed against some specialized data and operator structure, and how easily they are expressed relationally. Then you can compare the expressive and computational requirements of a specialized data structure, a relational representation, and particular graph database models. Be sure to search Google and Stack Overflow.

According to Wikipedia, "Neo4j is the most popular graph database used today."

0
