How to implement the continuity of objects not related to loading into memory?

I have a Graph object (this is in Perl) for which I compute its transit closure (i.e. to solve the all-pairs shortest paths problem).

From this object, I am interested in calculating:

  • The shortest path from any vertices u β†’ v.
  • The distance matrix for all vertices.
  • General accessibility issues.
  • General graph functions (density, etc.).

The graph has about 2,000 vertices, so it takes a couple of hours to calculate the transitive closure (using the Floyd-Warshall algorithm ). I am currently simply caching a serialized object (using Storable , so it is pretty efficient already).

My problem is that deserializing this object still takes a lot of time (a minute or so) and consumes about 4 GB of RAM. This is not acceptable for my application.

Therefore, I was thinking about how to create a database schema for storing this object in the "expanded" form. In other words, pre-copy the shortest paths of all pairs and save them accordingly. Then perhaps use stored procedures to retrieve the necessary information.

My other problem is that I have no experience in developing databases, and I have no idea about the implementation of the above, hence my post. I would also like to hear about other solutions that I can ignore. Thanks!

+4
source share
1 answer

For starters, it seems that you need two objects: a vertex and an edge, and perhaps a couple of tables for the results. I would suggest a table that stores node-to-node information. If A is reachable from Y, then the relation gets the available attribute. So here

Vertex: any coordinates (x,y,...) name: string any attributes of a vertex* Association: association_id: ID association_type: string VertexInAssociation: vertex: (constrained to Vertex) association: (constrained to association) AssociationAttributes: association_id: ID (constrained to association) attribute_name: string attribute_value: variable -- possibly string 

* You can also save vertex attributes in a table, depending on how complex they are.

The reason I add Association complexity is because the edge is not considered directional and makes it easier to query both vertices just to be members of the connected-by-edge-x vertex set,

Thus, an edge is simply an edge type association that has a distance attribute. A path is an association of a path type, and it can have a hop attribute.

There may be other more optimized schemes, but this one is conceptually clean - even if it does not make the first-class concept of "edge" the essence of the first class.

To create a minimal edge, you will need to do this:

 begin transaction select associd = max(association_id) + 1 from Association insert into Association ( association_id, association_type ) values( associd, 'edge' ) insert into VertexInAssociation( association_id, vertex_id ) select associd, ? -- $vertex->[0]->{id} UNION select associd, ? -- $vertex->[1]->{id} insert into AssociationAttributes ( association_id, association_name, association_value ) select associd, 'length', 1 UNION select associd, 'distance', ? -- $edge->{distance} commit 

You may also want to create sort class classes. So the association "edge" is automatically counted as an "achievable" association. Otherwise, you can insert UNION select associd, reachable, 'true' in the same place.

  • And then you can request the union of the achievable associations of both vertices and discard them as achievable associations with another node, if they did not exist, and assign the value of the length attribute + 1 to the length attribute.

However, you probably need an ORM for all this, and just manipulate it inside Perl.

 my $v1 = Vertex->new( 'V', x => 23, y => 89, red => 'hike!' ); my $e = Edge->new( $v1, $v2 ); # perhaps Edge knows how to calculate distance. 
+2
source

Source: https://habr.com/ru/post/1316703/


All Articles