Geospatial data with Titan on Cassandra

Question

I am considering using Titan to build a scalable store for geospatial data (I'm thinking of R-trees). The documentation has a GeoShape query, and the docs say that Titan can handle geo data using Lucene or ElasticSearch. However, this seems like it would be very slow, because traversing nodes in Cassandra essentially means doing join queries in Cassandra, which is a really bad idea. I think I may be misunderstanding the data representation.

I read the Titan Data Model doc and I still don't quite understand it. If all the edges are stored in a Cassandra row, Titan still has to do "joins" against the vertex table. One way around this would be to make the column value equal to the edge property data, so that the vertex data and the edge data are neatly packed into one row. However, that breaks down when you want to run queries deeper than one node, and we are back to the join problem again.

So: is Titan emulating join queries in Cassandra? And how efficient are geo searches under these conditions?

graph cassandra titan

Peter Klipfel Mar 15 '14 at 3:14

1 answer

Dan LaRocque · Accepted Answer · 2014-03-25T01:41:46+0000

I think this question conflates traversals with geospatial index lookups. They are distinct at both the API level and the implementation level. The index is not shown in the data model diagrams.

Let me make that a bit more concrete. Say I'm running Titan with ES and Cassandra using Murmur3Partitioner or RandomPartitioner. I declare an ES geospatial index on edges, called "place", as described on the Getting Started page. When answering a geospatial query such as the "WITHIN" in the Getting Started docs, Titan goes to ES first. ES returns IDs that Titan can use to quickly look up the associated vertex/edge data in Cassandra, without doing anything analogous to relational joins.
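To illustrate the two-step flow described above, here is a minimal Python sketch. It is a toy model, not Titan's or ES's API: ToyGeoIndex, ToyStore, Edge and edges_within are hypothetical names, the index is a plain linear scan standing in for ES, and the store is a dictionary standing in for Cassandra. The only point is that the index hands back IDs and the store is then read by key, once per hit, with no join.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Edge:
    edge_id: int
    label: str
    place: tuple  # (latitude, longitude) in degrees


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


class ToyGeoIndex:
    """Hypothetical stand-in for the external index: knows places and IDs only."""

    def __init__(self):
        self._entries = []

    def add(self, edge_id, place):
        self._entries.append((edge_id, place))

    def within(self, center, radius_km):
        # A real spatial index would not scan linearly; this is only a sketch.
        return [eid for eid, p in self._entries if haversine_km(center, p) <= radius_km]


class ToyStore:
    """Hypothetical stand-in for the storage backend: lookups by ID only."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts key lookups so the cost is visible

    def put(self, edge):
        self._rows[edge.edge_id] = edge

    def get(self, edge_id):
        self.reads += 1
        return self._rows[edge_id]


def edges_within(index, store, center, radius_km):
    """Step 1: ask the index for IDs. Step 2: fetch each ID from the store."""
    ids = index.within(center, radius_km)
    return [store.get(edge_id) for edge_id in ids]
```

With three edges indexed and two of them inside the query circle, edges_within performs two store reads, which matches the "roughly linear in the number of hits" intuition rather than a scan or a join over all edges.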

The cost of these edge lookups by geospatial data should be about equivalent to the cost of ES's implementation of WITHIN (which, I think, is delegated to Spatial4j), plus the lookups Titan does on Cassandra after obtaining the IDs, which should be roughly linear in the number of edges found by ES. This is just a back-of-the-envelope estimate, so please take it with plenty of salt.

After I retrieve the edges by geo query, if I then want to start arbitrary traversals in the neighborhood of each edge in the set, I would look into rooting a MultiQuery on the head/tail vertices and enabling database-level caching. If the query misses the cache, or the cache is cold or disabled, Titan will still try to retrieve all edges being traversed off a vertex in the same Cassandra slice whenever possible. If you are concerned about Titan's edge traversal efficiency, you might find Boutique Graph Data with Titan interesting.
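The idea of reading a vertex's edges in one slice can be sketched with a toy adjacency-list store in Python. This is an illustration only, under the assumption of a one-row-per-vertex layout with each incident edge held as a column in that row; ToyAdjacencyStore and neighbours_of are hypothetical names and do not reflect Titan's MultiQuery API or its actual on-disk encoding.

```python
from collections import defaultdict


class ToyAdjacencyStore:
    """Toy model: one row per vertex, each incident edge is a column in it."""

    def __init__(self):
        # vertex id -> {(label, edge id): id of the vertex at the other end}
        self._rows = defaultdict(dict)
        self.row_reads = 0  # counts slice reads so the cost is visible

    def add_edge(self, edge_id, label, out_v, in_v):
        # The edge is written into the rows of both endpoints, so either
        # end can be expanded without consulting a separate edge table.
        self._rows[out_v][(label, edge_id)] = in_v
        self._rows[in_v][(label, edge_id)] = out_v

    def slice(self, vertex_id, label=None):
        """Read the edge columns of one vertex row, optionally for one label."""
        self.row_reads += 1
        row = self._rows.get(vertex_id, {})
        return {k: v for k, v in row.items() if label is None or k[0] == label}


def neighbours_of(store, vertex_ids, label=None):
    """Batched one-hop expansion: one slice read per distinct root vertex."""
    result = {}
    for v in dict.fromkeys(vertex_ids):  # drop duplicates, keep order
        result[v] = sorted(set(store.slice(v, label).values()))
    return result
```

In this model, expanding the head and tail vertices of the edges returned by a geo lookup costs one row read per distinct vertex, however many edges hang off each one. A cache in front of slice would remove even those reads on a hit.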

HTH