Counting Super Nodes on Titan

On my system, I have a requirement that the number of edges on a node must be stored as an internal property on the vertex, as well as in a vertex-centric index on a specific outgoing edge. This, of course, requires me to count the number of edges per node after all the data has finished loading. I do it like this:

long edgeCount = graph.getGraph().traversal().V(vertexId).bothE().count().next(); 

However, when I scale my tests up to the point where some of my nodes are "super" nodes, I get the following exception on the line above:

Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=4792(4792), attempts=1]org.apache.thrift.transport.TTransportException: Frame size (70936735) larger than max length (62914560)!
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112) ~[titan-cassandra-1.0.0.jar!/:na]

What is the best way to fix this? Should I just increase the frame size or is there a better way to count the number of edges per node?

+6
2 answers

Yes, you need to increase the frame size. When you have a supernode, there is a really big row that has to be read from the storage backend, and this is true even in the OLAP case. I agree that if you plan to calculate this for every vertex in the graph, it is best done as an OLAP operation.
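To get a feel for how far the limit has to move, the two byte counts from the exception can be converted to megabytes. The sketch below is plain Java with no Titan dependency; the class and method names are made up for illustration. The property names mentioned in the comments (`storage.cassandra.thrift.frame-size` on the Titan side and `thrift_framed_transport_size_in_mb` in cassandra.yaml) are my assumption of what applies to Titan 1.0 with Cassandra, so check them against the documentation for your versions.

```java
public class FrameSizeCheck {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Smallest whole number of megabytes that can hold a frame of the given size.
    static long requiredMegabytes(long frameBytes) {
        return (frameBytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long frameBytes = 70936735L; // "Frame size" reported in the exception
        long maxBytes = 62914560L;   // "max length" reported in the exception

        // Assumption: the limit is configured in whole megabytes, e.g. via
        // storage.cassandra.thrift.frame-size (Titan) and
        // thrift_framed_transport_size_in_mb (cassandra.yaml).
        System.out.println("current limit (MB): " + maxBytes / BYTES_PER_MB);
        System.out.println("minimum needed for this row (MB): " + requiredMegabytes(frameBytes));
    }
}
```

The computed value is only a lower bound for this one row. A supernode that keeps growing will outgrow any fixed limit, which is one more reason to treat the per-vertex count as an OLAP job.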

This and a few other useful tips can be found in this Titan mailing list thread. Keep in mind that the link is quite old: the concepts are still valid, but some Titan configuration property names may have changed.

+3

A task like this, which is OLAP in nature, should be performed on a distributed system rather than with an ordinary traversal.

There is a concept of GraphComputer in TinkerPop 3 that can be used to accomplish such a task.

This basically allows you to run Gremlin queries, which will be evaluated on multiple machines.

For example, you can use SparkGraphComputer to run your queries on top of Apache Spark.
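To illustrate the shape of the computation, here is a minimal sketch in plain Java. It deliberately does not use the TinkerPop or SparkGraphComputer APIs, and the class name and the edge-list representation are made up for the example. The point is only that an OLAP-style job scans all edges once and accumulates a count for both endpoints, instead of pulling each vertex's full adjacency row the way V(vertexId).bothE().count() does.

```java
import java.util.HashMap;
import java.util.Map;

public class BulkDegreeCount {
    // One pass over all edges; each edge adds 1 to both of its endpoints,
    // which matches what bothE().count() would report per vertex
    // (assuming no self-loops in the input).
    static Map<Long, Long> degrees(long[][] edges) {
        Map<Long, Long> degree = new HashMap<>();
        for (long[] edge : edges) {
            degree.merge(edge[0], 1L, Long::sum); // out-vertex
            degree.merge(edge[1], 1L, Long::sum); // in-vertex
        }
        return degree;
    }

    public static void main(String[] args) {
        // Hypothetical edge list: vertex 1 plays the role of the "super" node.
        long[][] edges = { {1, 2}, {1, 3}, {1, 4}, {2, 3} };
        degrees(edges).forEach((vertex, count) ->
                System.out.println("vertex " + vertex + " has " + count + " edges"));
    }
}
```

In a real GraphComputer job the edge scan is partitioned across workers and the per-vertex counts are merged, but the per-edge accumulation is the same idea.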

+3
