We are running a Titan graph on top of HBase with about 30 TB of data, and a lot of functionality is missing.
For example, a must-have is the ability to perform OLAP operations on the graph, such as removing redundant vertices using Spark.
Although TinkerPop appears to do exactly that with SparkGraphComputer, it doesn't work well: the implementation that reads data from HBase through the Hadoop InputFormat is buggy, and many scenarios are not handled (for example, a vertex with a self-loop edge makes the code throw an exception and terminate). In addition, the code that parses vertices from the raw data performs poorly: there are many redundant allocations that slow everything down.
If you are planning a large graph for the long term, I don't think Titan is suitable, unless you intend to use your own code.
imriqwe