What would be the best way to get Spark to work on a distributed database? (RDD or FileInputFormat)

Question

We have some kind of distributed data warehouse. We know all the internal components and can access data directly on disk.

I am exploring the possibility of deploying Apache Spark directly on top of it.

What would be the best or recommended way to do this: writing a custom RDD, or extending FileInputFormat?

(Is one easier than the other? Does one perform better?)

Thanks for the help.

+4

Ariel Oct 28 '14 at 18:23

2 answers

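I would go with a custom RDD. Take a look at the datastax-cassandra connector as an example: it exposes Cassandra data to Spark through its own RDD implementation.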
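As a rough illustration of what a custom RDD involves, here is a plain-Python model of the contract. It does not use Spark at all. The real Spark `RDD` is a Scala abstract class whose subclasses supply `getPartitions`, `compute` and, optionally, `getPreferredLocations`; the methods below only mirror those three roles. `WarehouseSource`, `Partition`, `collect` and the host-to-rows layout are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Partition:
    """One independently readable slice of the warehouse (hypothetical layout)."""
    index: int
    host: str    # node that holds this slice on local disk
    rows: tuple  # stand-in for data that would really be read from disk


class WarehouseSource:
    """Plain-Python model of the custom-RDD contract; not a Spark class."""

    def __init__(self, shards: Dict[str, List[str]]):
        # shards maps a host name to the rows stored on that host.
        self._shards = shards

    def get_partitions(self) -> List[Partition]:
        # Mirrors the role of RDD.getPartitions: describe how the data is split.
        return [
            Partition(index=i, host=host, rows=tuple(rows))
            for i, (host, rows) in enumerate(sorted(self._shards.items()))
        ]

    def get_preferred_locations(self, split: Partition) -> List[str]:
        # Mirrors RDD.getPreferredLocations: hint where the task should run
        # so that it reads from local disk instead of over the network.
        return [split.host]

    def compute(self, split: Partition) -> Iterator[str]:
        # Mirrors RDD.compute: lazily yield the records of one partition.
        for row in split.rows:
            yield row


def collect(source: WarehouseSource) -> List[str]:
    """Sequential stand-in for a scheduler running one task per partition."""
    out: List[str] = []
    for split in source.get_partitions():
        out.extend(source.compute(split))
    return out
```

The point of the split is that a store which already knows where each piece of data lives can report that through the preferred-locations hook, which is the main thing a generic file reader would not know about your warehouse.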

+3

Josh Rosen · Accepted Answer · 2014-11-02T18:15:39+0000

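Spark 1.2 adds an external data sources API for Spark SQL. It lets you expose your store as a (schema-aware) table that can be queried with SQL. If you can use Spark SQL (rather than the raw RDD API), this is probably worth considering, since the data then reaches Spark as a SchemaRDD.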

API: