Connect to Cassandra with Spark

First, I bought a new O'Reilly Spark book and tried its Cassandra setup instructions. I also found various Stack Overflow posts and other guides on the net, but none of them work as written. Below is as far as I could get.

This is a test with only a few records of dummy data. I am running the latest Cassandra 2.0.7 VirtualBox VM provided by planetcassandra.org, linked from the Cassandra project home page.

I downloaded the Spark 1.2.1 source, got the latest Cassandra connector code from GitHub, and built it against Scala 2.11. I have JDK 1.8.0_40 and Scala 2.11.6 installed on Mac OS X 10.10.2.

I run the Spark shell with the Cassandra connector loaded:

bin/spark-shell --driver-class-path ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar

Then I do what should be a simple check: counting the rows in a test table with four entries:

import com.datastax.spark.connector._
sc.stop
val conf = new org.apache.spark.SparkConf(true).set("spark.cassandra.connection.host", "192.168.56.101")
val sc = new org.apache.spark.SparkContext(conf)
val table = sc.cassandraTable("mykeyspace", "playlists")
table.count

I get the error below. What confuses me is that the connector fails trying to find Cassandra at 127.0.0.1, even though it also recognizes the host name I configured, 192.168.56.101.

15/03/16 15:56:54 INFO Cluster: New Cassandra host /192.168.56.101:9042 added
15/03/16 15:56:54 INFO CassandraConnector: Connected to Cassandra cluster: Cluster on a Stick
15/03/16 15:56:54 ERROR ServerSideTokenRangeSplitter: Failure while fetching splits from Cassandra
java.io.IOException: Failed to open thrift connection to Cassandra at 127.0.0.1:9160
<snip>
java.io.IOException: Failed to fetch splits of TokenRange(0,0,Set(CassandraNode(/127.0.0.1,/127.0.0.1)),None) from all endpoints: CassandraNode(/127.0.0.1,/127.0.0.1)

By the way, I can also do this via the configuration file conf/spark-defaults.conf, without having to stop and recreate the Spark context or pass the --driver-class-path argument. I end up with the same error either way, and the steps above seemed easier to communicate in this post.
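For reference, the equivalent conf/spark-defaults.conf looks something like this (the jar path is the one from my spark-shell command; adjust it to your build):

```
spark.cassandra.connection.host  192.168.56.101
spark.driver.extraClassPath      ../spark-cassandra-connector/spark-cassandra-connector/target/scala-2.11/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
```

With this in place, a plain `bin/spark-shell` picks up both settings at startup.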

Any ideas?

1 answer

The connector uses the rpc_address configured in cassandra.yaml on each Cassandra node. The node addresses it reads from system.local/system.peers will be 127.0.0.1 if that is the rpc_address in cassandra.yaml, which is why the split fetch falls back to localhost even though the initial connection to 192.168.56.101 succeeds.

There is also a Thrift dependency: the connector fetches token-range splits over a Thrift connection. That requirement goes away in C* 2.1.4 with the system.size_estimates table (CASSANDRA-7688), but until then port 9160 must be reachable on the address the node advertises.
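Concretely, a minimal fix sketch, assuming the VM's routable address from the question, is to set rpc_address in cassandra.yaml on the node and restart Cassandra:

```
# cassandra.yaml on the Cassandra node
# (192.168.56.101 assumed from the question; use your node's routable address)
rpc_address: 192.168.56.101

# Thrift port the connector uses for token-range splits before C* 2.1.4
rpc_port: 9160
```

After the restart, system.local/system.peers should report 192.168.56.101 instead of 127.0.0.1, and the split fetch should succeed.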

