I am new to Apache Spark and am facing a problem while reading Cassandra data from Spark.
List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");
CassandraJavaRDD<A> aRDD = CassandraJavaUtil.javaFunctions(sc)
    .cassandraTable("testing", "cf_text",
        CassandraJavaUtil.mapRowTo(A.class, colMap))
    .where("Id=? and date IN ?", "Open", dates);
This query does not filter the data on the Cassandra server. Instead, the Java process keeps accumulating rows in memory and finally throws java.lang.OutOfMemoryError. The query should be filtered on the Cassandra server rather than on the client side, as described at https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md .
When I execute the same query with filters in the Cassandra cqlsh shell, it runs fine, but from Spark it takes as long as a query without any filter (where clause). Therefore it appears that Spark is applying the filter on the client side instead of pushing it down to the server.
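For reference, the connector only pushes a `where` clause down to Cassandra when it translates to valid CQL over the table's primary key. The following is a minimal sketch of the same read written so the predicate is pushdown-eligible; it assumes (which the original post does not state) that `id` is the partition key and `date` a clustering column of `testing.cf_text`, and that a column named `value` exists:

```java
// Sketch, under the assumption that "id" is the partition key and
// "date" a clustering column of testing.cf_text. Only predicates on
// primary-key columns can be pushed down to Cassandra as CQL; anything
// else forces a full-table scan filtered in Spark, which can OOM.
List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");
CassandraJavaRDD<A> aRDD = CassandraJavaUtil.javaFunctions(sc)
    .cassandraTable("testing", "cf_text",
        CassandraJavaUtil.mapRowTo(A.class, colMap))
    .select("id", "date", "value")              // fetch only needed columns
    .where("id = ? AND date IN ?", "Open", dates);
```

If the columns in the `where` clause are not part of the primary key, the server cannot evaluate the predicate, which would explain the behavior described above.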
SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[8]");
conf.set("spark.cassandra.connection.host", "192.168.1.15");
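For completeness, here is the same configuration written as one chained expression together with the context creation that the snippet above implies; the `spark.executor.memory` line is an illustrative addition (value chosen arbitrarily), not part of the original setup:

```java
// Minimal sketch of the configuration above, plus context creation.
// "spark.executor.memory" is an illustrative addition to bound memory
// use; 2g is an arbitrary example value.
SparkConf conf = new SparkConf()
    .setAppName("Test")
    .setMaster("local[8]")
    .set("spark.executor.memory", "2g")
    .set("spark.cassandra.connection.host", "192.168.1.15");
JavaSparkContext sc = new JavaSparkContext(conf);
```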
Why are filters applied on the client side, and what can be changed so that they are applied on the Cassandra server side?
Also, how can we configure a Spark cluster on top of a Cassandra cluster on a Windows platform?