I have a Cassandra table with a text-type field called snapshot that contains JSON objects:
[identifier, timestamp, snapshot]
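(For clarity, this is roughly how one row looks if written out as a Scala type; the names just mirror the columns above, and below I actually read the rows as plain tuples.)

// Illustrative only: one row of the table.
// The connector call below reads rows as (String, String, String) tuples instead.
case class SnapshotRow(
  identifier: String,   // key column
  timestamp: String,    // stored as text in my table
  snapshot: String      // JSON document stored as text
)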
I realized that in order to transform this field with Spark, I need to convert this field of the RDD into another RDD so I can work with the JSON schema. Is that right? How do I do this?
Edit: So far, I have managed to create an RDD from the text field of a single row:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._

val conf = new SparkConf().setAppName("signal-aggregation")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Read the whole table as (identifier, timestamp, snapshot) tuples
val snapshots = sc.cassandraTable[(String, String, String)]("listener", "snapshots")

// Infer the JSON schema from the snapshot field of the first row only
val first = snapshots.first()
val firstJson = sqlContext.jsonRDD(sc.parallelize(Seq(first._3)))
firstJson.printSchema()
This prints the JSON schema. Good!
How do I tell Spark to apply this schema to all rows of the snapshots table, so that I get the parsed snapshot from every row?
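Is something along these lines the right direction? This is just a sketch of what I am imagining, assuming jsonRDD will accept an RDD[String] built from the whole snapshot column rather than a single parallelized value:

// Sketch (untested): pull the snapshot column out of the tuple RDD
// and let jsonRDD infer one schema over all rows instead of just the first.
val snapshotJsonStrings = snapshots.map { case (_, _, snapshot) => snapshot }
val allSnapshots = sqlContext.jsonRDD(snapshotJsonStrings)
allSnapshots.printSchema()

If that works, I would then have all snapshots in one structured RDD that I can query or transform.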