I am new to Spark and Cassandra. I am trying to insert data into a Cassandra table using the spark-cassandra connector, as shown below:
import java.util.UUID
import org.apache.spark.{SparkContext, SparkConf}
import org.joda.time.DateTime
import com.datastax.spark.connector._

case class TestEntity(id: UUID, category: String, name: String, value: Double, createDate: DateTime, tag: Long)

object SparkConnectorContext {
  // Single local SparkContext pointed at the Cassandra node
  val conf = new SparkConf(true).setMaster("local")
    .set("spark.cassandra.connection.host", "192.168.xxx.xxx")
  val sc = new SparkContext(conf)
}

object TestRepo {
  def insertList(list: List[TestEntity]) = {
    // Turn the list into an RDD and save it to Cassandra
    SparkConnectorContext.sc.parallelize(list).saveToCassandra("testKeySpace", "testColumnFamily")
  }
}

object TestApp extends App {
  val start = System.currentTimeMillis()
  TestRepo.insertList(Utility.generateRandomData())
  val end = System.currentTimeMillis()
  val timeDiff = end - start
  println("Difference (in millis) = " + timeDiff)
}
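For reference, here is a minimal sketch of how I could time only the save step itself, excluding SparkContext creation and the first connection to Cassandra (this variant is just an illustration and assumes the same TestEntity, SparkConnectorContext and Utility.generateRandomData() as above; TimedTestApp is a hypothetical name):

object TimedTestApp extends App {
  import com.datastax.spark.connector._

  // Force context creation and data generation before timing starts
  val sc = SparkConnectorContext.sc
  val data = Utility.generateRandomData()

  // Throwaway save so class loading and the Cassandra connection are already warm
  sc.parallelize(data).saveToCassandra("testKeySpace", "testColumnFamily")

  // Measure only the second saveToCassandra call
  val start = System.currentTimeMillis()
  sc.parallelize(data).saveToCassandra("testKeySpace", "testColumnFamily")
  val end = System.currentTimeMillis()
  println("Save-only time (millis) = " + (end - start))
}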
When I insert using this method (a list of 100 objects), it takes 300-1100 milliseconds. I tried to insert the same data using phantom, and it takes only 20-40 milliseconds.
Can someone tell me why the Spark connector takes so long to insert? Am I doing something wrong in my code, or is it not recommended to use the spark-cassandra connector for insert operations?