Persist does not work in Spark

I am trying to use the persist function in Spark to keep data in memory and run computations on it. I expect that keeping the data in memory will speed up iterative algorithms such as k-means clustering in MLlib.

  val data3 = sc.textFile("hdfs:.../inputData.txt")
  val parsedData3 = data3.map(_.split('\t').map(_.toDouble))
  parsedData3.persist(MEMORY_ONLY)

The persist call fails with the following error:

  scala> parsedData3.persist(MEMORY_ONLY)
  <console>:17: error: not found: value MEMORY_ONLY
         parsedData3.persist(MEMORY_ONLY)

Can someone show me how to use persist correctly so the data stays in memory for an iterative algorithm?

apache-spark
1 answer

If you look at the signature of rdd.persist, def persist(newLevel: StorageLevel): this.type, you will see that it takes a value of type StorageLevel, so the correct way to call persist in your example is:

  import org.apache.spark.storage.StorageLevel
  parsedData3.persist(StorageLevel.MEMORY_ONLY)
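
For completeness, here is a minimal end-to-end sketch of the pattern the question describes: parse the data, persist it, then run an iterative MLlib algorithm (k-means) over the cached RDD. The input path, the number of clusters, and the iteration count are placeholders, not values from the original post:

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.storage.StorageLevel

  // Placeholder input path; substitute your own file.
  val data = sc.textFile("hdfs:///path/to/inputData.txt")

  // MLlib's RDD-based KMeans expects RDD[Vector], so wrap each
  // Array[Double] row with Vectors.dense.
  val parsed = data.map(line => Vectors.dense(line.split('\t').map(_.toDouble)))

  // Cache the parsed rows so each k-means iteration reads them from
  // memory instead of re-reading and re-parsing the text file.
  parsed.persist(StorageLevel.MEMORY_ONLY)

  // k = 2 clusters, 20 iterations; both values are illustrative.
  val model = KMeans.train(parsed, 2, 20)

Note that KMeans.train takes an RDD[Vector], which is why the Array[Double] rows from the question are wrapped with Vectors.dense here.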

The StorageLevel companion object defines these constants as well, so importing its members brings them into scope and lets you use the constant directly (as in your code):

  import org.apache.spark.storage.StorageLevel._
  ...
  parsedData3.persist(MEMORY_ONLY) // this also works
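
Two small additions beyond the original answer, both standard Spark behavior: rdd.cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), and MEMORY_ONLY silently recomputes any partitions that do not fit in memory, so for data larger than the available cache MEMORY_AND_DISK can be the better choice. When the iterations are finished, unpersist releases the cached blocks:

  import org.apache.spark.storage.StorageLevel

  // The storage level must be chosen before the RDD is first persisted;
  // Spark raises an error if you try to change it afterwards.
  parsedData3.persist(StorageLevel.MEMORY_AND_DISK)

  // ... run the iterative algorithm ...

  parsedData3.unpersist() // release the cached blocks when done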
