Persist does not work in Spark

I am trying to use the persist function in Spark to keep data in memory and run computations on it. I expect that keeping the data in memory will speed up iterative algorithms such as k-means clustering in MLlib.

  val data3 = sc.textFile("hdfs:.../inputData.txt")
  val parsedData3 = data3.map(_.split('\t').map(_.toDouble))
  parsedData3.persist(MEMORY_ONLY)

The persist call fails with the following error:

  scala> parsedData3.persist(MEMORY_ONLY)
  <console>:17: error: not found: value MEMORY_ONLY
         parsedData3.persist(MEMORY_ONLY)

Can someone show me how to use persist correctly so the data stays in memory for an iterative algorithm?

apache-spark
1 answer

If you look at the signature of rdd.persist, def persist(newLevel: StorageLevel): this.type, you will see that it takes a value of type StorageLevel, so the correct way to call persist in your example is:

  import org.apache.spark.storage.StorageLevel
  parsedData3.persist(StorageLevel.MEMORY_ONLY)
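
For completeness, here is a minimal end-to-end sketch of the pattern the question describes: parse the data, persist it, then run an iterative MLlib algorithm (k-means) over the cached RDD. The input path, the number of clusters, and the iteration count are placeholders, not values from the original post:

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.storage.StorageLevel

  // Placeholder input path; substitute your own file.
  val data = sc.textFile("hdfs:///path/to/inputData.txt")

  // MLlib's RDD-based KMeans expects RDD[Vector], so wrap each
  // Array[Double] row with Vectors.dense.
  val parsed = data.map(line => Vectors.dense(line.split('\t').map(_.toDouble)))

  // Cache the parsed rows so each k-means iteration reads them from
  // memory instead of re-reading and re-parsing the text file.
  parsed.persist(StorageLevel.MEMORY_ONLY)

  // k = 2 clusters, 20 iterations; both values are illustrative.
  val model = KMeans.train(parsed, 2, 20)

Note that KMeans.train takes an RDD[Vector], which is why the Array[Double] rows from the question are wrapped with Vectors.dense here.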

The StorageLevel companion object defines these constants as well, so importing its members brings them into scope and lets you use the constant directly (as in your code):

  import org.apache.spark.storage.StorageLevel._
  ...
  parsedData3.persist(MEMORY_ONLY) // this also works
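
Two small additions beyond the original answer, both standard Spark behavior: rdd.cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), and MEMORY_ONLY silently recomputes any partitions that do not fit in memory, so for data larger than the available cache MEMORY_AND_DISK can be the better choice. When the iterations are finished, unpersist releases the cached blocks:

  import org.apache.spark.storage.StorageLevel

  // The storage level must be chosen before the RDD is first persisted;
  // Spark raises an error if you try to change it afterwards.
  parsedData3.persist(StorageLevel.MEMORY_AND_DISK)

  // ... run the iterative algorithm ...

  parsedData3.unpersist() // release the cached blocks when done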
