I am trying to use the persist function in Spark to store data in memory and perform calculations on it. I believe that caching the data in memory will speed up iterative algorithms such as K-means clustering in MLlib.
val data3 = sc.textFile("hdfs:.../inputData.txt")
val parsedData3 = data3.map(_.split('\t').map(_.toDouble))
parsedData3.persist(MEMORY_ONLY)
The call to persist fails with the following error:
scala> parsedData3.persist(MEMORY_ONLY)
<console>:17: error: not found: value MEMORY_ONLY
       parsedData3.persist(MEMORY_ONLY)
Can someone help me understand how to properly use persist to keep data in memory for an iterative algorithm?
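From reading around, I suspect MEMORY_ONLY is a member of StorageLevel rather than a bare identifier, and needs an import. Here is a minimal sketch of what I think the corrected code should look like (the import and the qualified StorageLevel.MEMORY_ONLY name are my assumption about the fix):

```scala
import org.apache.spark.storage.StorageLevel

val data3 = sc.textFile("hdfs:.../inputData.txt")
val parsedData3 = data3.map(_.split('\t').map(_.toDouble))

// persist takes a StorageLevel value; MEMORY_ONLY is not in scope on its own
parsedData3.persist(StorageLevel.MEMORY_ONLY)
```

Is this the right way to do it, or is there a preferred alternative (e.g. calling cache(), which I understand defaults to in-memory storage)?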
apache-spark
Ravi