I am trying to use the following PySpark code, which adds a number to each row of an RDD and returns a list of RDDs.
from pyspark.context import SparkContext

file = "file:///home/sree/code/scrap/sample.txt"
sc = SparkContext('local', 'TestApp')
data = sc.textFile(file)

splits = [data.map(lambda p: int(p) + i) for i in range(4)]

print splits[0].collect()
print splits[1].collect()
print splits[2].collect()
The contents of the input file (sample.txt):
1
2
3
I expected this output (adding 0, 1, and 2 to the rows of the RDD, respectively):
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
whereas the actual result was:
[4, 5, 6]
[4, 5, 6]
[4, 5, 6]
which means that the list comprehension only used the value 3 for the variable i, regardless of range(4).
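I see the same thing with plain Python lambdas and no Spark at all. Here is a minimal sketch, assuming the cause is the same (the names funcs and f are mine, just for illustration):

# Plain Python, no Spark: lambdas created in a list comprehension
# capture the variable i itself, not its value at creation time,
# so every function ends up seeing the final value i == 3.
funcs = [lambda p: int(p) + i for i in range(4)]

print [f("1") for f in funcs]   # prints [4, 4, 4, 4], not [1, 2, 3, 4]

Binding i as a default argument (lambda p, i=i: int(p) + i) looks like it would capture the value at definition time instead, but I would still like to understand why the original version behaves this way.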
Why is this behavior happening?