Use flatMap(), which is well suited to taking a single input record and producing multiple output records from it. Here each call record becomes two (user, duration) pairs, one for each participant. Complete code:
# Assumes an existing SparkContext named sc, as in the pyspark shell.
callData = sc.parallelize([["User1", "User2", 2], ["User1", "User3", 4], ["User2", "User1", 8]])

# Emit one (user, duration) pair for each side of every call.
calls = callData.flatMap(lambda record: [(record[0], record[2]), (record[1], record[2])])
print(calls.collect())
# prints [('User1', 2), ('User2', 2), ('User1', 4), ('User3', 4), ('User2', 8), ('User1', 8)]

# Sum the durations per user.
totals = calls.reduceByKey(lambda a, b: a + b)
print(totals.collect())
# prints [('User2', 10), ('User3', 4), ('User1', 14)] (the order of the keys may vary)
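For intuition, here is roughly what those two steps do, sketched in plain Python with no Spark involved. The helpers flat_map and reduce_by_key are hypothetical stand-ins I wrote for illustration, not Spark APIs, and they run on a local list rather than a distributed RDD:

```python
from itertools import chain

call_data = [["User1", "User2", 2], ["User1", "User3", 4], ["User2", "User1", 8]]


def flat_map(func, records):
    """Apply func to each record and flatten the returned lists into one list."""
    return list(chain.from_iterable(func(record) for record in records))


def reduce_by_key(func, pairs):
    """Merge the values that share a key using func (hypothetical local helper)."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Each call record yields two (user, duration) pairs, one per participant.
calls = flat_map(lambda record: [(record[0], record[2]), (record[1], record[2])], call_data)

# Sum the durations for each user.
totals = reduce_by_key(lambda a, b: a + b, calls)

print(calls)
print(totals)
```

With plain map() you would get one list of two pairs per record instead; flatMap() is what flattens those into a single collection of pairs that reduceByKey() can then aggregate.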
SoldierOfFortran