I have an RDD that has the following structure:
((user_id,item_id,rating))
Let's call this RDD training.
There is another RDD with the same structure:
((user_id,item_id,rating))
Let's call this one test.
For each user, I want to make sure that no (user_id, item_id) pair in test also appears in training. For example, given:
train = {(u1, item2), (u1, item4), (u1, item3)}   test = {(u1, item2), (u1, item5)}
item2 should be removed from u1's training data.
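To pin down the intended result, here is a minimal sketch on plain Scala collections (no Spark), assuming string IDs and Double ratings. The ratings and the names `TrainTestFilter` and `removeTestPairs` are hypothetical, since the example above only lists (user, item) pairs. A training row is dropped only when the same user has the same item in test.

```scala
object TrainTestFilter {
  // Each row is (user_id, item_id, rating); Double ratings are an assumption
  type Row = (String, String, Double)

  // Drops every training row whose (user_id, item_id) pair also occurs in test
  def removeTestPairs(train: Seq[Row], test: Seq[Row]): Seq[Row] = {
    // Collect the (user, item) keys of the test set; ratings are ignored
    val testKeys: Set[(String, String)] = test.map { case (u, i, _) => (u, i) }.toSet
    train.filterNot { case (u, i, _) => testKeys.contains((u, i)) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical ratings for the u1 example
    val train = Seq(("u1", "item2", 4.0), ("u1", "item4", 3.0), ("u1", "item3", 5.0))
    val test  = Seq(("u1", "item2", 2.0), ("u1", "item5", 1.0))
    println(removeTestPairs(train, test)) // prints List((u1,item4,3.0), (u1,item3,5.0))
  }
}
```

Because the key is the full (user, item) pair, another user who rated item2 keeps that row.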
So I started by grouping the training RDD by (user_id, item_id):
val groupedTrainData = trainData.groupBy(x => (x._1, x._2))
But I don't think this is the right approach.
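One possible alternative to `groupBy`, sketched here under the assumption of Spark 1.3+ (so the pair-RDD implicits apply without an extra import), String IDs and Double ratings: key both RDDs by (user_id, item_id) and use `subtractByKey` from `PairRDDFunctions`, which keeps only the training pairs whose key is absent from test. The object and method names are hypothetical, and the output order after the shuffle is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SubtractTestFromTrain {
  // Drops every training row whose (user_id, item_id) pair also occurs in test
  def removeTestPairs(
      train: RDD[(String, String, Double)],
      test: RDD[(String, String, Double)]): RDD[(String, String, Double)] = {
    // Key training rows by (user_id, item_id), keeping the rating as the value
    val keyedTrain = train.map { case (u, i, r) => ((u, i), r) }
    // Only the test keys matter; Unit is a placeholder value
    val testKeys = test.map { case (u, i, _) => ((u, i), ()) }
    // subtractByKey keeps the pairs whose key does not appear in testKeys
    keyedTrain
      .subtractByKey(testKeys)
      .map { case ((u, i), r) => (u, i, r) }
  }

  def main(args: Array[String]): Unit = {
    // Local mode, for trying the sketch out
    val sc = new SparkContext(new SparkConf().setAppName("filter-train").setMaster("local[*]"))
    try {
      val train = sc.parallelize(Seq(("u1", "item2", 4.0), ("u1", "item4", 3.0), ("u1", "item3", 5.0)))
      val test  = sc.parallelize(Seq(("u1", "item2", 2.0), ("u1", "item5", 1.0)))
      removeTestPairs(train, test).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike `groupBy`, this never gathers all rows for a key into one collection. It only needs a shuffle to compare keys. A `leftOuterJoin` followed by a filter on `None` would give the same result with more code.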