Filter one RDD by the keys of another

I have two RDDs:

**rdd1**

    id1 val1
    id2 val2

**rdd2**

    id1 v1
    id2 v2
    id1 v3
    id8 v7
    id1 v4
    id3 v5
    id6 v6

I want to filter rdd2 so that it contains only the entries whose keys appear in rdd1. The output would be:

**output**

    id1 v1
    id2 v2
    id1 v3
    id1 v4

This has been asked on Stack Overflow before, but for smaller datasets, where the suggested approach was to transfer one RDD in full and then use a filter. That does not work in my case: the size of rdd1 is over 500 million and rdd2 is over 10 billion.

Please help.

1 answer

Use join:

    val res: RDD[(Long, V)] = rdd1.join(rdd2)
      .map { case (k, (_, v2)) => (k, v2) }
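As a rough illustration, the join above can be tried on the sample data from the question. This is only a sketch under a few assumptions that are not in the original post: the keys and values are strings, Spark runs in local mode, and the names `FilterByKeys` and `filterByKeys` are made up for the example. It also reduces rdd1 to its distinct keys before joining, because a plain join would emit an rdd2 row once per matching rdd1 row if rdd1 ever held a repeated key.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, named for this example only.
object FilterByKeys {

  // Keep only the pairs of rdd2 whose key also occurs in rdd1.
  def filterByKeys(rdd1: RDD[(String, String)],
                   rdd2: RDD[(String, String)]): RDD[(String, String)] = {
    // Drop rdd1's values and deduplicate its keys, so the join
    // cannot multiply rdd2 rows when a key repeats in rdd1.
    val keys: RDD[(String, Unit)] = rdd1.keys.distinct().map(k => (k, ()))

    // Inner join keeps only keys present on both sides;
    // the placeholder Unit from the left side is discarded.
    keys.join(rdd2).map { case (k, (_, v2)) => (k, v2) }
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("filter-by-keys").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val rdd1 = sc.parallelize(Seq("id1" -> "val1", "id2" -> "val2"))
    val rdd2 = sc.parallelize(Seq(
      "id1" -> "v1", "id2" -> "v2", "id1" -> "v3", "id8" -> "v7",
      "id1" -> "v4", "id3" -> "v5", "id6" -> "v6"))

    // Join output order is not deterministic, so sort before printing.
    filterByKeys(rdd1, rdd2).collect().sorted.foreach(println)

    sc.stop()
  }
}
```

On the sample data the result holds the four pairs (id1,v1), (id1,v3), (id1,v4) and (id2,v2). The join shuffles both RDDs by key rather than collecting either one to the driver, which is why it is suggested here for inputs of this size.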