By referencing the object that holds your broadcast variable inside your map lambda, you make Spark try to serialize the entire object and send it to the workers. Because that object also holds a reference to the SparkContext, which cannot be serialized, you get an error. Instead of this:
pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID]))
Try the following:
bcast = pt.broadcast_products_lookup_map
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))
The latter avoids referencing the object (pt) inside the lambda, so Spark should only need to serialize the broadcast variable.
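To see why the local variable matters without needing a cluster, here is a minimal stdlib-only sketch. PySpark pickles functions with cloudpickle, which serializes the objects a function captures in its closure; plain `pickle` cannot pickle lambdas directly, so this sketch inspects `__closure__` and pickles the captured objects instead. `FakeBroadcast`, `ProductTable`, `captured_objects` and `make_mappers` are hypothetical stand-ins, and a `threading.Lock` plays the role of the unpicklable SparkContext.

```python
import pickle
import threading


class FakeBroadcast:
    """Hypothetical stand-in for a pyspark Broadcast: it just wraps a value."""

    def __init__(self, value):
        self.value = value


class ProductTable:
    """Hypothetical stand-in for the question's `pt` object."""

    def __init__(self, lookup):
        self.sc = threading.Lock()  # unpicklable, standing in for SparkContext
        self.broadcast_products_lookup_map = FakeBroadcast(lookup)


def captured_objects(func):
    """Return the objects held in a function's closure cells."""
    return [cell.cell_contents for cell in (func.__closure__ or ())]


def make_mappers(pt):
    # Original version: the lambda references `pt`, so the whole object is captured.
    bad = lambda x: (x[0], pt.broadcast_products_lookup_map.value[x[1]])

    # Fixed version: copy the broadcast into a local first; only it is captured.
    bcast = pt.broadcast_products_lookup_map
    good = lambda x: (x[0], bcast.value[x[1]])
    return bad, good


if __name__ == "__main__":
    pt = ProductTable({"p1": "Widget"})
    bad, good = make_mappers(pt)
    for name, func in (("bad", bad), ("good", good)):
        try:
            # Pickle what the closure captured, as cloudpickle would.
            pickle.dumps(captured_objects(func))
            print(name, "closure serializes fine")
        except TypeError as exc:
            print(name, "closure fails:", exc)
```

Both lambdas compute the same result; the only difference is what they drag along when shipped to the executors.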