"The job has not started yet" for Spark Job containing repartition ()

After scratching my head for some time over a pyspark job stuck at "No tasks have started yet", I isolated the problem to the following:

Works:

    from pyspark.sql import HiveContext

    ssc = HiveContext(sc)
    sqlRdd = ssc.sql(someSql)
    sqlRdd.collect()

Add a repartition() and it hangs at "No tasks have started yet":

    ssc = HiveContext(sc)
    sqlRdd = ssc.sql(someSql).repartition(2)
    sqlRdd.collect()

This is on the Spark 1.2.0 that ships with CDH 5.
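
For completeness, here is a minimal standalone sketch of the two cases as I would run them outside the pyspark shell; the query string and app name are hypothetical stand-ins (the original someSql is not shown in the question), and it assumes a Spark 1.2.x build with Hive support, launched via spark-submit:

    # Hypothetical standalone repro sketch (not from the original question);
    # assumes pyspark 1.2.x built with Hive support, run via spark-submit.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="repartition-hang-repro")  # hypothetical app name
    ssc = HiveContext(sc)
    someSql = "SELECT * FROM some_table"                  # hypothetical query

    # Case 1: plain collect() -- completes normally.
    print(ssc.sql(someSql).collect())

    # Case 2: same query with repartition(2) -- this is the variant that
    # hangs at "No tasks have started yet".
    print(ssc.sql(someSql).repartition(2).collect())

    sc.stop()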
