Spark saveAsTextFile() fails with "Mkdirs failed to create" partway down the output path

I am currently running a Java Spark application on Tomcat and get the following exception:

Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603031703_0001_m_000000_5 

on the line

text.saveAsTextFile("/opt/folder/tmp/file.json") //where text is a JavaRDD<String>

The problem is that /opt/folder/tmp/ already exists, and everything up to /opt/folder/tmp/file.json/_temporary/0/ is created successfully; then it runs into what looks like a permission problem on the rest of the path, _temporary/attempt_201603031703_0001_m_000000_5. But I already gave the tomcat user ownership of the tmp/ directory (chown -R tomcat:tomcat tmp/ and chmod -R 755 tmp/). Does anyone know what could be going on?

Thanks!

Edit for @javadba:

 [root@ip tmp]# ls -lrta
 total 12
 drwxr-xr-x 4 tomcat tomcat 4096 Mar  3 16:44 ..
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 file.json
 drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 .
 [root@ip tmp]# cd file.json/
 [root@ip file.json]# ls -lrta
 total 12
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 _temporary
 drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 ..
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .
 [root@ip file.json]# cd _temporary/
 [root@ip _temporary]# ls -lrta
 total 12
 drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 0
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .
 [root@ip _temporary]# cd 0/
 [root@ip 0]# ls -lrta
 total 8
 drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
 drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 .

Exception in catalina.out

 Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603072001_0001_m_000000_5
     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
     at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
     at org.apache.spark.scheduler.Task.run(Task.scala:89)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     ... 1 more
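For context on this stack trace: ChecksumFileSystem.create throws exactly this IOException when java.io.File#mkdirs() returns false, and mkdirs() reports every kind of failure (no permission, a path component being a regular file, etc.) only as a bare false, with no reason attached. A hypothetical minimal reproduction of that silent failure, unrelated to the actual cluster:

```java
import java.io.File;
import java.io.IOException;

public class MkdirsDemo {
    public static void main(String[] args) throws IOException {
        // Create a regular file, then try to create a directory *under* it.
        File blocker = File.createTempFile("blocker", ".tmp");
        blocker.deleteOnExit();

        File child = new File(blocker, "_temporary/0");
        // mkdirs() does not throw; it just returns false, without saying why.
        boolean created = child.mkdirs();
        System.out.println("mkdirs() returned: " + created); // prints: false

        // Hadoop-style wrapping: turning the bare boolean into a message,
        // which is roughly how the error in the question is produced.
        if (!created) {
            System.out.println("Mkdirs failed to create " + child);
        }
    }
}
```

So the message alone does not distinguish a permission denial from any other local-filesystem problem.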
+7
java tomcat apache-spark spark-dataframe
4 answers

saveAsTextFile is actually performed by the Spark executors. Depending on your Spark setup, the executors may run as a different user than your Spark driver. My guess is that the driver prepares the output directory just fine, but then the executors, running as another user, are not allowed to write into that directory.

Switching to 777 will not help, because permissions are not inherited by child directories, so newly created subdirectories will end up as 755 anyway.
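That non-inheritance is easy to demonstrate from Java itself (a standalone sketch, not code from the original app; the child's exact mode depends on the process umask, commonly yielding 755):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class PermInheritanceDemo {
    public static void main(String[] args) throws IOException {
        // Open the parent directory up to 777, like "chmod 777 tmp/".
        Path parent = Files.createTempDirectory("permdemo");
        Files.setPosixFilePermissions(parent, PosixFilePermissions.fromString("rwxrwxrwx"));

        // A child created afterwards does NOT inherit the parent's 777;
        // its mode is derived from the creating process's umask instead.
        Path child = Files.createDirectories(parent.resolve("_temporary/0"));
        Set<PosixFilePermission> childPerms = Files.getPosixFilePermissions(child);

        System.out.println("parent: "
                + PosixFilePermissions.toString(Files.getPosixFilePermissions(parent)));
        System.out.println("child:  " + PosixFilePermissions.toString(childPerms));
    }
}
```

On a typical umask of 022 the child prints rwxr-xr-x (755) even though the parent is rwxrwxrwx.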

Try launching the Spark application as the same user that runs your Spark executors.
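One way to check for this driver-vs-executor user mismatch is to log the OS user on both sides (a hypothetical diagnostic, not from the original post; the sc variable and the parallelize probe in the comment assume a live JavaSparkContext):

```java
public class WhoRunsMe {
    public static void main(String[] args) {
        // The OS user of the current JVM; on the driver this is whoever
        // launched the Tomcat process hosting the Spark application.
        String driverUser = System.getProperty("user.name");
        System.out.println("driver runs as: " + driverUser);

        // Inside the Spark app, the same probe could be run on the executors
        // (sketch only; requires a JavaSparkContext named sc):
        //
        //   List<String> executorUsers = sc.parallelize(Arrays.asList(1, 2, 3))
        //       .map(i -> System.getProperty("user.name"))
        //       .distinct()
        //       .collect();
        //   System.out.println("executors run as: " + executorUsers);
        //
        // If executorUsers differs from driverUser, the executors need write
        // access to /opt/folder/tmp, not just the tomcat user.
    }
}
```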

+8

I suggest temporarily switching to 777. See whether it works then. There have been bugs/issues with permissions on the local file system. If it still does not work, let us know whether anything changed or you got the exact same result.

+2

Could it be selinux/apparmor playing tricks on you? Check with ls -Z and the system logs.

+1

I had the same problem: no HDFS in my setup, and Spark running in standalone mode. I could not save Spark DataFrames to an NFS share using Spark's native methods. The process runs as a local user, and I am trying to write to that user's home folder. Even after creating a subfolder with 777 permissions, I could not write into it.

The workaround is to convert the DataFrame with toPandas() and then call to_csv(). It works like magic.

0
