Java 7's nio.file package is too slow when creating new files

I am trying to create 300M files from a Java program. I switched from the old File API to the new Java 7 nio package, but the new package is even slower than the old one.

I see somewhat lower CPU utilization than with the old File API, but when I run this simple code I get a file transfer rate of 0.5 MB/s, and the writes from Java read from one disk and write to another (the write is the only process accessing the disk).

Files.write(FileSystems.getDefault().getPath(filePath), fiveToTenKBytes, StandardOpenOption.CREATE); 

Is there any hope of getting reasonable bandwidth here?


Update:

I am unpacking 300 million image files of 5-10 KB each out of larger files. I have 3 disks, 1 local and 2 SAN attached (all have a typical throughput of ~20 MB/s on large files).

I also tried this code, which gets the speed up to just under 2 MB/s (which works out to 9 days to unpack all these files).

    ByteBuffer byteBuffer = ByteBuffer.wrap(imageBinary, 0, ((BytesWritable) value).getLength());
    FileOutputStream fos = new FileOutputStream(imageFile);
    fos.getChannel().write(byteBuffer);
    fos.close();

I read from the local disk and write to the SAN-attached disk. I am reading from a Hadoop SequenceFile; Hadoop can typically read these files at 20 MB/s using essentially the same code.

The only thing that seems out of place, apart from the uber slowness, is that I see roughly 2:1 more read IO than write IO, even though the sequence file is gzipped (the images compress at nearly a 1:1 ratio), so the compressed input should be roughly 1:1 with the output.


2nd UPDATE

Looking at iostat, I see some odd numbers. We are looking at xvdf here: I have one Java process reading from xvdb and writing to xvdf, and there are no other processes active on xvdf.

    iostat -d 30

    Device:   tps       kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
    xvdap1    1.37      5.60       4.13       168      124
    xvdb      14.80     620.00     0.00       18600    0
    xvdap3    0.00      0.00       0.00       0        0
    xvdf      668.50    2638.40    282.27     79152    8468
    xvdg      1052.70   3751.87    2315.47    112556   69464

The reads on xvdf are 10x the writes, which is baffling.

    fstab
    /dev/xvdf  /mnt/ebs1  auto  defaults,noatime,nodiratime  0  0
    /dev/xvdg  /mnt/ebs2  auto  defaults,noatime,nodiratime  0  0
2 answers

I think your slowness comes from creating the new files, not from the actual transfer. I believe that creating a file is a synchronous operation on Linux: the system call will not return until the file has been created and the directory updated. This suggests a couple of things you could try:

  • Use multiple writer threads feeding off one reader thread (see the sketch after this list). The reader thread reads the data from the source file into a byte[] and then creates a Runnable that writes the output file from that array. Use a thread pool with lots of threads, maybe 100 or more, because they will spend most of their time waiting for creat to complete. Size the pool's input queue based on how much memory you have: if your files are 10 KB, a capacity of 1000 seems reasonable (there is no good reason to let the reader get too far ahead of the writers, so you could even go with a capacity of twice the number of threads).
  • Instead of NIO, use plain BufferedInputStream and BufferedOutputStream. Your problem here is system calls, not memory speed (the NIO classes are designed to avoid copying between the Java heap and off-heap memory).
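
Putting those two suggestions together, a minimal sketch might look like the following. ImageRecord and readNextImage() are hypothetical placeholders for however you pull the next (path, bytes) pair out of your SequenceFile, and the pool size and queue capacity are just the ballpark figures from above:

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class ParallelFileWriter {

        // Hypothetical holder for one image pulled from the source; replace with your reader's types.
        static class ImageRecord {
            final String path;
            final byte[] bytes;
            ImageRecord(String path, byte[] bytes) { this.path = path; this.bytes = bytes; }
        }

        // Stub: plug in your SequenceFile reader here; return null when the input is exhausted.
        static ImageRecord readNextImage() {
            return null;
        }

        public static void main(String[] args) throws InterruptedException {
            // Many writer threads (they mostly block in creat/write) and a bounded queue so the
            // single reader cannot run too far ahead; CallerRunsPolicy makes the reader write a
            // file itself when the queue is full, which acts as backpressure.
            ThreadPoolExecutor writers = new ThreadPoolExecutor(
                    100, 100, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(1000),
                    new ThreadPoolExecutor.CallerRunsPolicy());

            ImageRecord rec;
            while ((rec = readNextImage()) != null) {      // single reader thread
                final String path = rec.path;
                final byte[] data = rec.bytes;
                writers.execute(() -> {
                    try (BufferedOutputStream out =
                             new BufferedOutputStream(new FileOutputStream(path))) {
                        out.write(data);                   // each task creates and writes one file
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }

            writers.shutdown();
            writers.awaitTermination(1, TimeUnit.DAYS);
        }
    }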

I assume you already know better than to try to store all of these files in a single directory, or even to keep more than a few hundred files per directory.
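
For what it's worth, one common way to fan the files out is to derive subdirectories from a hash of the name. This is purely illustrative; the two-level hex split is an arbitrary choice, not something your setup requires:

    // Spread files across subdirectories derived from a hash of the name, e.g.
    // /mnt/ebs1/images/3f/a2/<fileName>. Two levels of 256 gives 65,536 leaf
    // directories; add a third level if you want only a few hundred files each.
    static java.nio.file.Path shardedPath(java.nio.file.Path baseDir, String fileName) {
        int h = fileName.hashCode();
        String level1 = String.format("%02x", (h >>> 8) & 0xff);
        String level2 = String.format("%02x", h & 0xff);
        return baseDir.resolve(level1).resolve(level2).resolve(fileName);
    }

The writer would then call Files.createDirectories(target.getParent()) before writing each file.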

And as another alternative, have you considered S3 for storage? I'm guessing that its bucket keys are far more efficient than actual directories, and there is a filesystem that lets you access buckets as if they were files (I haven't tried it myself).


If I understand your code correctly, you are splitting/writing the 300M files in small pieces (fiveToTenKBytes).

Consider using a streaming approach.

If you're writing to disk, consider wrapping an OutputStream with a BufferedOutputStream.

E.g. something like:

    try (BufferedOutputStream bos = new BufferedOutputStream(
            Files.newOutputStream(Paths.get(filePathString), StandardOpenOption.CREATE))) {
        ...
    }
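
For instance, writing one image's bytes through that stream might look like this (filePathString and imageBytes stand in for your own variables):

    Path target = Paths.get(filePathString);
    try (BufferedOutputStream bos = new BufferedOutputStream(
            Files.newOutputStream(target, StandardOpenOption.CREATE))) {
        bos.write(imageBytes);   // 5-10 KB per file in your case
    }

Note that the buffering mainly pays off when the bytes arrive in several small chunks; if you already have the whole byte[] in hand, a single write gains little from it.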
