Copy binary data from a URL to a file in Java without an intermediate copy

I am updating some old code to fetch binary data from a URL instead of from a database (the data is being moved out of the database and will be accessible over HTTP instead). The old database API provided the data as an array of raw bytes, and the code wrote that array to a file using a BufferedOutputStream.

I am not very familiar with Java, but a bit of searching led me to this code:

    URL u = new URL("my-url-string");
    URLConnection uc = u.openConnection();
    uc.connect();
    InputStream in = uc.getInputStream();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    final int BUF_SIZE = 1 << 8;
    byte[] buffer = new byte[BUF_SIZE];
    int bytesRead = -1;
    while ((bytesRead = in.read(buffer)) > -1) {
        out.write(buffer, 0, bytesRead);
    }
    in.close();
    fileBytes = out.toByteArray();

This works most of the time, but I have a problem when the copied data is large: I get an OutOfMemoryError for data items that worked fine with the old code.

I assume this is because this version of the code holds several copies of the data in memory at the same time, while the original code did not.

Is there an easy way to capture binary data from a URL and store it in a file without incurring the cost of several copies in memory?

+4
4 answers

Instead of writing the data to a byte array and then dumping it to a file, you can write it directly to the file by replacing:

 ByteArrayOutputStream out = new ByteArrayOutputStream(); 

WITH

 FileOutputStream out = new FileOutputStream("filename"); 

If you do this, there is no need to call out.toByteArray() at the end. Just make sure you close the FileOutputStream object when you're done, for example:

 out.close(); 

See the FileOutputStream documentation for more details.
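
For reference, here is a minimal sketch of the complete download loop with that change applied (the target path "output.bin" is made up for the example; try-with-resources needs Java 7 or later):

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    public class UrlToFile {
        public static void main(String[] args) throws Exception {
            URL u = new URL("my-url-string");
            URLConnection uc = u.openConnection();
            uc.connect();

            final int BUF_SIZE = 1 << 13; // 8 KB chunk; only this much is ever held in memory
            byte[] buffer = new byte[BUF_SIZE];

            // try-with-resources closes both streams even if an exception is thrown
            try (InputStream in = uc.getInputStream();
                 FileOutputStream out = new FileOutputStream("output.bin")) {
                int bytesRead;
                while ((bytesRead = in.read(buffer)) > -1) {
                    out.write(buffer, 0, bytesRead); // each chunk goes straight to disk
                }
            }
        }
    }

With this approach the memory footprint is bounded by the buffer size, no matter how large the download is.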

+12

I don't know what you mean by "big" data, but try the JVM parameter

java -Xmx256m ...

which sets the maximum heap size to 256 MB (or whatever value you like).

+1

If you need the length of the content, and your web server is reasonably standards-compliant, it should provide a "Content-Length" header.

URLConnection#getContentLength() should give you that information up front so that you can create your file. (Bear in mind that if your HTTP server is misconfigured or under the control of a malicious party, this header may not match the number of bytes actually received. In that case, why not write to a temporary file first and copy it to its final location afterwards, as sketched below?)
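
As an illustration, a sketch of that temporary-file approach (the class and method names are made up; getContentLengthLong() needs Java 7 or later):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URLConnection;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    class TempFileDownload {
        // Download to a temp file first; only move it into place once the whole
        // body has arrived, regardless of what Content-Length claimed.
        static void downloadViaTempFile(URLConnection uc, Path target) throws IOException {
            long expected = uc.getContentLengthLong(); // -1 if the header is missing
            Path tmp = Files.createTempFile("download", ".part");
            try (InputStream in = uc.getInputStream()) {
                long actual = Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                if (expected >= 0 && actual != expected) {
                    throw new IOException("Expected " + expected + " bytes, got " + actual);
                }
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp); // no-op if the move already succeeded
            }
        }
    }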

On top of that: ByteArrayOutputStream is a terrible memory allocator. It doubles the size of its buffer every time it grows, so if you read a file of 32 megabytes plus one byte, you end up with a 64 megabyte buffer. It might be better to implement your own, smarter byte-array stream, like this one:

http://source.pentaho.org/pentaho-reporting/engines/classic/trunk/core/source/org/pentaho/reporting/engine/classic/core/util/MemoryByteArrayOutputStream.java
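
If you do need the whole payload in memory, one mitigation (a sketch that reuses the uc connection from the question and assumes the server reports a usable length) is to pre-size the buffer so the doubling never kicks in:

    int len = uc.getContentLength(); // -1 if the header is missing
    // Pre-sizing avoids the grow-by-doubling behaviour described above;
    // fall back to a modest default when the length is not reported.
    ByteArrayOutputStream out = new ByteArrayOutputStream(len > 0 ? len : 8192);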

+1

Subclassing ByteArrayOutputStream gives you access to the internal buffer and the number of bytes in it, as shown in the sketch below.

But of course, if all you want to do is store the data in a file, you're better off using FileOutputStream.
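
A minimal sketch of such a subclass (the class and method names are made up; buf and count are the protected fields inherited from ByteArrayOutputStream):

    import java.io.ByteArrayOutputStream;

    // Exposes the internal buffer without the defensive copy made by toByteArray().
    class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
        byte[] rawBuffer() {
            return buf;   // the backing array; only the first length() bytes are valid
        }
        int length() {
            return count; // number of valid bytes, same value as size()
        }
    }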

0
