Reading a file into multiple byte arrays

I have an encryption algorithm (AES) that takes a file converted to one huge byte array and encrypts it. Since I am going to process very large files, the JVM may run out of memory. I plan to read each file into several byte arrays, each containing some part of the file, feed them to the algorithm one at a time, and finally combine the results into the encrypted file.

So my question is: is there a way to read a file piece by piece into several byte arrays?

I thought I could read the file into a byte array using:

  IOUtils.toByteArray(InputStream input)

and then split that array into several smaller arrays using:

  Arrays.copyOfRange()

But I'm afraid that the code reading the whole file into a byte array will make the JVM run out of memory.

2 answers

Look at the cipher streams in Java. You can use them to encrypt/decrypt streams on the fly, so you do not need to hold the whole file in memory. All you have to do is copy the regular FileInputStream for your source file into a CipherOutputStream that wraps the FileOutputStream for your encrypted sink file. IOUtils even conveniently provides a copy(InputStream, OutputStream) method to do this copy for you.

For instance:

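    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.security.GeneralSecurityException;
    import java.security.Key;
    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import org.apache.commons.io.IOUtils;

    public static void main(String[] args) throws IOException, GeneralSecurityException {
        encryptFile("exampleInput.txt", "exampleOutput.txt");
    }

    public static void encryptFile(String source, String sink)
            throws IOException, GeneralSecurityException {
        FileInputStream fis = null;
        try {
            fis = new FileInputStream(source);
            CipherOutputStream cos = null;
            try {
                // Everything written to cos is encrypted before it reaches the sink file
                cos = new CipherOutputStream(new FileOutputStream(sink), getEncryptionCipher());
                IOUtils.copy(fis, cos);
            } finally {
                // Closing the stream finishes the encryption and writes the last block
                if (cos != null) cos.close();
            }
        } finally {
            if (fis != null) fis.close();
        }
    }

    private static Cipher getEncryptionCipher() throws GeneralSecurityException {
        // Create AES cipher with whatever padding and other properties you want
        Cipher cipher = ... ;
        // Create AES secret key
        Key key = ... ;
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher;
    }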

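If you need to know the number of bytes copied and the file may exceed Integer.MAX_VALUE bytes (2 GB), use IOUtils.copyLarge instead of IOUtils.copy. copyLarge returns the count as a long, whereas copy returns an int and reports -1 once the count no longer fits.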

To decrypt the file, do the same, but use CipherInputStream instead of CipherOutputStream and initialize your Cipher with Cipher.DECRYPT_MODE.
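For illustration, the decryption direction might look roughly like the sketch below. It uses only the JDK, so the copy is a plain buffered loop rather than IOUtils.copy. The class name DecryptFileSketch, the key and iv parameters, and the AES/CBC/PKCS5Padding transformation are assumptions made for this example; use whatever matches how the file was encrypted.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class DecryptFileSketch {
    // Assumption for this sketch: the file was encrypted with AES in CBC mode and PKCS5 padding.
    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    static void decryptFile(String source, String sink, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        // Bytes read from the CipherInputStream come back already decrypted.
        try (InputStream in = new CipherInputStream(new FileInputStream(source), cipher);
             OutputStream out = new FileOutputStream(sink)) {
            byte[] buffer = new byte[8192]; // only this small buffer is held in memory
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```

A call such as DecryptFileSketch.decryptFile("exampleOutput.txt", "decrypted.txt", key, iv) would then restore the original file, provided key and iv are the ones used for encryption.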

Look here for more information on Java encryption streams.

This will save you space because you no longer need to store your own byte arrays. The only byte[] held in this setup is the internal byte[] of the Cipher, which is emptied each time enough input has accumulated and an encrypted block is returned by Cipher.update, or by Cipher.doFinal when the CipherOutputStream is closed. However, you do not need to worry about any of this, since it is all internal and managed for you.

Edit: Please note that this can lead to some encryption exceptions being ignored, especially BadPaddingException and IllegalBlockSizeException. This can be seen in the CipherOutputStream source code. (Granted, this source is from OpenJDK, but the Sun JDK probably does the same.) Also, from the CipherOutputStream javadocs:

This class adheres strictly to the semantics, especially the failure semantics, of its ancestor classes java.io.OutputStream and java.io.FilterOutputStream. This class has exactly those methods specified in its ancestor classes, and overrides them all. Moreover, this class catches all exceptions that are not thrown by its ancestor classes.

The last sentence of that quote implies that cryptographic exceptions are ignored, which they are. This can lead to unexpected behavior when trying to read an encrypted file, especially with block and/or padded encryption algorithms such as AES. Keep in mind that you may get zero or partial output for the encrypted file (or for the decrypted file, with CipherInputStream).
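If those exceptions matter to you, one option is to drive the Cipher yourself instead of wrapping it in a stream. The sketch below is only an illustration of that idea, not part of the approach above: the class name ManualCipherCopy and the 8 KB buffer size are arbitrary choices, and the Cipher passed in is assumed to be initialized already. Memory use stays bounded by the buffer, and anything thrown by Cipher.doFinal reaches the caller.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;

class ManualCipherCopy {
    /** Pipes the source file through an already initialized cipher, one small chunk at a time. */
    static void process(String source, String sink, Cipher cipher)
            throws IOException, GeneralSecurityException {
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(sink)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // update may return null when the input did not complete a new block
                byte[] chunk = cipher.update(buffer, 0, n);
                if (chunk != null) {
                    out.write(chunk);
                }
            }
            // doFinal handles the last block and padding; BadPaddingException and
            // IllegalBlockSizeException propagate from here as GeneralSecurityException subclasses
            out.write(cipher.doFinal());
        }
    }
}
```

The same method works for both directions; only the mode the Cipher was initialized with differs.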


If you are using IOUtils, perhaps you should consider IOUtils.copyLarge():

 public static long copyLarge(InputStream input, OutputStream output, long inputOffset, long length) 

and specify a ByteArrayOutputStream as the output. You can then iterate and load sections of your file using offset/length.

From the doc:

Copies some or all bytes from a large (over 2GB) InputStream to an OutputStream, optionally skipping input bytes.
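If you would rather not depend on Commons IO for this, the same section-by-section idea can be sketched with the JDK alone. Note that this swaps IOUtils.copyLarge for InputStream.readNBytes (available since Java 9); the class name ChunkedReader and the handler callback are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.function.Consumer;

class ChunkedReader {
    /**
     * Reads the file in pieces of at most chunkSize bytes and hands each piece
     * to the handler. Returns the total number of bytes read.
     */
    static long forEachChunk(String path, int chunkSize, Consumer<byte[]> handler)
            throws IOException {
        long total = 0;
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            // readNBytes keeps reading until the buffer is full or the stream ends;
            // a return value of 0 means there is nothing left
            while ((n = in.readNBytes(buffer, 0, buffer.length)) > 0) {
                handler.accept(Arrays.copyOf(buffer, n)); // each chunk gets its own array
                total += n;
            }
        }
        return total;
    }
}
```

Only one chunk needs to be alive at a time, as long as the handler processes it (for example by passing it to Cipher.update) and does not keep a reference to it.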
