The correct way to iterate a zip file
    final ZipFile file = new ZipFile( FILE_NAME );
    try
    {
        final Enumeration<? extends ZipEntry> entries = file.entries();
        while ( entries.hasMoreElements() )
        {
            final ZipEntry entry = entries.nextElement();
            System.out.println( entry.getName() );
            // use the entry's input stream:
            readInputStream( file.getInputStream( entry ) );
        }
    }
    finally
    {
        file.close();
    }

    // Reads the stream to the end and returns the number of bytes read.
    private static int readInputStream( final InputStream is ) throws IOException
    {
        final byte[] buf = new byte[ 8192 ];
        int read = 0;
        int cntRead;
        while ( ( cntRead = is.read( buf, 0, buf.length ) ) >= 0 )
        {
            read += cntRead;
        }
        return read;
    }
A zip file consists of several records, each of which has a field containing the number of bytes in that record. This makes it cheap to iterate over all zip file entries without actually decompressing the data. java.util.zip.ZipFile accepts a file or file name and uses random access to jump between file positions. java.util.zip.ZipInputStream, on the other hand, works with streams, so it cannot jump freely. It has to read and decompress all the data of a record to reach the end of that record before it can read the header of the next one.
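For contrast, here is a minimal sketch of the same listing done with ZipInputStream. The class and method names (StreamZipLister, listEntries) are made up for illustration; only the java.util.zip and java.nio.file calls are standard. Note that nothing in the loop reads the entry data explicitly, yet the stream still has to consume it to find the next header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of the original answer.
public class StreamZipLister {

    // Collects entry names by walking the archive front to back as a stream.
    static List<String> listEntries(final Path zipPath) throws IOException {
        final List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(zipPath);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() first closes the current entry, which means
            // reading through its remaining data before the next header.
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

Calling StreamZipLister.listEntries(Paths.get("archive.zip")) returns the names in archive order, but the cost grows with the amount of data in the archive, not just with the number of entries.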
What does this mean? If the zip file is already on your file system, use ZipFile to process it, whatever your task. As a bonus, you can access zip records both sequentially and randomly (at a rather small cost in performance). On the other hand, if you are processing a stream, you have to process all the records sequentially using ZipInputStream.
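Random access with ZipFile could look roughly like the sketch below. ZipRandomAccess and countBytes are hypothetical names chosen for this example; ZipFile.getEntry and ZipFile.getInputStream are the standard calls. Only the requested record is read, and the other records are never touched.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of the original answer.
public class ZipRandomAccess {

    // Returns the uncompressed byte count of one named entry,
    // or -1 if the archive has no entry with that name.
    static long countBytes(final Path zipPath, final String entryName) throws IOException {
        try (ZipFile file = new ZipFile(zipPath.toFile())) {
            // Look the entry up by name instead of iterating over all entries.
            final ZipEntry entry = file.getEntry(entryName);
            if (entry == null) {
                return -1;
            }
            final byte[] buf = new byte[8192];
            long total = 0;
            try (InputStream is = file.getInputStream(entry)) {
                int n;
                while ((n = is.read(buf, 0, buf.length)) >= 0) {
                    total += n;
                }
            }
            return total;
        }
    }
}
```

The try-with-resources blocks close both the entry stream and the ZipFile, which replaces the explicit finally block used in the first snippet.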
Here is an example. A zip archive (total file size = 1.6 GB) containing three 0.6 GB records was iterated in 0.05 seconds using a ZipFile and in 18 seconds using a ZipInputStream.