Random Access to a Zip File Without Writing It to Disk

I have a 1-2 GB zip file with 500-1000k records. I need to retrieve entries by name in a fraction of a second, without unpacking the whole archive. If the file is stored on the hard drive, this works fine:

    import java.io.File;
    import java.io.IOException;
    import java.util.Enumeration;
    import java.util.HashMap;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class ZipMapper {
        private HashMap<String, ZipEntry> map;
        private ZipFile zf;

        public ZipMapper(File file) throws IOException {
            map = new HashMap<>();
            zf = new ZipFile(file);
            // Index every entry by name once, so later lookups are O(1).
            Enumeration<? extends ZipEntry> en = zf.entries();
            while (en.hasMoreElements()) {
                ZipEntry ze = en.nextElement();
                map.put(ze.getName(), ze);
            }
        }

        public Node getNode(String key) throws IOException {
            // ZipFile hands out a separate InputStream per entry, i.e. random access.
            return Node.loadFromStream(zf.getInputStream(map.get(key)));
        }
    }

But what should I do if the program has downloaded the zip file from Amazon S3 and only has an InputStream (or a byte array)? Downloading 1 GB takes ~1 second, writing it to the hard drive takes additional time, and managing several temporary files is more awkward, since there is no garbage collector for the hard drive.

ZipInputStream does not allow random access to entries.

It would be nice to create a virtual in-memory file backed by the byte array, but I have not found a way to do that.

+6
5 answers

You can write the download to a temporary file and mark it for deletion when the JVM exits.
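
A minimal sketch of that idea (untested; it assumes the archive arrives as an InputStream, and the class and method names here are made up):

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    import java.util.zip.ZipFile;

    public class TempZip {
        // Copy the downloaded stream to a temp file that the JVM deletes on normal exit,
        // then open it with the random-access ZipFile.
        public static ZipFile toZipFile(InputStream downloaded) throws IOException {
            File tmp = File.createTempFile("download-", ".zip");
            tmp.deleteOnExit(); // no manual cleanup needed
            Files.copy(downloaded, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return new ZipFile(tmp); // random access by entry name
        }
    }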

If you want an in-memory approach: take a look at the NIO.2 file API. Oracle ships a file system provider for zip/jar files, and AFAIK ShrinkWrap provides an in-memory file system. You could try combining the two.
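
Reading a single entry through the JDK's zip/jar file system provider looks roughly like this (the path and entry name are invented; for a fully in-memory variant you would obtain the Path from an in-memory provider such as the one ShrinkWrap offers, instead of the default file system):

    import java.io.InputStream;
    import java.nio.file.FileSystem;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ZipFsRead {
        public static void main(String[] args) throws Exception {
            Path zip = Paths.get("/tmp/archive.zip"); // hypothetical location
            // Mount the archive with the JDK's zip/jar file system provider.
            try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
                Path entry = zipFs.getPath("some/record.bin"); // hypothetical entry name
                try (InputStream in = Files.newInputStream(entry)) {
                    // read just this entry without unpacking the rest
                }
            }
        }
    }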

I wrote some utilities (an open-source library) for copying directories and files to/from a zip file using the NIO.2 file API:

Maven:

    <dependency>
        <groupId>org.softsmithy.lib</groupId>
        <artifactId>softsmithy-lib-core</artifactId>
        <version>0.3</version>
    </dependency>

Tutorial:

http://softsmithy.sourceforge.net/lib/current/docs/tutorial/nio-file/index.html

API: CopyFileVisitor.copy

In particular, PathUtils.resolve helps with resolving paths across file systems.
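
I have not copied the library code here, but with plain NIO.2 the same kind of cross-file-system copy can be sketched as follows (an illustration, not the library's actual implementation; the string round-trip is presumably what a helper like PathUtils.resolve saves you from writing by hand):

    import java.io.IOException;
    import java.nio.file.FileSystem;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.attribute.BasicFileAttributes;

    public class CopyIntoZip {
        // Recursively copies srcDir into the root of an already opened zip file system.
        public static void copyTree(Path srcDir, FileSystem zipFs) throws IOException {
            Files.walkFileTree(srcDir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                    if (!dir.equals(srcDir)) {
                        // Paths from different providers cannot be resolved against each other
                        // directly, so rebuild the relative path as a string.
                        Files.createDirectories(zipFs.getPath(srcDir.relativize(dir).toString()));
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    Path target = zipFs.getPath(srcDir.relativize(file).toString());
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }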

+1

You can use the SecureBlackbox library; it allows you to perform ZIP operations on any seekable stream.

+1

I think you should consider using your OS to create an in-memory file system (that is, a RAM disk).
Also, check out the FileSystem API.
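
For example, on Linux /dev/shm is usually a RAM-backed tmpfs, so you can keep the archive in memory and still hand ZipFile an ordinary path (the mount point and file name below are assumptions about your environment):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.zip.ZipFile;

    public class RamDiskZip {
        // /dev/shm is typically a tmpfs (RAM-backed) mount on Linux; adjust for your OS.
        public static ZipFile openInRam(InputStream downloaded) throws Exception {
            Path ramCopy = Paths.get("/dev/shm/archive.zip"); // hypothetical file name
            Files.copy(downloaded, ramCopy, StandardCopyOption.REPLACE_EXISTING);
            return new ZipFile(ramCopy.toFile()); // random access without touching the HDD
        }
    }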

0

A completely different approach: if the server already has the file on disk (and possibly cached in RAM), let it serve the entries directly. In other words, request the files you need and let the server take care of extracting and delivering them.

0

The SecureBlackbox library has an Extract(String name, String outputPath) method. It seems it can randomly access any file in a seekable zip stream, but it cannot write the result to a byte array or return a stream.

I could not find documentation for ShrinkWrap either, and I could not find a suitable FileSystem / FileSystemProvider implementation.

However, it turned out that the (large) Amazon EC2 instance I am running somehow writes a 1 GB file to disk in about a second. So I just write the file to disk and use ZipFile.

If the HDD were slow, I think a RAM disk would be the easiest solution.

0
