Java: saving a large map in resources

I need to use a large file of String-to-String mappings, and since I want to ship it inside the JAR, I decided to include a serialized, gzipped version in the application's resource folder. This is how I serialized it:

```java
ObjectOutputStream out = new ObjectOutputStream(
        new BufferedOutputStream(
                new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();
```

I used a HashMap<String, String>; the resulting file is 60 MB, and the map contains about 4 million entries.

When I need the map, I deserialize it using:

```java
final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(
        new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();
```

This takes about 10 to 15 seconds. Is there a better way to store such a large map in a JAR? I ask because I also use the Stanford CoreNLP library, which itself uses large model files but seems to handle them better. I tried to find the code that reads those model files, but gave up.

3 answers

Your problem is that you zipped the data. Save it as plain text.

The biggest performance hit is most likely the GZIP stream. JARs are already compressed, so zipping the file inside one gains you little or no space.
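The point that JARs are already compressed is easy to check: a JAR is a ZIP file, and each entry records whether it is stored DEFLATE-compressed or uncompressed. Below is a minimal sketch using `java.util.zip.ZipFile`; the class name `JarEntryInspector` and the entry name `map.txt` are hypothetical. For a plain-text resource you would typically see `DEFLATED` with a compressed size well below the original size; an already-gzipped file usually shrinks very little when deflated again.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarEntryInspector {

    /**
     * Returns true if the named entry in the JAR/ZIP file is stored
     * DEFLATE-compressed, false if it is stored uncompressed (STORED).
     * Throws IOException if the entry does not exist.
     */
    public static boolean isDeflated(Path jar, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry " + entryName + " in " + jar);
            }
            return entry.getMethod() == ZipEntry.DEFLATED;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java JarEntryInspector app.jar map.txt
        Path jar = Paths.get(args[0]);
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            ZipEntry entry = zip.getEntry(args[1]);
            if (entry == null) {
                System.err.println("No entry " + args[1] + " in " + jar);
                return;
            }
            // Sizes come from the ZIP central directory
            System.out.printf("%s: method=%s, size=%d, compressed=%d%n",
                    entry.getName(),
                    entry.getMethod() == ZipEntry.DEFLATED ? "DEFLATED" : "STORED",
                    entry.getSize(), entry.getCompressedSize());
        }
    }
}
```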

Basically:

  • Save the file in text format
  • Use Files.lines(Paths.get("myfilename.txt")) to stream its lines
  • Process each line with as little code as possible

Something like this if the data is in the form key=value (for example, a properties file):

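```java
Map<String, String> map = new HashMap<>();
try (Stream<String> lines = Files.lines(Paths.get("myfilename.txt"))) {
    lines.map(s -> s.split("=", 2))     // limit 2: values may contain '='
         .filter(a -> a.length == 2)    // skip lines without '='
         .forEach(a -> map.put(a[0], a[1]));
}
```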

Disclaimer: this code may not compile or work, because I typed it on my phone (but there is a reasonable chance that it will).
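Note that `Paths.get(...)` refers to the file system, so it will not find a file packed inside the JAR. A minimal sketch of the same idea reading from the classpath instead, assuming the resource is a UTF-8 `key=value` text file at the JAR root (the names `TextMapLoader` and `/map.txt` are hypothetical). It only splits at the first `=` and does not implement the escaping rules of real `.properties` files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TextMapLoader {

    /** Loads "key=value" lines from a classpath resource, e.g. "/map.txt" inside the JAR. */
    public static Map<String, String> loadResource(String resource, int expectedSize) throws IOException {
        InputStream in = TextMapLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resource);
        }
        return load(in, expectedSize);
    }

    /** Reads "key=value" lines until end of stream, then closes the stream. */
    public static Map<String, String> load(InputStream in, int expectedSize) throws IOException {
        // Presize so the table does not rehash while filling: HashMap resizes
        // once its size exceeds capacity * 0.75 (the default load factor).
        Map<String, String> map = new HashMap<>((int) (expectedSize / 0.75f) + 1);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1 << 16)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // skip blank or malformed lines
                }
                // Split at the first '=' only, so values may contain '='
                map.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return map;
    }
}
```

For the map in the question this would be called as `TextMapLoader.loadResource("/map.txt", 4_000_000)`. Whether it actually beats the 10 to 15 seconds measured for the serialized map depends on the data, so it is worth timing both.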

---

What you can do is apply a technique from Scott Oaks's book Java Performance: The Definitive Guide, which stores the compressed content of an object in a byte array. For this we need a wrapper class, which I call MapHolder here:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MapHolder implements Serializable {
    // This will contain the zipped content of my map
    private byte[] content;

    // My actual map, declared transient because I don't want to serialize
    // its content directly, only its zipped content
    private transient Map<String, String> map;

    public MapHolder(Map<String, String> map) {
        this.map = map;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(baos);
             ObjectOutputStream oos = new ObjectOutputStream(
                     new BufferedOutputStream(zip))) {
            oos.writeObject(map);
        }
        this.content = baos.toByteArray();
        out.defaultWriteObject();
        // Clear the temporary field content
        this.content = null;
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(content);
             GZIPInputStream zip = new GZIPInputStream(bais);
             ObjectInputStream ois = new ObjectInputStream(
                     new BufferedInputStream(zip))) {
            this.map = (Map<String, String>) ois.readObject();
            // Clear the temporary field content
            this.content = null;
        }
    }

    public Map<String, String> getMap() {
        return this.map;
    }
}
```

Then your code will be simple:

```java
final ByteArrayInputStream in = new ByteArrayInputStream(
        Files.readAllBytes(Paths.get("/tmp/map.ser")));
final ObjectInputStream ois = new ObjectInputStream(in);
MapHolder holder = (MapHolder) ois.readObject();
map = holder.getMap();
ois.close();
```

As you may have noticed, you no longer wrap the input in a GZIPInputStream yourself: the map's content is zipped internally, and then the MapHolder instance is serialized.

---

You can consider one of the many fast serialization libraries:
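Independently of any particular library, a hand-rolled binary format is another option to benchmark: write the entry count, then each key and value with `DataOutputStream.writeUTF`, and read them back with `DataInputStream.readUTF`. This is a minimal sketch under stated assumptions (the class name `BinaryMapFormat` is hypothetical; keys and values must be non-null, and each string must encode to at most 65535 bytes of modified UTF-8, otherwise `writeUTF` throws). It skips the generic object machinery of `ObjectOutputStream`, which may or may not make a measurable difference for your data.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

public class BinaryMapFormat {

    /** Writes the entry count, then each key and value as modified-UTF-8 strings. */
    public static void write(Map<String, String> map, OutputStream os) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(os, 1 << 16));
        out.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.flush(); // flush buffered bytes; closing os is left to the caller
    }

    /** Reads a map written by write(); the caller closes the stream. */
    public static Map<String, String> read(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(is, 1 << 16));
        int size = in.readInt();
        // Presize to avoid rehashing (default load factor 0.75)
        Map<String, String> map = new HashMap<>((int) (size / 0.75f) + 1);
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            map.put(key, in.readUTF());
        }
        return map;
    }
}
```

A file written this way can be placed in the resource folder and read with `BinaryMapFormat.read(SomeClass.class.getResourceAsStream("/map.bin"))`, where `/map.bin` is again a hypothetical name.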
