How to deal with large lines and limited memory

I have a file from which I read data. All of the text from this file is stored in a String variable (a very large one). Then, in another part of my application, I want to go through this string and extract information from it step by step (parse the string).

At that point my memory fills up, and an OutOfMemoryError stops me from processing any further. I think it would be better to process the data directly while reading the input stream from the file, but for organizational reasons I would like to pass a String to another part of my application.

What can I do to prevent running out of memory?

+6
java string memory out-of-memory
4 answers

You should read the file through a BufferedReader instead of storing it all in one big String.

If what you want to parse appears line by line, a StringTokenizer applied to each line will work very well; otherwise you will have to devise a way to read the statements you want to parse from the file, and then apply a StringTokenizer to each statement.
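As a minimal sketch of the line-by-line approach (assuming the data is line-oriented and whitespace-delimited; the file name data.txt is just a placeholder):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.StringTokenizer;

    public class LineByLineParser {
        public static void main(String[] args) throws IOException {
            // Read one line at a time instead of building one huge String.
            try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    StringTokenizer tokens = new StringTokenizer(line);
                    while (tokens.hasMoreTokens()) {
                        String token = tokens.nextToken();
                        // process each token here
                        System.out.println(token);
                    }
                }
            }
        }
    }

This way only the current line and its tokens are held in memory at any time.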

+7

If you can relax your requirements slightly, you can implement a java.lang.CharSequence backed by your file.

CharSequence is accepted in many places across the JDK (a String is a CharSequence), so this is a good alternative to a Reader-based implementation.
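A minimal sketch of what such a file-backed CharSequence could look like (assumptions: the file fits in a single memory mapping, i.e. is smaller than 2 GB, and uses a single-byte encoding such as ASCII or Latin-1 so a char index maps directly to a byte offset; the class name FileCharSequence is made up for illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // A CharSequence view over a memory-mapped file, so the text is paged in
    // by the OS on demand instead of being copied into one giant String.
    public class FileCharSequence implements CharSequence {
        private final MappedByteBuffer buffer;
        private final int start;
        private final int end;

        public FileCharSequence(String path) throws IOException {
            try (RandomAccessFile file = new RandomAccessFile(path, "r");
                 FileChannel channel = file.getChannel()) {
                // Assumes channel.size() <= Integer.MAX_VALUE.
                this.buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            }
            this.start = 0;
            this.end = buffer.limit();
        }

        private FileCharSequence(MappedByteBuffer buffer, int start, int end) {
            this.buffer = buffer;
            this.start = start;
            this.end = end;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            // One byte == one char under the single-byte-encoding assumption.
            return (char) (buffer.get(start + index) & 0xFF);
        }

        @Override
        public CharSequence subSequence(int from, int to) {
            return new FileCharSequence(buffer, start + from, start + to);
        }

        @Override
        public String toString() {
            // Only materialize a String for the (small) pieces you actually need.
            StringBuilder sb = new StringBuilder(length());
            for (int i = 0; i < length(); i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

Code that accepts a CharSequence can then work against the file without ever holding all of it in the Java heap.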

+6

Others suggest reading and processing parts of your file at a time. If that is possible, one of those approaches would be better.

However, if that is not possible, and you can load the String into memory initially as you describe, but it is parsing that string later that creates the problems, you can use substrings. In Java (at least in older versions), a substring is a view on top of the original char array and only takes the memory for the object header plus two int fields for the offset and length.

So, when you find the part of the string that you want to keep separately, use something like:

 String piece = largeString.substring(foundStart, foundEnd); 

If you instead do the following, or call code that does this internally, memory usage will increase dramatically:

 new String(largeString.substring(foundStart, foundEnd)); 

Note that you should use String.substring() with caution for exactly this reason. You may take a substring of a very large string and then drop the reference to the original string. The problem is that the substring still references the original large char array, so the GC cannot release it until the substring becomes unreachable as well. In such cases it is useful to use new String(...) to ensure that the unused large array can be collected (this is one of the few cases where you should use new String(...)).

Another technique, if you expect to have many small strings that are likely to share the same value but come from an external source (such as a file), is to call .intern() after creating the new string.
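For example (foundStart and foundEnd stand for offsets produced by your own parsing):

    // Copy just the small piece out of the large backing array, then intern it
    // so that repeated values share a single String instance.
    String piece = new String(largeString.substring(foundStart, foundEnd)).intern();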

Note: this relies on String implementation details that you really should not depend on, but in practice, in large applications, you sometimes have to rely on such knowledge. Keep in mind that other Java versions may behave differently (newer JDKs have in fact changed substring so that it copies its characters rather than sharing the original array).

+4

You should look at the algorithm you use to process this big data. You need to process the data in chunks, or use random access to the file, rather than keeping it all in memory. For example, you can use a StringTokenizer or StreamTokenizer, as @Zombies said. You could also look at how a parser/lexer pair works: when the parser parses an expression, it asks the lexer to read the next token(s); it does not read the entire input stream up front.
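A small sketch of the streaming approach with StreamTokenizer (the file name data.txt is just a placeholder); it pulls one token at a time instead of loading the whole file:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.StreamTokenizer;

    public class TokenizeFile {
        public static void main(String[] args) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
                StreamTokenizer tokenizer = new StreamTokenizer(reader);
                // Consume tokens one by one until end of file.
                while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                    switch (tokenizer.ttype) {
                        case StreamTokenizer.TT_WORD:
                            System.out.println("word:   " + tokenizer.sval);
                            break;
                        case StreamTokenizer.TT_NUMBER:
                            System.out.println("number: " + tokenizer.nval);
                            break;
                        default:
                            System.out.println("char:   " + (char) tokenizer.ttype);
                    }
                }
            }
        }
    }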

+1
