Memory issues with String.split()

My program currently has memory problems, and after examining the application we found that the String.split() method uses a lot of memory. I tried using StreamTokenizer, but it seems to make things even more complicated.

Is there a better way to split a long String into smaller Strings that uses less memory than the String.split() method?

4 answers

It is very unlikely that any realistic use of split would "consume a lot of memory". Your input would have to be huge (many, many megabytes) and be split into many millions of parts for the effect to even be noticeable.

Here is some code that creates a random string of approximately 1.8 million characters, splits it into more than 1 million strings, and displays the memory used and the time taken.

As you can see, it is not that much: about 68 MB (71,740,240 bytes) is consumed in just 350 ms.

    public static void main(String[] args) throws Exception {
        // Build a random input string of roughly 1.8 million characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        long begin = System.currentTimeMillis();
        String string = sb.toString();
        sb = null;
        System.gc();
        // Compare free memory before and after the split.
        long startFreeMem = Runtime.getRuntime().freeMemory();
        String[] strings = string.split("(?=[0-5])");
        long endFreeMem = Runtime.getRuntime().freeMemory();
        long execution = System.currentTimeMillis() - begin;
        System.out.println("input length = " + string.length()
                + "\nnumber of strings after split = " + strings.length
                + "\nmemory consumed due to split = " + (startFreeMem - endFreeMem)
                + "\nexecution time = " + execution + "ms");
    }

Output (run on a fairly typical Windows machine):

    input length = 1827035
    number of strings after split = 1072788
    memory consumed due to split = 71740240
    execution time = 351ms

Interestingly, without System.gc(), the memory used was less than half of that:

 memory consumed due to split = 29582328 
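
Part of that variation may come from the measurement itself: Runtime.freeMemory() reports free space within the heap the JVM has currently reserved, and that heap can grow, so the difference between two freeMemory() readings is only a rough indicator. A minimal sketch of a slightly more robust reading based on totalMemory() - freeMemory() follows; the names SplitMemoryProbe, usedHeap and measureSplit are hypothetical, and the figures will still vary between JVMs and runs.

```java
class SplitMemoryProbe {

    /** Approximate number of heap bytes currently in use. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Splits the input, prints a rough heap delta, and returns the pieces. */
    static String[] measureSplit(String input, String regex) {
        System.gc(); // only a hint; the JVM is free to ignore it
        long before = usedHeap();
        String[] parts = input.split(regex);
        long after = usedHeap();
        System.out.println("pieces = " + parts.length
                + ", approx. heap delta = " + (after - before) + " bytes");
        return parts;
    }

    public static void main(String[] args) {
        // Same kind of input as in the answer above.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        measureSplit(sb.toString(), "(?=[0-5])");
    }
}
```

For trustworthy numbers a profiler or heap dump is a better tool; this only narrows one source of noise.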

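split does not copy the character data. It uses substring internally, which creates a new String object that points to the relevant part of the original string without copying the underlying char[]. (This is how substring behaves on JVMs up to Java 7 update 5; from Java 7 update 6 onward, substring copies the characters.)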

Thus, apart from the (insignificant) overhead of creating the objects, it should not have much impact on memory.

P.S. StringTokenizer also relies on substring, so it will probably give the same results as split.

EDIT

To verify this, you can use the sample code below. It splits abc,def into abc and def, then prints the backing char[] of the source string and of the split strings. The output shows that they all share the same array.

Output:

    Reference: [C@3590ed52	Content: [a, b, c, ,, d, e, f]
    Reference: [C@3590ed52	Content: [a, b, c, ,, d, e, f]
    Reference: [C@3590ed52	Content: [a, b, c, ,, d, e, f]

Code:

    import java.lang.reflect.Field;
    import java.util.Arrays;

    public static void main(String[] args) throws InterruptedException, NoSuchFieldException,
            IllegalArgumentException, IllegalAccessException {
        String s = "abc,def";
        String[] ss = s.split(",");
        // Reflectively read String's private "value" field.
        // This assumes the field is a char[], as on the JVM used for the output above.
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        System.out.println("Reference: " + f.get(s) + "\tContent: " + Arrays.toString((char[]) f.get(s)));
        System.out.println("Reference: " + f.get(ss[0]) + "\tContent: " + Arrays.toString((char[]) f.get(ss[0])));
        System.out.println("Reference: " + f.get(ss[1]) + "\tContent: " + Arrays.toString((char[]) f.get(ss[1])));
    }

split can cause memory problems if you keep only one or a few pieces of a long string: the whole long string then stays in memory. For example, with

    private static List<String> headlist = new ArrayList<String>();

    String longstring = ".....";
    headlist.add(longstring.split(" ")[0]);

longstring will always stay in memory, because the JVM cannot reclaim it.

In this situation, you could try copying the piece into a new String:

    private static List<String> headlist = new ArrayList<String>();

    String longstring = ".....";
    headlist.add(new String(longstring.split(" ")[0]));

The following code demonstrates the difference:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class SplitTest {
        static Random rand = new Random();
        static List<String> head = new ArrayList<String>();

        public static void main(String[] args) {
            int i = 0; // iteration counter (missing in the original listing)
            while (true) {
                String a = constructLongString();
                head.add(a.split(" ")[0]); //1
                //head.add(new String(a.split(" ")[0])); //2
                i++;
                if (i % 1000 == 0)
                    System.out.println("" + i);
                System.gc();
            }
        }

        // Builds a 10-digit head, a space, and a 4096-digit tail.
        private static String constructLongString() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10; i++) {
                sb.append(rand.nextInt(10));
            }
            sb.append(" ");
            for (int i = 0; i < 4096; i++) {
                sb.append(rand.nextInt(10));
            }
            return sb.toString();
        }
    }

If you run it with -Xmx60M, it runs out of memory after about 6000 iterations. If you comment out line 1 and use line 2 instead, it keeps running far beyond 6000 iterations.
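
If only the first field is needed, the split can be skipped altogether. Below is a minimal sketch, assuming a single-character delimiter as in the code above; HeadExtractor and firstToken are hypothetical names. The new String(...) copy serves the same purpose as in variant 2: on JVMs where substring shares the parent's char[], it detaches the small piece from the long string.

```java
class HeadExtractor {

    /**
     * Returns a detached copy of the text before the first delimiter,
     * or of the whole string if the delimiter does not occur.
     */
    static String firstToken(String line, char delimiter) {
        int end = line.indexOf(delimiter);
        String head = (end < 0) ? line : line.substring(0, end);
        // Copy so the result does not keep the long string's data alive.
        return new String(head);
    }

    public static void main(String[] args) {
        String longString = "0123456789 the rest of a very long line";
        System.out.println(firstToken(longString, ' ')); // prints 0123456789
    }
}
```

This avoids both the array of pieces and the regex machinery, at the cost of handling only a fixed delimiter.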


You need to use some kind of stream reader rather than holding one large block of data in memory. Here is an example:

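    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;

    public static void readString(String str) throws IOException {
        InputStream is = new ByteArrayInputStream(str.getBytes("UTF-8"));
        char[] buf = new char[2048];
        Reader r = new InputStreamReader(is, "UTF-8");
        // Read the data in chunks of up to 2048 characters.
        while (true) {
            int n = r.read(buf);
            if (n < 0)
                break;
            /*
            StringBuilder s = new StringBuilder();
            s.append(buf, 0, n);
            ... now you can parse the StringBuilder ...
            */
        }
    }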
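
When the data is already in a String, a related option is to walk it with indexOf and handle one piece at a time, so that the full String[] is never built. This is only a sketch, assuming a single-character delimiter rather than a regex; LazySplit and forEachToken are hypothetical names. Unlike split, it also reports trailing empty pieces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LazySplit {

    /** Passes each piece between delimiters to the handler, one at a time. */
    static void forEachToken(String input, char delimiter, Consumer<String> handler) {
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            handler.accept(input.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last delimiter (possibly empty).
        handler.accept(input.substring(start));
    }

    public static void main(String[] args) {
        // Collecting into a list here only to show the pieces; real code
        // would process each token and let it go.
        List<String> seen = new ArrayList<>();
        forEachToken("abc,def,,ghi", ',', seen::add);
        System.out.println(seen); // prints [abc, def, , ghi]
    }
}
```

Only one piece is live at a time unless the handler keeps a reference to it.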