Slow string concatenation at large input

I wrote an n-ary ADT tree that works fine. However, I need to save the serialization in a variable of the calling class. eg.

DomTree<String> a = Data.createTreeInstance("very_large_file.xml"); String x = a.toString(); 

I wrote a method that serves the purpose of exactly how I need it, but on the very large inputs that it takes forever (20 minutes per 100 MB xml file). I timed the methods and built the tree from the XML file quickly, but the call toString (), as shown above, is very slow.

 @Override public String toString(){ return printTree(this); } public String printTree(AbstractTree<E> tree){ if (tree.isLeaf()){ return tree.getNodeName(); }else{ String tStr = tree.getNodeName() + "("; int i = 0; Iterator<AbstractTree<E>> child = tree.getChildren().iterator(); while (i < tree.getChildren().size() - 1){ tStr += printTree(child.next()) + ", "; i++; } tStr += printTree(child.next()) + ")"; return tStr; } } 

I assume this is due to the way the lines are built, and not how the tree goes? Is there a better way to do this?

UPDATE. Following the Skaffman example, the following code gives outOfMemoryError for very large input.

 @Override public String toString(){ StringBuilder buffer = new StringBuilder(); printTree(this, buffer); return buffer.toString(); 

}

 public String printTree(AbstractTree<E> tree, StringBuilder buffer){ if (tree.isLeaf()){ return tree.getNodeName(); }else{ buffer.append(tree.getNodeName()); buffer.append("("); int i = 0; Iterator<AbstractTree<E>> child = tree.getChildren().iterator(); while (i < tree.getChildren().size() - 1){ buffer.append(printTree(child.next(), buffer)); buffer.append(", "); i++; } buffer.append(printTree(child.next(), buffer)); buffer.append(")"); return buffer.toString(); } } 

UPDATE: now works fine using the Skaffmans example

+6
java optimization string-concatenation
source share
6 answers

String concats like this are punitively slow. Use StringBuilder.

 @Override public String toString(){ StringBuilder buffer = new StringBuilder(); printTree(this, buffer); return buffer.toString(); } public void printTree(AbstractTree<E> tree, StringBuilder buffer){ if (tree.isLeaf()){ buffer.append(tree.getNodeName()); } else { buffer.append(tree.getNodeName()); buffer.append("("); int i = 0; Iterator<AbstractTree<E>> child = tree.getChildren().iterator(); while (i < tree.getChildren().size() - 1){ printTree(child.next(), buffer); buffer.append(", "); i++; } printTree(child.next(), buffer); buffer.append(")"); } } 
+15
source share

Do not use string concatenation in loops. It does not scale.

Use StringBuilder, this does not create new objects all the time, for example, string concatenation.

 void print() { StringBuilder sb = new StringBuilder(); sb.append("hello"); sb.append(" World!"); System.out.println(sb.toString()); 

}

+4
source share

Look at StringBuilder, don't use simple concatenation and pass StringBuilder through the whole process (or make it global).

+3
source share

Let me tell you why string concatenation is slow because strings are immutable. This means that every time you write "+ =", a new line is created. This means that you are creating your worst-case string, O (n 2 ). This is because if you + = 'ed 1 char at a time, the cost of creating a new line will be 2 + 3 + 4 + ... + n, which is equal to O (n 2 ).

Use StringBuilder, like other sentences (a slower, but thread-safe StringBuffer).

I suppose I should add, StringBuilder will give you O (n) amortized time, because it works like a vector behind the scenes, as it is modified. So create your line there and then call toString ().

 StringBuilder builder = new StringBuilder(); builder.append("blah"); // append more as needed. String text = builder.toString(); 

I would also add that this problem is similar in Python. The idiom in python is to add all your strings to concatenation in a list, and then join the list. "".join(the_list) .

UPDATE: As Bill notes, concatenation is not the root of all evil. One of the string concatenations is good, and can even be optimized! (They are also linear in the worst case). But, when you concatenate in a loop because you are higher, performance will change dramatically as the number of iterations increases. In this case, my aforementioned analysis is flawless, as I specifically stated that this is the “worst case”, which means that you do not expect any optimizations. (That the JVM cannot even optimize the concatenation in loops, and can also go outside).

+3
source share

If the profiler confirms that the bottleneck is string concatenation, you have two options:

  • StringBuilder / StringBuffer (the latter is better for streaming)
  • Ropes for Java :

Rope is a high-performance replacement for Strings. The datastructure described in detail in the section “Ropes: an alternative to strings” provides asymptotically better performance than String and StringBuffer for ordinary string modifications, such as adding, adding, deleting, and pasting. Like strings, ropes are immutable and therefore well suited for multi-threaded programming.

+2
source share

You can see String.intern () as a way to reduce memory usage. This will use the interned row from the row pool. If you have many duplicate lines, this might be faster. More info on interned strings here

-one
source share

All Articles