Python equivalent of Java StringBuffer?

Is there anything in Python like Java's StringBuffer? Since strings are immutable in Python too, editing them in loops would be inefficient.

+64
java python stringbuffer
Nov 12 '13 at 10:00
7 answers

"Efficient String Concatenation in Python" is a rather old article, and its main claim, that naive concatenation is far slower than joining, no longer holds, because this part has been optimized in CPython since then:

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations. @ http://docs.python.org/2/library/stdtypes.html

I adapted the article's code a bit and got the following results on my machine:

    from cStringIO import StringIO
    from UserString import MutableString
    from array import array

    import sys, timeit

    def method1():
        out_str = ''
        for num in xrange(loop_count):
            out_str += `num`
        return out_str

    def method2():
        out_str = MutableString()
        for num in xrange(loop_count):
            out_str += `num`
        return out_str

    def method3():
        char_array = array('c')
        for num in xrange(loop_count):
            char_array.fromstring(`num`)
        return char_array.tostring()

    def method4():
        str_list = []
        for num in xrange(loop_count):
            str_list.append(`num`)
        out_str = ''.join(str_list)
        return out_str

    def method5():
        file_str = StringIO()
        for num in xrange(loop_count):
            file_str.write(`num`)
        out_str = file_str.getvalue()
        return out_str

    def method6():
        out_str = ''.join([`num` for num in xrange(loop_count)])
        return out_str

    def method7():
        out_str = ''.join(`num` for num in xrange(loop_count))
        return out_str

    loop_count = 80000

    print sys.version

    print 'method1=', timeit.timeit(method1, number=10)
    print 'method2=', timeit.timeit(method2, number=10)
    print 'method3=', timeit.timeit(method3, number=10)
    print 'method4=', timeit.timeit(method4, number=10)
    print 'method5=', timeit.timeit(method5, number=10)
    print 'method6=', timeit.timeit(method6, number=10)
    print 'method7=', timeit.timeit(method7, number=10)

Results:

    2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
    [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
    method1= 0.171155929565
    method2= 16.7158739567
    method3= 0.420584917068
    method4= 0.231794118881
    method5= 0.323612928391
    method6= 0.120429992676
    method7= 0.145267963409

Findings:

  • join still wins over concat, but only marginally
  • list comprehensions are faster than loops
  • joining generators is slower than joining lists
  • the other methods are of no use (unless you are doing something special)
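The script above runs only on Python 2 (backtick repr, xrange, cStringIO, UserString.MutableString). For readers on Python 3, here is a rough port of five of the seven methods, offered as a sketch rather than a reproduction of the measurements above. The function names are my own, str(num) stands in for the backticks, and methods 2 and 3 are left out because MutableString and the array type code 'c' do not exist in Python 3. Timings will vary by machine and interpreter version, so no numbers are claimed here.

```python
import io
import timeit

LOOP_COUNT = 80_000  # same iteration count as the script above


def concat():
    # Naive += concatenation (method1 above)
    out_str = ''
    for num in range(LOOP_COUNT):
        out_str += str(num)
    return out_str


def append_join():
    # Append to a list, then join once at the end (method4 above)
    str_list = []
    for num in range(LOOP_COUNT):
        str_list.append(str(num))
    return ''.join(str_list)


def stringio_write():
    # Write into an in-memory text buffer (method5 above, with io.StringIO)
    file_str = io.StringIO()
    for num in range(LOOP_COUNT):
        file_str.write(str(num))
    return file_str.getvalue()


def join_listcomp():
    # Join a list comprehension (method6 above)
    return ''.join([str(num) for num in range(LOOP_COUNT)])


def join_genexp():
    # Join a generator expression (method7 above)
    return ''.join(str(num) for num in range(LOOP_COUNT))


METHODS = [concat, append_join, stringio_write, join_listcomp, join_genexp]

if __name__ == '__main__':
    for func in METHODS:
        seconds = timeit.timeit(func, number=10)
        print(f'{func.__name__:>15}: {seconds:.3f}s')
```

All five functions build the same string, so any difference in the printed times comes from the building strategy alone.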
+68
Nov 12 '13 at

Perhaps use a bytearray:

    In [1]: s = bytearray('Hello World')

    In [2]: s[:5] = 'Bye'

    In [3]: s
    Out[3]: bytearray(b'Bye World')

    In [4]: str(s)
    Out[4]: 'Bye World'

The appeal of using bytearray lies in its memory efficiency and convenient syntax. It can also be faster than using a temporary list:

    In [36]: %timeit s = list('Hello World'*1000); s[5500:6000] = 'Bye'; s = ''.join(s)
    1000 loops, best of 3: 256 µs per loop

    In [37]: %timeit s = bytearray('Hello World'*1000); s[5500:6000] = 'Bye'; str(s)
    100000 loops, best of 3: 2.39 µs per loop

Note that most of the speed difference is related to creating the container:

    In [32]: %timeit s = list('Hello World'*1000)
    10000 loops, best of 3: 115 µs per loop

    In [33]: %timeit s = bytearray('Hello World'*1000)
    1000000 loops, best of 3: 1.13 µs per loop
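The session above is Python 2, where str and bytes are the same type. On Python 3 a bytearray holds bytes, so text has to be encoded on the way in and decoded on the way out, and str(s) returns the repr ("bytearray(b'...')") rather than the text. A minimal sketch of the same idea, assuming UTF-8 text:

```python
# Python 3: bytearray stores bytes, not characters.
buf = bytearray('Hello World', 'utf-8')

buf[:5] = b'Bye'                      # slice assignment may change the length
buf += b'!'                           # in-place append
buf.extend(' Again'.encode('utf-8'))  # append encoded text

text = buf.decode('utf-8')            # convert back to str once, at the end
print(text)  # Bye World! Again
```

Note that indices count bytes, so slicing is only safe at character boundaries; with non-ASCII text a list of strings is usually the simpler choice.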
+11
Nov 12 '13 at

Depends on what you want to do. If you want a mutable sequence, the built-in list type is your friend, and going from str to list and back is as simple as:

    mystring = "abcdef"
    mylist = list(mystring)
    mystring = "".join(mylist)

If you want to build a large string using a for loop, the pythonic way is usually to build a list of strings and then join them together with the proper separator (linebreak or whatever).
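As a small illustration of the list-then-join pattern, assuming Python 3.6+ (the function name and sample data are made up for the example):

```python
def build_report(rows):
    """Build a multi-line string from (name, value) pairs."""
    lines = []
    for name, value in rows:
        # Collect each piece; no string is copied until the final join.
        lines.append(f'{name}: {value}')
    # Join once, with a linebreak as the separator.
    return '\n'.join(lines)


report = build_report([('apples', 3), ('pears', 5)])
print(report)
```

The separator is placed only between items, so there is no trailing linebreak to strip afterwards.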

Otherwise you can also use some text template system, or a parser, or whatever specialized tool is most appropriate for the job.
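One template option that ships with Python is string.Template from the standard library. The sketch below is only one possible choice, and the template text and field names are invented for the example:

```python
from string import Template

# $name and $count are placeholders filled in by substitute().
row_template = Template('$name has $count items')

rows = [
    {'name': 'alice', 'count': 3},
    {'name': 'bob', 'count': 0},
]

# Render each row from the template, then join the rendered lines.
text = '\n'.join(row_template.substitute(row) for row in rows)
print(text)
```

substitute() raises KeyError when a placeholder has no value; safe_substitute() leaves unknown placeholders in place instead.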

+11
Nov 12 '13 at

The previously provided answers are almost always best. However, sometimes the string is built up across many method calls and/or loops, so it is not necessarily natural to build up a list of strings and then join them. And since there is no guarantee that you are using CPython, or that CPython's optimization will apply, another approach is to just use print!

Here is an example helper class. The class is trivial and probably unnecessary, but it serves to illustrate the approach (Python 3):

    import io

    class StringBuilder(object):

        def __init__(self):
            self._stringio = io.StringIO()

        def __str__(self):
            return self._stringio.getvalue()

        def append(self, *objects, sep=' ', end=''):
            print(*objects, sep=sep, end=end, file=self._stringio)

    sb = StringBuilder()
    sb.append('a')
    sb.append('b', end='\n')
    sb.append('c', 'd', sep=',', end='\n')
    print(sb)  # 'ab\nc,d\n'
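Since the helper class is optional, the same approach also works by passing a plain io.StringIO between functions. The sketch below shows a string being built across several calls; the function names are hypothetical and not part of the answer above:

```python
import io


def write_header(out, title):
    # print() accepts any file-like object via file=
    print(title, file=out)
    print('-' * len(title), file=out)


def write_items(out, items):
    for index, item in enumerate(items, start=1):
        print(index, item, sep='. ', file=out)


def render(title, items):
    out = io.StringIO()
    write_header(out, title)
    write_items(out, items)
    return out.getvalue()  # materialize the string once, at the end


result = render('Fruit', ['apple', 'pear'])
print(result, end='')
```

Each function only needs something with a write() method, so the same code can write to sys.stdout or an open file without changes.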
+5
Mar 12 '15 at 1:20

This link might be useful for concatenation in Python:

http://pythonadventures.wordpress.com/2010/09/27/stringbuilder/

Example from the link above:

    def g():
        sb = []
        for i in range(30):
            sb.append("abcdefg"[i%7])

        return ''.join(sb)

    print(g())  # abcdefgabcdefgabcdefgabcdefgab
+2
Nov 12 '13 at

Just a test that I ran on Python 3.6.2, showing that "join" still wins BIG!

    from time import time


    def _with_format(i):
        _st = ''
        for i in range(0, i):
            _st = "{}{}".format(_st, "0")
        return _st


    def _with_s(i):
        _st = ''
        for i in range(0, i):
            _st = "%s%s" % (_st, "0")
        return _st


    def _with_list(i):
        l = []
        for i in range(0, i):
            l.append("0")
        return "".join(l)


    def _count_time(name, i, func):
        start = time()
        r = func(i)
        total = time() - start
        print("%s done in %ss" % (name, total))
        return r


    iterationCount = 1000000

    r1 = _count_time("with format", iterationCount, _with_format)
    r2 = _count_time("with s", iterationCount, _with_s)
    r3 = _count_time("with list and join", iterationCount, _with_list)

    if r1 != r2 or r2 != r3:
        print("Not all results are the same!")

And the result was:

    with format done in 17.991968870162964s
    with s done in 18.36879801750183s
    with list and join done in 0.12142801284790039s
+2
Sep 13 '17 at 9:15

I added a few tests to Roee Gavirel's code that show conclusively that joining lists into strings is no faster than s += "something".

Results:

    Python 2.7.15rc1

    Iterations: 100000
    format    done in 0.317540168762s
    %s        done in 0.151262044907s
    list+join done in 0.0055148601532s
    str cat   done in 0.00391721725464s

    Python 3.6.7

    Iterations: 100000
    format    done in 0.35594654083251953s
    %s        done in 0.2868080139160156s
    list+join done in 0.005924701690673828s
    str cat   done in 0.0054128170013427734s
    f str     done in 0.12870001792907715s

The code:

    from time import time


    def _with_cat(i):
        _st = ''
        for i in range(0, i):
            _st += "0"
        return _st


    # f-strings need Python 3.6+; drop this function (and r5) to run under Python 2
    def _with_f_str(i):
        _st = ''
        for i in range(0, i):
            _st = f"{_st}0"
        return _st


    def _with_format(i):
        _st = ''
        for i in range(0, i):
            _st = "{}{}".format(_st, "0")
        return _st


    def _with_s(i):
        _st = ''
        for i in range(0, i):
            _st = "%s%s" % (_st, "0")
        return _st


    def _with_list(i):
        l = []
        for i in range(0, i):
            l.append("0")
        return "".join(l)


    def _count_time(name, i, func):
        start = time()
        r = func(i)
        total = time() - start
        print("%s done in %ss" % (name, total))
        return r


    iteration_count = 100000

    print('Iterations: {}'.format(iteration_count))
    r1 = _count_time("format   ", iteration_count, _with_format)
    r2 = _count_time("%s       ", iteration_count, _with_s)
    r3 = _count_time("list+join", iteration_count, _with_list)
    r4 = _count_time("str cat  ", iteration_count, _with_cat)
    r5 = _count_time("f str    ", iteration_count, _with_f_str)

    if len(set([r1, r2, r3, r4, r5])) != 1:
        print("Not all results are the same!")
+1
Apr 04 '19 at 22:31


