Python 2.7 string.join() with unicode

I have a set of byte strings (str, not unicode, in Python 2.7) containing Unicode data (UTF-8 encoded).

When I try to join them (via "".join(utf8_strings) or u"".join(utf8_strings)), I get:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xec in position 0: ordinal not in range(128)

Is it possible to use the .join() method on strings containing non-ASCII data? Of course, I could concatenate them in a for loop, but that would be inefficient.

2 answers

Joining byte strings with ''.join() works just fine; the error you see only appears if you mix unicode and str objects:

    >>> utf8 = [u'\u0123'.encode('utf8'), u'\u0234'.encode('utf8')]
    >>> ''.join(utf8)
    '\xc4\xa3\xc8\xb4'
    >>> u''.join(utf8)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
    >>> ''.join(utf8 + [u'unicode object'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

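The exceptions above occur when a unicode value (u'') is used as the joiner and when a unicode string is added to the list of strings to join, respectively. In both cases Python 2 implicitly decodes the byte strings with the ASCII codec to build a unicode result, and that fails on the first non-ASCII byte.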
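If the goal is a unicode result rather than a byte string, one way to avoid the implicit ASCII decode is to decode each item explicitly before joining. The following is only a sketch: it assumes every item really is UTF-8 encoded, and the variable names are illustrative, not taken from the question.

```python
# Sample data, built the same way as in the session above.
utf8_strings = [u'\u0123'.encode('utf-8'), u'\u0234'.encode('utf-8')]

# Option 1: stay in bytes. Joiner and items are all byte strings,
# so no decoding happens at all.
joined_bytes = b''.join(utf8_strings)

# Option 2: decode explicitly as UTF-8 (an assumption about the data),
# then join with a unicode joiner. Nothing is left for the ASCII codec.
joined_text = u''.join(s.decode('utf-8') for s in utf8_strings)
```

Both options make a single pass over the items, so neither falls back to concatenating in a for loop.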


"".join(...) will work if every item is a str (no matter what encoding it may be in).

The problem you see is probably not caused by the join itself, but by the data you are passing to it. Post more code so we can see what is actually going wrong.
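To check the data, it may help to look at the type of each item before joining, and to normalize everything to one type. The sketch below uses a hypothetical helper, to_text, and assumes the byte strings are UTF-8; neither comes from the original question.

```python
def to_text(item, encoding='utf-8'):
    # Hypothetical helper: decode byte strings, pass unicode through.
    # In Python 2.7, bytes is simply an alias for str.
    if isinstance(item, bytes):
        return item.decode(encoding)
    return item

# A mixed list like the one that triggers the error.
mixed = [u'\u0123'.encode('utf-8'), u'unicode object']

# Inspect the types first; a stray unicode item shows up here.
kinds = [type(item).__name__ for item in mixed]

# Normalize to unicode, then join with a unicode joiner.
joined = u''.join(to_text(item) for item in mixed)
```

Printing kinds on the real data should reveal whether a unicode object has slipped into the list.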
