Python 3: How to get a string literal representation of a byte string?

Question

In Python 3, how can I interpolate a byte string into a regular string and get the same behavior as in Python 2 (i.e., get only the escape codes, without the b prefix or doubled backslashes)?

For example:

Python 2.7:

 >>> x = u'\u041c\u0438\u0440'.encode('utf-8')
 >>> str(x)
 '\xd0\x9c\xd0\xb8\xd1\x80'
 >>> 'x = %s' % x
 'x = \xd0\x9c\xd0\xb8\xd1\x80'

Python 3.3:

 >>> x = u'\u041c\u0438\u0440'.encode('utf-8')
 >>> str(x)
 "b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"
 >>> 'x = %s' % x
 "x = b'\\xd0\\x9c\\xd0\\xb8\\xd1\\x80'"

Note that with Python 3, I get the b prefix and doubled backslashes in my output. The result I would like is the one I get in Python 2.

+6

python python-3.x escaping

Marc Abramowitz Mar 13 '13 at 16:02

3 answers

In your Python 3 example, you are interpolating into a Unicode string, not into a byte string as you do in Python 2.

In Python 3.3, bytes do not support interpolation (%-formatting, str.format(), or the like); %-formatting for bytes was only added later, in Python 3.5.

Either concatenate, or use Unicode throughout and encode only after you have interpolated:

 b'x = ' + x

or

 'x = {}'.format(x.decode('utf8')).encode('utf8')

or

 x = '\u041c\u0438\u0440'  # the u prefix is ignored in Python 3.3
 'x = {}'.format(x).encode('utf8')
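
As a quick check of the options above, the following sketch (assuming Python 3.3 or later; the names `prefix_concat`, `via_decode`, and `unicode_first` are illustrative only) suggests that all three produce the same bytes:

```python
# Raw UTF-8 bytes for the word from the question.
x = '\u041c\u0438\u0440'.encode('utf-8')

# Option 1: stay in bytes and concatenate.
prefix_concat = b'x = ' + x

# Option 2: decode, interpolate as text, then encode the result.
via_decode = 'x = {}'.format(x.decode('utf-8')).encode('utf-8')

# Option 3: keep the text as str throughout and encode once at the end.
text = '\u041c\u0438\u0440'
unicode_first = 'x = {}'.format(text).encode('utf-8')

assert prefix_concat == via_decode == unicode_first
print(prefix_concat)  # b'x = \xd0\x9c\xd0\xb8\xd1\x80'
```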

+3

Martijn Pieters Mar 13 '13 at 16:08

In Python 2, byte strings and regular strings are the same thing, so str() performs no conversion. In Python 3, a string is always a Unicode string, so calling str() on a byte string does perform a conversion: it returns the byte string's repr, including the b prefix.

Instead, you can do your own conversion, which does what you want:

 x2 = ''.join(chr(c) for c in x)
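
To see what this conversion produces, here is a small sketch, assuming Python 3. The name `as_text` and the use of the built-in `ascii()` for display are additions for illustration, not part of the answer. Each byte becomes the character with the same code point, which is also what decoding as Latin-1 gives:

```python
x = '\u041c\u0438\u0440'.encode('utf-8')

# Map every byte value (0-255) to the character with that code point.
as_text = ''.join(chr(c) for c in x)

# Decoding as Latin-1 performs the same byte-to-code-point mapping.
assert as_text == x.decode('latin-1')

result = 'x = %s' % as_text

# ascii() escapes non-ASCII characters, so the display matches Python 2.
print(ascii(result))  # 'x = \xd0\x9c\xd0\xb8\xd1\x80'
```

Note that `result` is a str of six non-ASCII characters, not a byte string; it only looks like the Python 2 value when escaped.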

0

Mark Ransom Mar 13 '13 at 16:12

javex · Accepted Answer · 2013-03-13T16:12:06+0000

In Python 2, you have str and unicode types. str is a simple byte string, and unicode is a Unicode string.

In Python 3, this has changed: str is now what unicode was in Python 2, and bytes is what str was in Python 2.

So, when you do ("x = %s" % '\u041c\u0438\u0440').encode("utf-8"), you can omit the u prefix because it is implicit. Every string in Python 3 that is not explicitly converted is Unicode.

This reproduces your last line in Python 3:

  ("x = %s" % '\u041c\u0438\u0440').encode("utf-8")

Note that I encode only after producing the final result, which is what you should always do: take the incoming object, decode it to Unicode (however you do that), and then, when producing output, encode it in the encoding of your choice. Do not try to handle raw byte strings directly. That is just ugly and discouraged practice.
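
The decode, process, encode pattern described above can be sketched as follows. The function name `label` and the default of UTF-8 are assumptions made for illustration, not part of the answer:

```python
def label(raw, encoding='utf-8'):
    """Decode incoming bytes, format as text, encode once on output."""
    text = raw.decode(encoding)       # bytes -> str at the input boundary
    message = 'x = %s' % text         # all formatting happens on str
    return message.encode(encoding)   # str -> bytes at the output boundary


incoming = '\u041c\u0438\u0440'.encode('utf-8')
print(label(incoming))  # b'x = \xd0\x9c\xd0\xb8\xd1\x80'
```

Keeping the conversions at the edges means the formatting code never has to care which encoding the data arrived in.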
