Support for python 2 and 3: str, bytes or alternative

I have a Python2 code base that makes extensive use of str to store raw binary data. I want to support both Python2 and Python3.

The type bytes (ais of str ) in Python2 and the bytes in Python3 are completely different. They use different arguments for building, indexing for different types and have different str and repr .

What is the best way to unify code for both versions of Python using one type to store raw data?

+6
source share
3 answers

I don’t know in which areas you want to work with bytes, I almost always work with bytearray, and that’s how I do it when reading from a file

 with open(file, 'rb') as imageFile: f = imageFile.read() b = bytearray(f) 

I took this directly from the project I'm working on, and it works in both 2 and 3. Perhaps there is something for you to watch?

0
source

If your project is small and simple, use six .

Otherwise, I suggest having two independent codes: one for Python 2 and one for Python 3. Initially, this may sound like a lot of unnecessary work, but in the end it is actually much easier to maintain.

As an example of what your project might be, if you decide to support both pythons in one code base, look at google protobuf . Many often conflicting branches throughout the code, abstractions that were changed just to allow hacks. And since your project will develop, it will not improve: deadlines play against the quality of the code.

With two separate code bases, you simply apply almost the same fixes, which do not work much compared to what lies ahead if you want to create a single code base. And it will be easier to migrate to Python 3 completely as soon as the number of Python 2 users of your package drops.

0
source

Assuming you only need to support Python 2.6 and later, you can just use bytes for, well, bytes. Use b literals to create objects with bytes, for example b'\x0a\x0b\x00' . When working with files, make sure that the mode turns on b (as in open('file.bin', 'rb') ).
Beware that iteration and access to elements are different. In these cases, you can write your code to use chunks. Instead of b[0] == 0 (Python 3) or b[0] == b'\x00' (Python 2) write b[0:1] == b'\x00' . Other options use bytearray (when bytes are mutable) or helper functions.

Character strings must be unicode in Python 2, regardless of porting to Python 3; otherwise, the code is likely to be incorrect if it encounters non-ASCII characters. Equivalent to str in Python 3.
Either use u literals to create character strings (e.g. u'Düsseldorf' ) and / or be sure to run each file with from __future__ import unicode_literals . Declare file encodings, when necessary, by running files using # encoding: utf-8 .
Use io.open to read character strings from files. For network code, select bytes and call decode to get a character string.

If you need to support Python 2.5 or 3.2, see six for literal conversion.

Add a lot of statements to make sure that functions that work with character strings do not receive bytes, and vice versa. As usual, a good test suite with 100% coverage helps.

0
source

All Articles