Is there a faster way to convert an arbitrarily large integer to a big-endian sequence of bytes?

I have this Python code for this:

```python
from struct import pack as _pack

def packl(lnum, pad=1):
    if lnum < 0:
        raise ValueError("Cannot use packl to convert a negative integer "
                         "to a string.")
    count = 0
    l = []
    while lnum > 0:
        l.append(lnum & 0xffffffffffffffffL)
        count += 1
        lnum >>= 64
    if count <= 0:
        return '\0' * pad
    elif pad >= 8:
        lens = 8 * count % pad
        pad = ((lens != 0) and (pad - lens)) or 0
        l.append('>' + 'x' * pad + 'Q' * count)
        l.reverse()
        return _pack(*l)
    else:
        l.append('>' + 'Q' * count)
        l.reverse()
        s = _pack(*l).lstrip('\0')
        lens = len(s)
        if (lens % pad) != 0:
            return '\0' * (pad - lens % pad) + s
        else:
            return s
```

Converting 2**9700 - 1 to a string of bytes takes approximately 174 μs on my machine. If I'm willing to use the bit_length method available in Python 2.7 and Python 3.x, I can shorten this to 159 μs by pre-allocating the l list to exactly the right size at the start and using l[index] = value instead of l.append.
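As a quick sanity check on the sizing math (a sketch, not the author's code): bit_length gives exactly the number of 64-bit limbs that the shift-by-64 loop in packl produces, which is what the pre-allocation relies on.

```python
n = 2**9700 - 1

# Number of 64-bit limbs needed to hold n; equals the number of
# iterations of the "lnum >>= 64" loop in packl above.
nwords = (n.bit_length() + 63) // 64

print(nwords)  # -> 152 (9700 bits / 64 bits per limb, rounded up)
```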

Is there anything I can do that will make it faster? This will be used to convert large primes used in cryptography, as well as some (but not many) smaller numbers.

Edit

This is currently the fastest option in Python < 3.2; it takes about half the time in either direction compared to the accepted answer:

```python
import binascii

def packl(lnum, padmultiple=1):
    """Packs the lnum (which must be convertable to a long) into a byte
    string 0 padded to a multiple of padmultiple bytes in size. 0 means
    no padding whatsoever, so that packing 0 results in an empty string.
    The resulting byte string is the big-endian two's complement
    representation of the passed in long."""
    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = binascii.unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""
    return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0
```

In Python 3.2, the int class has the methods to_bytes and from_bytes, which can do this much faster than the method given above.


For completeness and future readers of this question:

Starting with Python 3.2, there are int.from_bytes() and int.to_bytes() methods that convert between int objects and byte strings in a choice of byte orders.
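A minimal usage sketch of these methods (note that the length passed to to_bytes must be large enough to hold the value, or OverflowError is raised):

```python
n = 2**9700 - 1

# Big-endian byte string, sized exactly to fit n.
b = n.to_bytes((n.bit_length() + 7) // 8, byteorder='big')

# Round-trip back to an int.
assert int.from_bytes(b, byteorder='big') == n
assert len(b) == 1213  # 9700 bits -> 1213 bytes, rounded up
```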


Here is a solution calling the Python/C API via ctypes. It currently uses NumPy, but if NumPy is not an option, it can be done with ctypes alone.

```python
import numpy
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               numpy.ctypeslib.ndpointer(numpy.uint8),
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes_numpy(lnum):
    a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
    PyLong_AsByteArray(lnum, a, a.size, 0, 1)
    return a
```

On my machine, it is 15 times faster than your approach.

Edit: Here is a version that uses only ctypes and returns a string instead of a NumPy array:

```python
import ctypes

PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               ctypes.c_char_p,
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes(lnum):
    a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
    PyLong_AsByteArray(lnum, a, len(a), 0, 1)
    return a.raw
```

This is twice as fast again, for a total speedup factor of 30 on my machine.


You really should just use NumPy, and I'm sure it has something built in for this. You might also be able to hack it faster with the array module. But I'll take a stab at it anyway.

IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and lstrip on a big string should be expensive.

In addition, some points of style: special cases aren't special enough; and it appears you missed the memo about the new x if y else z construct. :) We don't need it here, though. ;)

```python
from struct import pack as _pack

Q_size = 64
Q_bitmask = (1L << Q_size) - 1L

def quads_gen(a_long):
    while a_long:
        yield a_long & Q_bitmask
        a_long >>= Q_size

def pack_long_big_endian(a_long, pad=1):
    if a_long < 0:
        raise ValueError("Cannot use packl to convert a negative integer "
                         "to a string.")
    # reversed() needs a sequence, so materialize the generator first
    qs = list(quads_gen(a_long))[::-1]
    if not qs:
        return '\x00' * pad
    # Pack the first one separately so we can lstrip nicely.
    first = _pack('>Q', qs[0]).lstrip('\x00')
    rest = _pack('>%dQ' % (len(qs) - 1), *qs[1:])
    count = len(first) + len(rest)
    # A little math trick that depends on Python's behaviour of modulus
    # for negative numbers - but it is well-defined and documented
    return '\x00' * (-count % pad) + first + rest
```
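The negative-modulus trick in the last line can be illustrated in isolation (a small sketch in Python 3 syntax): because Python's % operator always returns a result with the sign of the divisor, -count % pad is exactly the number of zero bytes needed to round count up to a multiple of pad.

```python
for count, pad in [(1, 8), (5, 8), (8, 8), (13, 4)]:
    padding = -count % pad
    # padding is always in [0, pad) and rounds count up to a multiple of pad
    assert 0 <= padding < pad
    assert (count + padding) % pad == 0

print(-5 % 8)  # -> 3, i.e. 3 zero bytes pad a 5-byte string to 8 bytes
```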

Just wanted to follow up on Sven's answer (which works great). The opposite operation, going from an arbitrarily long bytes object to a Python integer, requires the following (because there is no PyLong_FromByteArray() C API function that I can find):

```python
import binascii

def unpack_bytes(stringbytes):
    # binascii.hexlify will be obsolete in python3 soon
    # They will add a .tohex() method to bytes class
    # Issue 3532 bugs.python.org
    return int(binascii.hexlify(stringbytes), 16)
```
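A quick round-trip check of this approach against int.to_bytes (a sketch, assuming Python 3.2+ for to_bytes; the function body mirrors the answer's unpack_bytes):

```python
import binascii

def unpack_bytes(stringbytes):
    # Treat the bytes as a big-endian base-256 number.
    return int(binascii.hexlify(stringbytes), 16)

n = 2**521 - 1  # a Mersenne prime, typical of crypto-sized inputs
b = n.to_bytes((n.bit_length() + 7) // 8, 'big')
assert unpack_bytes(b) == n
assert unpack_bytes(b'\x01\x00') == 256
```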

Source: https://habr.com/ru/post/649842/