Encoding a numeric string as a shorter alphanumeric string and back

Quick question: I am trying to find or write an encoder in Python that shortens a string of digits by using upper- and lowercase letters. The numeric strings look something like this:

20120425161608678259146181504021022591461815040210220120425161608667 

The length is always the same.

My initial thought was to write a simple encoder that uses upper- and lowercase letters plus digits to shorten this string into something like this:

 a26Dkd38JK 

That example is completely arbitrary; I'm just trying to be as clear as possible. I'm sure there is a neat way to do this, perhaps something already built in. Maybe this is an awkward question to even ask.

I also need to take the shortened string and convert it back to the original long numeric value. Do I have to write this myself, or is there a one-line built-in Python function that I should already know about?

Thanks!

3 answers

This gives pretty good compression:

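    import base64

    def num_to_alpha(num):
        # Python 2: hex() of a long ends in "L", so strip it.
        num = hex(num)[2:].rstrip("L")
        # Pad to an even number of hex digits so they decode to whole bytes.
        if len(num) % 2:
            num = "0" + num
        # Python 2 only: str.decode('hex') turns hex digits into raw bytes.
        return base64.b64encode(num.decode('hex'))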

It first converts the integer into a byte string and then Base64-encodes it. Here is the decoder:

    def alpha_to_num(alpha):
        num_bytes = base64.b64decode(alpha)
        # Python 2: turn the raw bytes back into hex digits, then parse them as base 16.
        return int(num_bytes.encode('hex'), 16)

Example:

    >>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
    'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
    >>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
    20120425161608678259146181504021022591461815040210220120425161608667
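The code above relies on Python 2 features (`str.decode('hex')`, the trailing `L` on longs) and will not run on Python 3. A minimal Python 3 sketch of the same idea, assuming a non-negative integer input, can use `int.to_bytes` and `int.from_bytes` instead of going through hex digits. It keeps the answer's function names but is a port, not the answerer's code:

```python
import base64


def num_to_alpha(num: int) -> str:
    """Encode a non-negative int as Base64 of its big-endian bytes."""
    # Smallest byte count that holds num; at least 1 so that 0 still encodes.
    length = max(1, (num.bit_length() + 7) // 8)
    return base64.b64encode(num.to_bytes(length, "big")).decode("ascii")


def alpha_to_num(alpha: str) -> int:
    """Inverse of num_to_alpha: Base64-decode, then read the bytes as an int."""
    return int.from_bytes(base64.b64decode(alpha), "big")
```

For the 68-digit example the number fits in 28 bytes, so the output is 40 characters including the `==` padding; the padding can be stripped for storage as long as it is added back before decoding.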

Here are two functions that use a customizable alphabet (not based on base64) but produce shorter output:

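    chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    base = len(chrs)  # 62 symbols, so this is a base-62 encoding

    def int_to_cust(i):
        result = ''
        # Repeatedly peel off the least significant base-62 digit.
        while i:
            result = chrs[i % base] + result
            i = i // base
        # Zero encodes as the first symbol rather than an empty string.
        if not result:
            result = chrs[0]
        return result

    def cust_to_int(s):
        result = 0
        for char in s:
            # index() raises ValueError on an unknown character; find() would silently return -1.
            result = result * base + chrs.index(char)
        return result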

And the results:

    >>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
    '9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
    >>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
    20120425161608678259146181504021022591461815040210220120425161608667L
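These functions round-trip an integer, but the question starts from a fixed-length *string* of digits. If such a string ever begins with `0`, converting it to an int drops the leading zeros, so decoding should pad back to the known length. A minimal sketch, assuming the width is always 68 as in the example; `encode_digits`, `decode_digits` and `WIDTH` are hypothetical names:

```python
import string

# Same 62-character alphabet as the answer: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)

# Hypothetical fixed width, taken from the 68-digit example in the question.
WIDTH = 68


def encode_digits(digits: str) -> str:
    """Encode a string of decimal digits as a base-62 string."""
    n = int(digits)
    out = []
    while n:
        n, rem = divmod(n, BASE)
        out.append(ALPHABET[rem])
    # Digits were produced least significant first; zero becomes "0".
    return "".join(reversed(out)) or ALPHABET[0]


def decode_digits(code: str, width: int = WIDTH) -> str:
    """Decode a base-62 string back to a zero-padded digit string."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on unknown chars
    return str(n).zfill(width)
```

`zfill` restores any leading zeros, so `decode_digits(encode_digits(s)) == s` holds for every 68-digit string, not just those starting with a nonzero digit.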

You can shorten the output further by adding more characters to the chrs variable: a larger alphabet is a larger base, so fewer characters are needed.
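As an illustration of the larger-alphabet idea, Python 3.4+ ships `base64.b85encode`, which uses an 85-character alphabet. That alphabet includes punctuation such as `!`, `#` and `{`, which may not be acceptable everywhere, and it works on 4-byte chunks rather than on the number as a whole. A minimal sketch with hypothetical function names, assuming a non-negative integer:

```python
import base64


def num_to_b85(num: int) -> str:
    """Encode a non-negative int with the stdlib's 85-character alphabet."""
    # Smallest byte count that holds num; at least 1 so that 0 still encodes.
    length = max(1, (num.bit_length() + 7) // 8)
    return base64.b85encode(num.to_bytes(length, "big")).decode("ascii")


def b85_to_num(code: str) -> int:
    """Inverse of num_to_b85."""
    return int.from_bytes(base64.b85decode(code), "big")
```

For the 68-digit example (28 bytes, i.e. seven 4-byte chunks of 5 characters each) this gives 35 characters, compared with 38 for base 62 and 40 for padded Base64.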

    >>> s = "20120425161608678259146181504021022591461815040210220120425161608667"
    >>> import base64, zlib
    >>> base64.b64encode(zlib.compress(s))
    'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
    >>> zlib.decompress(base64.b64decode(_))
    '20120425161608678259146181504021022591461815040210220120425161608667'

So zlib is not very smart when compressing strings of digits :( The Base64 output is 60 characters, barely shorter than the 68-digit input.
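One way to quantify this: any fixed-length code for an n-digit string must distinguish all 10^n possible values, so with an alphabet of b symbols it needs the smallest k such that b^k >= 10^n. The sketch below computes that floor with exact integer arithmetic to avoid floating-point rounding; `min_chars` is a hypothetical helper, not part of any library:

```python
def min_chars(num_digits: int, alphabet_size: int) -> int:
    """Smallest k with alphabet_size**k >= 10**num_digits."""
    target = 10 ** num_digits
    k = 0
    while alphabet_size ** k < target:
        k += 1
    return k


for size in (10, 62, 64, 85, 256):
    print(size, min_chars(68, size))
# 10 68
# 62 38
# 64 38
# 85 36
# 256 29
```

A 68-digit string therefore needs at least 29 bytes, while the 60-character zlib output above decodes to 45 bytes. zlib is a general-purpose byte compressor and starts from 8 bits per digit, whereas each decimal digit only carries about 3.32 bits; converting to an integer first, as the other answers do, exploits that directly.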
