Django: Is the Base64 encoding of an md5 hash of an email address under 30 characters?

I have spent a few hours researching the best way to use an email address instead of a username in Django authentication. This topic has been discussed many times, but the answers I found contradict each other.

1) The answer here points to a snippet that tells a username from an email address by checking for the "@" character. The maximum lengths of an email address and a username are not the same, though, and the answer does not take this into account.

2) The second answer at the same link, from S.Lott (13 votes), does some black magic with admin.site. I don't understand what the code does. Is this the accepted way to do it, short and sweet as it is?

3) Then I found this solution that seems almost perfect (and makes sense to me):

username = uuid.uuid4().hex[:30] 

It takes only the first 30 characters of a unique identifier generated by Python as the username. But there is still a chance of a collision. Then I came across a post in which someone claimed:

A base64 encoding of an md5 hash has 25 characters

If this is true, would it be possible to take the base64 encoding of the md5 hash of the email address and guarantee 100% unique usernames that are also under 30 characters? If so, how can this be achieved?

Many thanks,

+4
3 answers

You can do it as follows:

    >>> from hashlib import md5
    >>> h = md5('email@example.com').digest().encode('base64')[:-1]
    >>> h
    'Vlj/zO5/Dr/aKyJiOLHrbg=='
    >>> len(h)
    24

You can ignore the last character because it is just a newline. The probability of a collision is the same as for the MD5 hash itself, since you do not lose any information when encoding to base64.

    >>> original = md5('email@example.com').digest()
    >>> encoded = original.encode('base64')
    >>> original == encoded.decode('base64')
    True
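The session above relies on the Python 2 'base64' string codec, which Python 3 no longer offers on str. A minimal sketch of the same idea for Python 3 follows, assuming the base64 module is an acceptable substitute; the helper name md5_base64 is hypothetical and not part of the original answer. Note that base64.b64encode does not append a newline, so nothing has to be sliced off.

```python
import base64
from hashlib import md5


def md5_base64(email):
    """Return the Base64 text of the MD5 digest of an email address.

    Hypothetical helper for illustration only.
    """
    digest = md5(email.encode("utf-8")).digest()      # always 16 bytes
    return base64.b64encode(digest).decode("ascii")   # 24 chars, no newline


h = md5_base64("email@example.com")
print(h, len(h))

# Base64 is reversible, so the full 16-byte digest can be recovered.
original = md5("email@example.com".encode("utf-8")).digest()
print(base64.b64decode(h) == original)
```

Because the encoding is reversible, two addresses collide after encoding only if their MD5 digests already collide.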
+4

MD5 hashes are always 16 bytes long, and Base64 encodes each group of 3 bytes as 4 characters; thus 16/3 rounded up gives 6 groups of 3 bytes, times 4 = 24 characters for a Base64-encoded MD5 hash.

However, note that the Wikipedia page linked above reads:
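The arithmetic above can be checked with a short Python 3 sketch. The function names padded_length and unpadded_length are made up for this illustration.

```python
import base64
import math
from hashlib import md5


def padded_length(n_bytes):
    """Length of standard (padded) Base64 text for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def unpadded_length(n_bytes):
    """Length once the trailing '=' padding is stripped (6 bits per char)."""
    return math.ceil(n_bytes * 8 / 6)


digest = md5(b"foo@bar.com").digest()
print(len(digest))                      # 16
print(padded_length(len(digest)))       # 24
print(unpadded_length(len(digest)))     # 22
print(len(base64.b64encode(digest)))    # 24
```

So the encoded hash is 24 characters with padding, or 22 without, and either way it fits in a 30-character field.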

However, since then it has been shown that MD5 is not collision resistant.

Therefore, you cannot count on this method to give you guaranteed-unique usernames from email addresses. Generating them is very easy with the hashlib module, though:

    >>> from hashlib import md5
    >>> md5('foo@bar.com').digest().encode('base64').strip()
    '862kBc6JC2+CBAlN6xLYqA=='
+2

A UUID is 128 bits (16 bytes), so you can apply base64 to it directly to get a 22-character string (after removing the fixed padding '==', as suggested by Gumbo in the comments on the question):

    >>> import base64
    >>> import uuid
    >>> len(base64.urlsafe_b64encode(uuid.uuid4().bytes).rstrip(b'='))
    22

Here urlsafe_b64encode and stripping the '=' are used to avoid characters that are not valid in the User.username field, namely '/', '+' and '='.
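As a rough sketch of how this could be wrapped up on Python 3 (the function name uuid_username is hypothetical, not from the answer), the result is decoded to text and checked against the URL-safe Base64 alphabet, which consists of letters, digits, '-' and '_':

```python
import base64
import string
import uuid


def uuid_username():
    """Return a random 22-character username candidate from a UUID4.

    Hypothetical helper; uniqueness is probabilistic, not guaranteed.
    """
    raw = uuid.uuid4().bytes                   # 16 random-ish bytes
    encoded = base64.urlsafe_b64encode(raw)    # 24 bytes, ends with b'=='
    return encoded.rstrip(b"=").decode("ascii")


name = uuid_username()
allowed = set(string.ascii_letters + string.digits + "-_")
print(name, len(name))
print(set(name) <= allowed)
```

Whether '-' and '_' pass your username validation depends on the Django version in use, so treat that as something to verify.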

In addition, a UUID has two fixed variant bits, '10' (hence the 17th character of the hexadecimal representation is always 8, 9, A or B), and four fixed version bits (the 13th character); see the Wikipedia article on UUIDs.
So you can drop the two hex characters that hold these 4 + 2 = 6 fixed bits, at the cost of only 2 effective (random) bits, to get a hexadecimal string 30 characters long:

    >>> s = uuid.uuid4().hex
    >>> len(s[:12] + s[13:16] + s[17:])
    30

That way, you discard only 2 effective bits, instead of the 8 you lose when you simply slice s to s[:30], and you can expect better uniqueness (the result still covers 1/4 of the UUID4 value space, versus 1/256 for the plain slice).
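To make the fixed positions concrete, here is a small Python 3 sketch, with uuid_hex30 as a hypothetical helper name. It checks that the characters being dropped are the version nibble and the variant nibble before removing them.

```python
import uuid


def uuid_hex30(u=None):
    """Return 30 hex chars of a UUID4 with the two mostly-fixed nibbles removed.

    Hypothetical helper for illustration only.
    """
    if u is None:
        u = uuid.uuid4()
    s = u.hex                       # 32 lowercase hex characters
    # s[12] is the version nibble (always '4' for uuid4).
    # s[16] is the variant nibble (8, 9, a or b): 2 fixed bits + 2 random bits.
    return s[:12] + s[13:16] + s[17:]


u = uuid.uuid4()
s = u.hex
print(s[12], s[16])
print(uuid_hex30(u), len(uuid_hex30(u)))
```

The plain slice s[:30] would instead throw away the last two hex characters, which are fully random.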

+2
