Hashing the same character multiple times

Question

I'm working on a programming problem that is driving me crazy. I need to calculate the MD5 hash of a string that is given in a compact notation:

n[c], where n is a number and c is a character (or a nested expression), means that c is repeated n times. For example: b3[a2[c]] => baccaccacc

Everything went fine until I was given the following input:

1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]

This expands to a string of 6227020800 (that is, 13!) copies of a. The expanded string is more than 6 GB, so building it and hashing it in a practical amount of time is almost impossible. So here is my question:

Are there any properties of MD5 that I can use here?

I know there must be a way to do this in a short time, and I suspect it has to do with the fact that the whole string is the same character repeated many times.
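As a sanity check on the size quoted above, the expanded length can be computed without building the string. This is only a sketch, assuming the n[...] notation works as described in the question; the name expanded_length is hypothetical and not part of the original problem.

```python
import re

# A repetition starts with one or more digits followed by '['.
_REPEAT = re.compile(r"(\d+)\[")


def expanded_length(s: str) -> int:
    """Return the length of the expansion of s without building it."""

    def parse(pos: int):
        total = 0
        # Stop at the end of the input or at the ']' closing this level.
        while pos < len(s) and s[pos] != "]":
            m = _REPEAT.match(s, pos)
            if m:
                inner, pos = parse(m.end())
                pos += 1  # skip the closing ']'
                total += int(m.group(1)) * inner
            else:
                total += 1  # a plain character
                pos += 1
        return total, pos

    return parse(0)[0]


print(expanded_length("b3[a2[c]]"))
print(expanded_length("1[2[3[4[5[6[7[8[9[10[11[12[13[a]]]]]]]]]]]]]"))
```

The second call confirms the 6227020800 figure, since the nested counts simply multiply.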

+7

python string algorithm hash md5

Lars May 06 '13 at 13:01

3 answers

Hash functions are designed to work on byte streams, so you should not generate the whole string first and hash it afterwards. Instead, write a generator that produces the string in fragments and passes each fragment to the MD5 context. MD5 processes its input in 64-byte blocks, so it makes sense to pass chunks of 64 bytes (or a multiple of that) to the context.
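The generator idea could look roughly like this. It is a sketch rather than a definitive implementation: the names repeated_chunks and md5_of_stream and the default chunk size of 64 KiB (a multiple of 64 bytes) are assumptions made for illustration.

```python
import hashlib
from typing import Iterable, Iterator


def repeated_chunks(byte: bytes, count: int,
                    chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield `count` copies of `byte` in pieces of at most chunk_size bytes."""
    chunk = byte * chunk_size
    full, rest = divmod(count, chunk_size)
    for _ in range(full):
        yield chunk
    if rest:
        yield byte * rest  # final, shorter piece


def md5_of_stream(chunks: Iterable[bytes]) -> str:
    """Feed the pieces to one MD5 context and return the hex digest."""
    digest = hashlib.md5()
    for piece in chunks:
        digest.update(piece)
    return digest.hexdigest()


# Memory use is bounded by chunk_size, whatever the total count is.
print(md5_of_stream(repeated_chunks(b"a", 1_000_000)))
```

Only one chunk is held in memory at a time, so the total length affects the running time but not the memory footprint.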

0

Nickolay Olshevsky May 6, '13 at 14:15

Take advantage of a useful property of hash functions, namely that they can be fed their input piece by piece:

    import hashlib

    cruncher = hashlib.md5()
    chunk = b'a' * 100        # one piece of the string, as bytes
    for i in range(100000):   # number of pieces fed to the hash
        cruncher.update(chunk)
    print(cruncher.hexdigest())

Change the number of rounds (x = 100000) and the length of the chunk (y = 100) so that x * y = 13!. The point is that you feed the algorithm with pieces of your string (each one y characters long), one after the other, x times.
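To make the property concrete, here is a small sketch that checks on a short input that feeding chunks gives the same digest as hashing everything at once, and that one possible pair of values multiplies out to 13!. The function name md5_in_rounds and the particular values 25600 and 243243 are my own choices for illustration, not something from the answer.

```python
import hashlib
import math


def md5_in_rounds(chunk: bytes, rounds: int) -> str:
    """Hash `chunk` repeated `rounds` times without concatenating it."""
    cruncher = hashlib.md5()
    for _ in range(rounds):
        cruncher.update(chunk)
    return cruncher.hexdigest()


# Incremental updates and a one-shot hash agree on a small input.
assert md5_in_rounds(b"a" * 100, 1000) == hashlib.md5(b"a" * 100_000).hexdigest()

# One possible split of 13! into chunk length times number of rounds.
CHUNK_LENGTH = 25_600
ROUNDS = 243_243
assert CHUNK_LENGTH * ROUNDS == math.factorial(13)

# The full computation would then be (this hashes about 6.2 GB of data):
# print(md5_in_rounds(b"a" * CHUNK_LENGTH, ROUNDS))
```

Any pair whose product is 13! works; a larger chunk simply means fewer calls to update().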

0

Stefano sanfilippo May 6, '13 at 14:25

Alfe · Accepted Answer · 2013-05-06T13:49:54+0000

You probably wrote a (recursive) function that returns the result as a single value. Instead, you should use a generator that produces the result as a stream of bytes, which you can then feed byte by byte into your MD5 hashing routine. That way the size of the stream does not matter for memory use; it only affects the computation time.

This example uses a single-pass parser:

    import hashlib, re, sys

    def p(s, pos, callBack):
        while pos < len(s):
            m = re.match(r'(\d+)\[', s[pos:])
            if m:  # repetition?
                number = m.group(1)
                # re-parse the bracketed part once per repetition
                for i in range(int(number)):
                    endPos = p(s, pos + len(number) + 1, callBack)
                pos = endPos
            elif s[pos] == ']':
                return pos + 1
            else:
                callBack(s[pos])
                pos += 1
        return pos + 1

    digest = hashlib.md5()

    def feed(s):
        digest.update(s.encode())  # hash the character
        sys.stdout.write(s)        # and echo it to stdout
        sys.stdout.flush()

    end = p(sys.argv[1], 0, feed)
    print()
    print("MD5:", digest.hexdigest())
    print("finished parsing input at pos", end)
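The text of this answer recommends a generator, while the code above uses a callback. A generator-based variant might look like the following sketch. It is not the answerer's code: the names parse, generate and md5_of_expansion are hypothetical, and it assumes well-formed input in the notation from the question.

```python
import hashlib
import re

_REPEAT = re.compile(r"(\d+)\[")


def parse(s, pos=0):
    """Parse s into a nested list; return (items, position after parsing).

    Each item is either a single character or a (count, items) tuple.
    """
    items = []
    while pos < len(s) and s[pos] != "]":
        m = _REPEAT.match(s, pos)
        if m:
            inner, pos = parse(s, m.end())
            pos += 1  # skip the closing ']'
            items.append((int(m.group(1)), inner))
        else:
            items.append(s[pos])
            pos += 1
    return items, pos


def generate(items):
    """Lazily yield the characters of the expansion, one at a time."""
    for item in items:
        if isinstance(item, tuple):
            count, inner = item
            for _ in range(count):
                yield from generate(inner)
        else:
            yield item


def md5_of_expansion(s):
    """Hash the expansion of s without ever holding it in memory."""
    digest = hashlib.md5()
    items, _ = parse(s)
    for ch in generate(items):
        digest.update(ch.encode())
    return digest.hexdigest()


print("".join(generate(parse("b3[a2[c]]")[0])))
print(md5_of_expansion("b3[a2[c]]"))
```

Like the callback version, this still visits every character, so memory stays small but the running time grows with the length of the expansion. Collecting the generated characters into larger chunks before calling update() would likely reduce the per-call overhead.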
