Update
Ironically, a few weeks after I posted the previous answer, two Chinese researchers, Tao Xie and Dengguo Feng, published a new one-block clash for MD5 . Until now, I did not know about this paper. One MD5 block means that the input size is 64 bytes or 512 bits. Note that the inputs are basically the same, only 2 bits different .
Their methodology will not be published until January 2013, but their collision can be checked now using the numbers from the article:
>>> from array import array >>> from hashlib import md5 >>> input1 = array('I', [0x6165300e,0x87a79a55,0xf7c60bd0,0x34febd0b,0x6503cf04, 0x854f709e,0xfb0fc034,0x874c9c65,0x2f94cc40,0x15a12deb,0x5c15f4a3,0x490786bb, 0x6d658673,0xa4341f7d,0x8fd75920,0xefd18d5a]) >>> input2 = array('I', [x^y for x,y in zip(input1, [0, 0, 0, 0, 0, 1<<10, 0, 0, 0, 0, 1<<31, 0, 0, 0, 0, 0])]) >>> input1 == input2 False >>> md5(input1).hexdigest() 'cee9a457e790cf20d4bdaa6d69f01e41' >>> md5(input2).hexdigest() 'cee9a457e790cf20d4bdaa6d69f01e41'
Update: An article was published in March 2013: Tao Xie and Fanbao Liu and Dengo Feng - Rapid Collision Attack with MD5
However, if you have more opportunities for the game, collisions of several kilobytes are calculated much faster - they can be calculated within an hour on ANY regular computer.
Old answer
In the previous shortest collision, at least two MD5 blocks were used, which are worth the input - these are 128 bytes, 1024 bits. The prefix in the first block can be arbitrarily selected by the attacker, the rest will be calculated and displayed as gibberish.
Here is an example of two different counter inputs, you can try it yourself in Python:
>>> from binascii import unhexlify >>> from hashlib import md5 >>> input1 = 'Oded Goldreich\nOded Goldreich\nOded Goldreich\nOded Go' + unhexlify( ... 'd8050d0019bb9318924caa96dce35cb835b349e144e98c50c22cf461244a4064bf1afaecc582' ... '0d428ad38d6bec89a5ad51e29063dd79b16cf67c12978647f5af123de3acf844085cd025b956') >>> len(input1) 128 >>> md5(input1).hexdigest() 'd320b6433d8ebc1ac65711705721c2e1' >>> input2 = 'Neal Koblitz\nNeal Koblitz\nNeal Koblitz\nNeal Koblitz\n' + unhexlify( ... '75b80e0035f3d2c909af1baddce35cb835b349e144e88c50c22cf461244a40e4bf1afaecc582' ... '0d428ad38d6bec89a5ad51e29063dd79b16cf6fc11978647f5af123de3acf84408dcd025b956') >>> md5(input2).hexdigest() 'd320b6433d8ebc1ac65711705721c2e1'
Generating these two specific inputs took 2 days in a Playstation 3 215-node cluster, Mark Stevens :)