Python is supposed to be strongly typed.
For example, 'abc'['1'] will not work, because the index must be an integer, not a string. A TypeError is raised, and you can go on and correct the mistake.
But that is not the case with hashlib. Try the following:
    import hashlib
    hashlib.md5(u'abc')    # works: only ASCII characters
    hashlib.md5(u'\xe9')   # u'é': raises UnicodeEncodeError
Of course, the call does not fail with a TypeError, but with a UnicodeEncodeError. A UnicodeEncodeError is supposed to be raised when you try to encode a unicode object into a byte string.
I think I'm not too far from the truth when I guess that hashlib silently tried to convert the unicode object to a string.
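If that guess is right, the implicit conversion should behave like an explicit encode with the ASCII codec (I am assuming ASCII is the default codec in play here). A minimal sketch of that assumption; ascii_encodable is a hypothetical helper name, not anything hashlib provides:

```python
def ascii_encodable(text):
    """Return True if text survives an explicit ASCII encode.

    Assumption: the silent conversion behaves like text.encode('ascii').
    """
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # Same exception type as the one seen when hashing accented text.
        return False
    return True


plain_ok = ascii_encodable(u'abc')      # pure ASCII text
accent_ok = ascii_encodable(u'\xe9')    # u'é', outside the ASCII range
```

Under this assumption, the inputs for which the helper returns False would be the same ones that make the hashing call blow up.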
Now, I agree: hashlib states that the argument to hashlib.md5() should be a string or a read-only buffer, which a unicode string is not. But the behavior shows that this is not actually enforced: hashlib.md5() works properly with strings and, apparently, with whatever else it can convert.
Of course, the main problem is that you will get an exception with some unicode strings and not with others.
Which leads me to my questions. First, do you have an explanation for why hashlib implements this behavior? Second, is this considered a problem? Third, is there a way to fix this without changing the module itself?
hashlib is just an example; several other modules behave the same way when given unicode strings, which leads to an uncomfortable situation where your program works with ASCII input but fails completely on accented characters.
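For what it's worth, the only workaround I can see that leaves the module untouched is to encode explicitly before hashing, so nothing is converted behind my back. A minimal sketch, assuming UTF-8 is an acceptable encoding for the data; md5_hex is a hypothetical helper, not part of hashlib:

```python
import hashlib


def md5_hex(text, encoding='utf-8'):
    """Return the hex MD5 digest of text, encoding unicode explicitly.

    Assumption: UTF-8 is the right encoding; pass another one if not.
    """
    if isinstance(text, bytes):
        data = text                    # already a byte string, hash as-is
    else:
        data = text.encode(encoding)   # explicit, so accents cannot surprise us
    return hashlib.md5(data).hexdigest()


ascii_digest = md5_hex(u'abc')
accent_digest = md5_hex(u'\xe9')       # u'é' no longer raises
```

The digest of accented text then depends on the encoding chosen, so both sides of any comparison have to agree on it.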