I have a large project in which problematic implicit Unicode conversions (coercions) are used in various places, in forms such as:
someDynamicStr = "bar"
(Possibly in other forms as well.)
Now I would like to track down these usages, especially those in heavily used code.
It would be great if I could easily replace the unicode constructor with a wrapper that checks whether the input is of type str and the encoding / errors parameters are left at their defaults, and then notifies me (for example by printing a traceback).
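A minimal sketch of what such a wrapper could look like, assuming explicit calls can be routed through a helper. The names `tracking_unicode`, `call_sites` and `ASCII_ALIASES` are hypothetical, and the code is written to run on both Python 2.7 and Python 3 (where `bytes` plays the role of Python 2's `str`). It counts hits per call site instead of raising, so the hot spots can be prioritised; like any wrapper of this kind, it sees only explicit calls, not implicit coercions.

```python
import traceback
from collections import Counter

text_type = type(u"")  # unicode on Python 2, str on Python 3
ASCII_ALIASES = (None, "ascii", "646", "us-ascii")

# (filename, line number) -> number of suspicious calls seen there
call_sites = Counter()


def tracking_unicode(obj=b"", encoding=None, errors="strict"):
    """Hypothetical stand-in for unicode() that records risky call sites."""
    if not isinstance(obj, bytes):
        # Not a byte string, so there is nothing to decode.
        return text_type(obj)
    if encoding in ASCII_ALIASES and errors == "strict":
        # limit=2 yields [caller, this function]; index 0 is the caller.
        caller = traceback.extract_stack(limit=2)[0]
        call_sites[(caller[0], caller[1])] += 1
    return obj.decode(encoding or "ascii", errors)


for _ in range(3):
    tracking_unicode(b"bar")       # default encoding: counted
tracking_unicode(b"bar", "utf-8")  # explicit encoding: not counted

for site, hits in call_sites.most_common():
    print("%s:%d hit %d times" % (site[0], site[1], hits))
```

Sorting by hit count is the point of the design: nothing is interrupted, and the most frequently hit locations come first in the report.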
/edit:
Not directly related to what I'm looking for, but I came across this gloriously horrible hack for working around decoding exceptions (decoding only, i.e. str to unicode, not the other way around); see https://mail.python.org/pipermail/python-list/2012-July/627506.html.
I do not plan to use it, but it may be interesting for those who are struggling with invalid Unicode inputs and looking for a quick fix (but please think about side effects):
    import codecs
    codecs.register_error("strict", codecs.ignore_errors)
    codecs.register_error("strict", lambda x: (u"", x.end))
(An Internet search for codecs.register_error("strict" shows that it is apparently used in some real projects.)
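For comparison, the same `codecs.register_error` mechanism can be tried out without overriding "strict" globally. The sketch below registers a handler under a separate, hypothetical name (`log_and_skip`) and records what it dropped in a hypothetical `skipped` list; it affects only calls that ask for that handler explicitly, so it illustrates how the hook works rather than offering a fix for implicit conversions.

```python
import codecs

# (start, end, offending bytes) for every undecodable run that was skipped
skipped = []


def log_and_skip(exc):
    """Hypothetical error handler: record the bad bytes, then drop them."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # only decoding errors are handled here
    skipped.append((exc.start, exc.end, exc.object[exc.start:exc.end]))
    # Replace the bad run with nothing and resume decoding after it.
    return (u"", exc.end)


codecs.register_error("log_and_skip", log_and_skip)

text = b"ab\xffcd".decode("ascii", "log_and_skip")
print(repr(text))
print(skipped)
```

A handler receives the exception object and returns a replacement string together with the position at which decoding should resume, which is exactly what the lambda in the hack above does.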
/edit #2:
For explicit conversions, I put together a snippet with the help of an SO post on monkeypatching:
    class PatchedUnicode(unicode):
        def __init__(self, obj=None, encoding=None, *args, **kwargs):
            if encoding in (None, "ascii", "646", "us-ascii"):
                print("Problematic unicode() usage detected!")
            super(PatchedUnicode, self).__init__(obj, encoding, *args, **kwargs)

    import __builtin__
    __builtin__.unicode = PatchedUnicode
This only affects explicit conversions using the unicode() constructor, so this is not what I need.
/edit #3:
The thread "Extension Method for python built-in types!" makes me think it might not be easy after all (at least in CPython).
/edit #4:
It's nice to see so many good answers here; too bad I can award the bounty only once.
In the meantime, I came across a somewhat similar question, at least in terms of what the person was trying to achieve: "Can I turn off implicit Python unicode conversions to find my mixed-strings bugs?" Note, however, that raising an exception is not acceptable in my case. I am looking for something that points me to the various locations of the problematic code (for example, by printing something), not something that might exit the program or change its behavior, because that way I can prioritize what to fix first.
On another note, the people working on the Mypy project (including Guido van Rossum) may also come up with something similarly useful in the future; see the discussions at https://github.com/python/mypy/issues/1141 and, more recently, https://github.com/python/typing/issues/208 .
/edit #5:
I also stumbled upon the following, but have not had time to check it out yet: https://pypi.python.org/pypi/unicode-nazi