IPython and Python treat my string differently. Why?

In Python (2.7.1):

    >>> x = u'$€%'
    >>> x.find('%')
    2
    >>> len(x)
    3

While in IPython:

    >>> x = u'$€%'
    >>> x.find('%')
    4
    >>> len(x)
    5

What's going on here?


Edit: including the additional information requested in the comments below.

IPython

    >>> import sys, locale
    >>> reload(sys)
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
    >>> sys.getdefaultencoding()
    'UTF8'
    >>> x = u'$€%'
    >>> x
    u'$\xe2\x82\xac%'
    >>> print x
    $â¬%
    >>> len(x)
    5

Python

    >>> import sys, locale
    >>> reload(sys)
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
    >>> sys.getdefaultencoding()
    'UTF8'
    >>> x = u'$€%'
    >>> x
    u'$\u20ac%'
    >>> print x
    $€%
    >>> len(x)
    3
+4
2 answers

@nye17 Calling setdefaultencoding() is officially discouraged (it is removed from sys after its first use for a reason). One of the main offenders is gtk, which causes all kinds of problems, so if IPython has imported gtk, sys.getdefaultencoding() will return utf8. IPython itself does not set the default encoding.

@wim May I ask which version of IPython you are using? Part of the overhaul in 0.11 fixed many unicode bugs, but more keep turning up (mostly on Windows now).

I ran a test script in IPython 0.11, and the behavior of IPython and Python seems the same, so I think this bug is fixed.

Relevant Values:

  • sys.stdin.encoding = utf8
  • sys.getdefaultencoding() = ascii
  • tested platforms: Ubuntu 10.04 + Python2.6.5, OSX 10.7 + Python2.7.1

As for the explanation: IPython did not recognize that the input could be unicode. In IPython 0.10, multibyte utf8 input is not respected, so each byte is treated as one character, which you can see with

    In [1]: x = '$€%'
    In [2]: x
    Out[2]: '$\xe2\x82\xac%'
    In [3]: y = u'$€%'
    In [4]: y
    Out[4]: u'$\xe2\x82\xac%'  # wrong!

What should happen, and what does happen in 0.11, is that y == x.decode(sys.stdin.encoding), not repr(y) == 'u' + repr(x).
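To make the difference concrete, here is a minimal sketch that reproduces both sets of numbers from the question. It assumes the terminal sends UTF-8, and it uses latin-1 only as a stand-in for "one byte = one character"; that is an illustration of the effect, not a claim about what IPython 0.10 does internally.

```python
from __future__ import print_function

# The UTF-8 bytes a terminal sends for the characters $, EURO SIGN, %.
raw = b'$\xe2\x82\xac%'

# Correct handling: decode the bytes with the input encoding
# (assumed to be UTF-8 here).
correct = raw.decode('utf-8')

# Byte-per-character handling, imitated with latin-1, which maps every
# byte 0x00-0xFF to the code point with the same value.
wrong = raw.decode('latin-1')

print(len(correct), correct.find(u'%'))  # 3 2, as in plain python
print(len(wrong), wrong.find(u'%'))      # 5 4, as in the broken session

# A mis-decoded string can be repaired by undoing the byte-per-character
# step and decoding properly.
repaired = wrong.encode('latin-1').decode('utf-8')
print(repaired == correct)
```

The three bytes \xe2\x82\xac are a single character once decoded as UTF-8, which is why the length drops from 5 to 3 and the index of '%' from 4 to 2.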

+5

If you run

    import sys
    sys.getdefaultencoding()

I think you will get different results in Python and IPython; maybe one is ascii and the other is utf-8, so this is probably just a matter of which encoding is selected by default.
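For a side-by-side comparison it may help to print every encoding involved, not only the default one. The sketch below is just a suggestion: report_encodings is a hypothetical helper name, and the stdin entry can be None when input is piped or redirected.

```python
from __future__ import print_function
import locale
import sys


def report_encodings():
    """Collect the encodings that influence how typed text is interpreted."""
    return {
        # Encoding used for implicit str/unicode conversions.
        'default': sys.getdefaultencoding(),
        # Encoding of the terminal input; may be missing or None.
        'stdin': getattr(sys.stdin, 'encoding', None),
        # Encoding suggested by the user's locale settings.
        'locale': locale.getpreferredencoding(),
    }


info = report_encodings()
for name in sorted(info):
    print(name, '=', info[name])
```

Run the same snippet in both shells and compare the output line by line.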

Another test you can do is to enter the following, which sets the default encoding to that of your locale:

    import sys, locale
    reload(sys)
    sys.setdefaultencoding(locale.getdefaultlocale()[1])
    sys.getdefaultencoding()

Then try the x test from your question again.

+1
