How do I know if Python is compiled with UCS-2 or UCS-4?

What the title says.

    $ ./configure --help | grep -i ucs
      --enable-unicode[=ucs[24]]

Searching for official documentation, I found this:

sys.maxunicode: An integer giving the largest supported code point for a Unicode character. Its value depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.

What is unclear here is which value(s) correspond to UCS-2 and UCS-4.

The code is expected to run in Python 2.6+.

+55
python unicode ucs2
Sep 18 '09 at 19:06
7 answers

When built with --enable-unicode=ucs4:

    >>> import sys
    >>> print sys.maxunicode
    1114111

When built with --enable-unicode=ucs2:

    >>> import sys
    >>> print sys.maxunicode
    65535
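To see what the two values mean in practice, here is a minimal sketch that is not part of the original answer. It measures the length of one character outside the Basic Multilingual Plane; the variable names `astral` and `is_wide` are made up for illustration. On a narrow (UCS-2) build the character is stored as a surrogate pair and has length 2. On a wide (UCS-4) build, and on Python 3.3+ where the build option no longer exists, it has length 1.

```python
import sys

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual Plane.
astral = u'\U0001D11E'

# Narrow builds report 0xFFFF here, wide builds report 0x10FFFF.
is_wide = sys.maxunicode > 0xFFFF

# Narrow builds store the character as two code units, wide builds as one.
print('%d code unit(s), %s build' % (len(astral), 'wide' if is_wide else 'narrow'))
```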
+98
Sep 18 '09 at 19:33

The values are 0xFFFF (65535) for UCS-2 and 0x10FFFF (1114111) for UCS-4:

    Py_UNICODE
    PyUnicode_GetMax(void)
    {
    #ifdef Py_UNICODE_WIDE
        return 0x10FFFF;
    #else
        /* This is actually an illegal character, so it should
           not be passed to unichr. */
        return 0xFFFF;
    #endif
    }

The maximum character in UCS-4 mode is determined by the maximum value representable in UTF-16.
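The link to UTF-16 can be checked with a little arithmetic. The sketch below assumes the standard UTF-16 surrogate ranges (high surrogates 0xD800 to 0xDBFF, low surrogates 0xDC00 to 0xDFFF); `decode_surrogate_pair` is a hypothetical helper, not something from CPython.

```python
def decode_surrogate_pair(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    # Each surrogate carries 10 payload bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The largest possible pair gives the largest code point UTF-16 can encode.
largest = decode_surrogate_pair(0xDBFF, 0xDFFF)
print(hex(largest))  # 0x10ffff
```

The result, 0x10FFFF, is the same number that sys.maxunicode reports on a wide build.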

+17
Sep 18 '09 at 19:20

I had the same problem. I documented it for myself on my wiki at

http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4

I wrote:

    import sys
    sys.maxunicode > 65536 and 'UCS4' or 'UCS2'
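The and/or idiom works here only because both strings are truthy. Since Python 2.5 the same test can be written as a conditional expression, which avoids that pitfall. This is a sketch; `unicode_build` is a hypothetical name. On Python 3.3+ sys.maxunicode is always 1114111, so the function always returns 'UCS4' there.

```python
import sys

def unicode_build():
    """Return 'UCS4' for wide builds and 'UCS2' for narrow builds."""
    return 'UCS4' if sys.maxunicode > 0xFFFF else 'UCS2'

print(unicode_build())
```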
+10
Sep 20 '09 at 2:50

sysconfig reports the Unicode character size among Python's build configuration variables.

The build flags can be queried as follows.

Python 2.7:

    import sysconfig
    sysconfig.get_config_var('Py_UNICODE_SIZE')

Python 2.6:

    import distutils.sysconfig  # a bare "import distutils" does not load the submodule
    distutils.sysconfig.get_config_var('Py_UNICODE_SIZE')
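The two variants can be combined into one version-independent sketch. `unicode_size` is a hypothetical helper. The fallback is an assumption on my part: get_config_var returns None for a variable it does not know, which can happen on some platforms and Python versions, so in that case the size is derived from sys.maxunicode instead.

```python
import sys

try:
    import sysconfig                 # Python 2.7 and later
except ImportError:
    from distutils import sysconfig  # Python 2.6

def unicode_size():
    """Return the size of a Unicode code unit in bytes (2 or 4)."""
    size = sysconfig.get_config_var('Py_UNICODE_SIZE')
    if size is None:
        # Variable not available: fall back on the largest code point.
        size = 4 if sys.maxunicode > 0xFFFF else 2
    return int(size)

print(unicode_size())
```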
+4
Mar 04 '16 at 16:40

Another way is to create a Unicode array and look at itemsize:

    import array
    bytes_per_char = array.array('u').itemsize

Quote from the array docs:

The 'u' type code corresponds to Python's unicode character. On narrow Unicode builds this is 2 bytes, on wide builds this is 4 bytes.

Note that the distinction between narrow and wide Unicode builds was dropped in Python 3.3; see PEP 393. The 'u' typecode for array has been deprecated since 3.3 and is scheduled for removal in Python 4.0.
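Because the item size of 'u' stops reflecting a build option from Python 3.3 onward, a guarded variant may be safer. This is only a sketch; `bytes_per_unicode_char` is a hypothetical name, and returning None for newer interpreters is my own choice.

```python
import array
import sys

def bytes_per_unicode_char():
    """Return 2 or 4 on Python < 3.3, or None where the question no longer applies."""
    if sys.version_info >= (3, 3):
        # PEP 393: there is no narrow/wide build to detect any more.
        return None
    return array.array('u').itemsize

print(bytes_per_unicode_char())
```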

+1
Sep 07 '16 at 11:28

I had the same problem and found a semi-official piece of code that does just that, which may interest others: https://bitbucket.org/pypa/wheel/src/cf4e2d98ecb1f168c50a6de496959b4a10c6b122/wheel/pep425tags.? at = default & fileviewer = file-view-default # pep425tags.py-83: 89 .

It comes from the wheel project, which needs to check whether Python was compiled with UCS-2 or UCS-4, because that changes the name of the generated binary.

0
Aug 17 '16 at 7:28


