What the title says.
$ ./configure --help | grep -i ucs
  --enable-unicode[=ucs[24]]
Searching for official documentation, I found this:
sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.
What is unclear here is which value(s) correspond to UCS-2 and UCS-4.
The code is expected to run in Python 2.6+.
When built with --enable-unicode=ucs4:
>>> import sys
>>> print sys.maxunicode
1114111
When built with --enable-unicode=ucs2:
>>> import sys
>>> print sys.maxunicode
65535
These are 0xFFFF (or 65535) for UCS-2 and 0x10FFFF (or 1114111) for UCS-4:
Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}
The maximum character in UCS-4 mode is determined by the maximum value representable in UTF-16.
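To see where 0x10FFFF comes from, here is a small arithmetic sketch. It assumes the standard UTF-16 surrogate-pair layout (10 payload bits in each surrogate, offset by 0x10000); the constant names are made up for illustration.

```python
# Largest code point reachable through a UTF-16 surrogate pair.
HIGH_BITS = 0x3FF  # 10 payload bits from the high surrogate (0xD800-0xDBFF)
LOW_BITS = 0x3FF   # 10 payload bits from the low surrogate (0xDC00-0xDFFF)

# Surrogate pairs encode code points starting at 0x10000.
max_code_point = 0x10000 + (HIGH_BITS << 10) + LOW_BITS

print(hex(max_code_point))  # 0x10ffff
print(max_code_point)       # 1114111
```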
I had the same problem. I documented it for myself on my wiki at
http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4
I wrote:
import sys
sys.maxunicode > 65536 and 'UCS4' or 'UCS2'
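The same check can be wrapped in a small function. This is only a sketch: the name unicode_build is made up, and it uses a conditional expression instead of the and/or idiom. Note that on Python 3.3 and later sys.maxunicode is always 0x10FFFF, so the function will always report 'UCS4' there.

```python
import sys


def unicode_build():
    """Return 'UCS2' for a narrow build, 'UCS4' otherwise."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 'UCS4' if sys.maxunicode > 0xFFFF else 'UCS2'


print(unicode_build())
```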
sysconfig will report the Unicode size from Python's build configuration variables.
The build flags can be queried as follows.
Python 2.7:
import sysconfig
sysconfig.get_config_var('Py_UNICODE_SIZE')
Python 2.6:
import distutils.sysconfig
distutils.sysconfig.get_config_var('Py_UNICODE_SIZE')
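If the same script has to run on both 2.6 and 2.7, the two variants can be combined with an import fallback. A rough sketch (the function name unicode_size is made up); it assumes the variable may be missing, in which case get_config_var returns None, which I would expect on Python 3.3+ and possibly on some platforms.

```python
try:
    import sysconfig  # available in Python 2.7 and 3.2+
except ImportError:
    from distutils import sysconfig  # fallback for Python 2.6


def unicode_size():
    """Return bytes per Py_UNICODE (2 or 4), or None if not recorded."""
    return sysconfig.get_config_var('Py_UNICODE_SIZE')


print(unicode_size())
```

When the result is None, falling back to sys.maxunicode is probably the safer choice.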
Another way is to create a Unicode array and look at itemsize:
import array
bytes_per_char = array.array('u').itemsize
Quote from the array docs:
The 'u' typecode corresponds to Python's unicode character. On narrow Unicode builds this is 2 bytes, on wide builds this is 4 bytes.
Note that the distinction between narrow and wide Unicode builds is dropped from Python 3.3 onward, see PEP 393. The 'u' typecode for array is deprecated since 3.3 and is scheduled for removal in Python 4.0.
65535 - UCS-2:
Thus, the code point U+0000 is encoded as the number 0, and U+FFFF is encoded as 65535 (which is FFFF in hexadecimal).
I had the same problem and found a semi-official piece of code that does just that and might be interesting for people with the same problem: https://bitbucket.org/pypa/wheel/src/cf4e2d98ecb1f168c50a6de496959b4a10c6b122/wheel/pep425tags.py?at=default&fileviewer=file-view-default#pep425tags.py-83:89
It comes from the wheel project, which needs to check whether Python was compiled with UCS-2 or UCS-4, because that changes the name of the generated binary.
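For illustration, here is a minimal sketch of that kind of check. It is not the wheel project's code: the function name unicode_abi_suffix is made up, and it only relies on the fact that wide-Unicode CPython 2 builds carry a 'u' in ABI tags such as cp27mu.

```python
import sys


def unicode_abi_suffix():
    """Hypothetical helper: 'u' for a wide-Unicode Python 2 build, else ''."""
    # From Python 3.3 on (PEP 393) there is no narrow/wide split,
    # so the suffix only applies to older interpreters.
    if sys.version_info < (3, 3) and sys.maxunicode == 0x10FFFF:
        return 'u'
    return ''


print(repr(unicode_abi_suffix()))
```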