Python 2.x: how to automate forced use of unicode instead of string?

How can I automate a test to ensure that the body of Python 2.x code does not contain string instances (Unicode instances only)?

Eg.

Can I do this from code?

Is there a static analysis tool that has this feature?

Edit:

I wanted this for an application in Python 2.5, but it turns out this is not real, because:

  • 2.5 does not support unicode_literals
  • The keys of the word kwargs cannot be unicode objects, only strings

So, I accept the answer, which says this is not possible, although this is for various reasons :)

+5
source share
3 answers

, Unicode; from __future__ import unicode_literals , b'...', Python 3.

, , unicode_literals : -U. 2.x, script.

? . "", Unicode ""; - , . , , .

Python 3, b'...' , , u'...' , Unicode. '...' , , , / Python 3 .

+1

, python . AST, , , - .

, Python . :

import parser
from token import tok_name

def checkForNonUnicode(codeString):
    return checkForNonUnicodeHelper(parser.suite(codeString).tolist())

def checkForNonUnicodeHelper(lst):
    returnValue = True
    nodeType = lst[0]
    if nodeType in tok_name and tok_name[nodeType] == 'STRING':
        stringValue = lst[1]
        if stringValue[0] != "u": # Kind of hacky. Does this always work?
            print "%s is not unicode!" % stringValue
            returnValue = False

    else:
        for subNode in [lst[n] for n in range(1, len(lst))]:
            if isinstance(subNode, list):
                returnValue = returnValue and checkForNonUnicodeHelper(subNode)

    return returnValue

print checkForNonUnicode("""
def foo():
    a = 'This should blow up!'
""")
print checkForNonUnicode("""
def bar():
    b = u'although this is ok.'
""")

'This should blow up!' is not unicode!
False
True

doc unicode, , - , from symbol import sym_name, node . -node , .. - . Unicode.

!

Edit

. , parser.suite Python. , Python, . , , myObscureUtilityFile.py,

from ..obscure.relative.path import whatever

checkForNonUnicode(open('/whoah/softlink/myObscureUtilityFile.py').read())
+1

SD (SCSE) .

SCSE . , , Python. , .

, langauge , langauge . SCSE , langauge. ( ) , regulatr. , on ( "S" " " ) ( Python, "UnicodeStrings", -Unicode .., Python, "S" ).

, :

 'for' ... I=ij*

"" ( "..." ) , "ij" . ( , , .

:

  S

. : -}

 UnicodeStrings

, Unicode (u "..." )

, UnicodeStrings. SCSE "", , . , , " ", :

  S-UnicodeStrings

, Unicode, .

SCSE provides logging tools so you can record hits. You can run SCSE from the command line by including a query script for the response. Putting this in a script command will provide a tool that will give you an answer directly.

0
source

All Articles