Pyparsing whitespace matching issues

I am trying to use pyparsing to parse Robot Framework files, which use a text-based DSL. The syntax looks like the following (sorry, I find it hard to describe in BNF). A single line in Robot Framework might look like this:

Library\tSSHClient    with name\tnode

\t is the tab character. Robot Framework transparently converts it to 2 spaces (in fact it just calls str.replace('\t', '  ') to replace the tab, but that changes the length of each line: len('\t') is 1, but len('  ') is 2). In Robot Framework, 2 or more spaces, or a '\t', separate tokens. If there is only 1 space between words, the words are treated as a single token group.

Library\tSSHClient    with name\tnode

is actually split into the following tokens when parsed correctly:

 ['Library', 'SSHClient', 'with name', 'node']

Since there is only 1 space between "with" and "name", the parser treats them as one token group.
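The splitting rule described above can be sketched with the standard re module alone. This is only a minimal illustration of the rule as stated in this question, not Robot Framework's actual implementation; the function name split_tokens is hypothetical.

```python
import re

def split_tokens(line):
    # Mirror the str.replace step described above: each tab becomes 2 spaces.
    line = line.replace("\t", "  ")
    # Two or more spaces separate tokens; a single space stays inside a token.
    return [tok for tok in re.split(r" {2,}", line) if tok]

tokens = split_tokens("Library\tSSHClient    with name\tnode")
print(tokens)  # ['Library', 'SSHClient', 'with name', 'node']
```

Note that this sketch discards the original character offsets, which is exactly the problem raised in question 2 below.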

Here is my code:

from pyparsing import *

ParserElement.setDefaultWhitespaceChars('\r\n\t ')
source = "Library\tSSHClient    with name\tnode"
EACH_LINE = Optional(Word(" ")).leaveWhitespace().suppress() + \
            CaselessKeyword("library").suppress() + \
            OneOrMore((Word(alphas)) + White(max=1).setResultsName('myValue')) +\
            SkipTo(LineEnd())

res = EACH_LINE.parseString(source)
print(res.myValue)

Questions:

1) I have already set the default whitespace characters. If I want to match exactly 2 or more spaces OR one or more tabs, I thought the code would look like White(ws=' ', min=2) | White(ws='\t', min=1), but this does not work. So can I not specify the whitespace value?

2) Is there a way to get the index of each matched result? I tried setParseAction, but it seems I cannot get the index in that callback. I need both the start and end index to highlight the word.

3) What do LineStart and LineEnd mean? When I print these values they look like ordinary strings. Do I need to write something at the start of the line, for example: LineStart() + blabla... + LineEnd()?

Update: if I parse the line without first replacing '\t' with '  ', the result is not what I expect:

from pyparsing import *

source = "Library\tsshclient\t\t\twith name    s1"

value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))  #here it seems the whitespace has already been set to ' ', why the result still match '\t'?

linedefn = OneOrMore(value)

res = linedefn.parseString(source)

print(res)

This prints:

['Library sshclient', 'with name', 's1']

but I expected:

['Library', 'sshclient', 'with name', 's1']


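Given your constraint that only single spaces may appear inside a token, this should be workable. I used the following expression to define values that may contain embedded single spaces: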

# each value consists of printable words separated by at most a 
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(Word(printables) | White(' ',max=1) + ~White()))

With that defined, a line is just one or more of these values:

linedefn = OneOrMore(value)

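Following your example, including the call to str.replace to turn each tab into a pair of spaces, the code looks like this: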

data = "Library\tSSHClient    with name\tnode"

# replace tabs with 2 spaces
data = data.replace('\t', '  ')

print(linedefn.parseString(data))

This gives:

['Library', 'SSHClient', 'with name', 'node']

To get the start and end locations of each value in the original string, you can use the pyparsing helper locatedExpr:

# use new locatedExpr to get the value, start, and end location 
# for each value
linedefn = OneOrMore(locatedExpr(value))('values')

Parsing with this:

print(linedefn.parseString(data).dump())

gives:

- values: 
  [0]:
    [0, 'Library', 7]
    - locn_end: 7
    - locn_start: 0
    - value: Library
  [1]:
    [9, 'SSHClient', 18]
    - locn_end: 18
    - locn_start: 9
    - value: SSHClient
  [2]:
    [22, 'with name', 31]
    - locn_end: 31
    - locn_start: 22
    - value: with name
  [3]:
    [33, 'node', 37]
    - locn_end: 37
    - locn_start: 33
    - value: node
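As a cross-check of the offsets in the dump above, the same spans can be computed with the standard re module. This is a sketch that assumes the tab-replaced string from the example; the regular expression is my own approximation of the value rule (non-space runs joined by single spaces), not something pyparsing uses internally.

```python
import re

data = "Library\tSSHClient    with name\tnode".replace("\t", "  ")

# A value is a run of non-space characters, optionally continued by
# single spaces that are each followed by more non-space characters.
token = re.compile(r"\S+(?: \S+)*")

spans = [(m.start(), m.group(), m.end()) for m in token.finditer(data)]
for start, value, end in spans:
    # Slicing with the offsets recovers the value, which is what a
    # highlighter would need.
    print(start, end, repr(data[start:end]))
```

The (start, value, end) triples follow the same layout as the locatedExpr results, so they can be compared entry by entry.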

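LineStart and LineEnd are pyparsing expression classes whose instances match at the start and end of a line. LineStart has always been difficult to work with, but LineEnd is fairly predictable. In your case, if you read and parse one line at a time, you should not need them; just define the line contents you expect. If you want to make sure the parser has processed the entire string (and has not stopped short at a non-matching character), add + LineEnd() or + StringEnd() to the end of your parser, or pass parseAll=True to parseString().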
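The difference that parseAll=True makes can be seen in a small sketch. The grammar and input string here are made up for illustration and are not part of the Robot Framework parser.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas

words = OneOrMore(Word(alphas))
text = "with name 42"

# Without parseAll, parsing stops quietly at the first non-matching character.
partial = words.parseString(text).asList()
print(partial)

# With parseAll=True, the unparsed remainder raises ParseException.
try:
    words.parseString(text, parseAll=True)
    parsed_everything = True
except ParseException:
    parsed_everything = False
print(parsed_everything)
```

Here the first call returns only the alphabetic words, while the second call fails because "42" is left over.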

EDIT:

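It is easy to forget that pyparsing calls str.expandtabs by default; you have to turn that off by calling parseWithTabs. That, plus explicitly disallowing TABs between the words of a value, resolves your problem and keeps the values at the correct character counts. See the changes below: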

from pyparsing import *
TAB = White('\t')

# each value consists of printable words separated by at most a 
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(~TAB + (Word(printables) | White(' ',max=1) + ~White())))

# each line has one or more of these values
linedefn = OneOrMore(value)
# do not expand tabs before parsing
linedefn.parseWithTabs()


data = "Library\tSSHClient    with name\tnode"

# replace tabs with 2 spaces
#data = data.replace('\t', '  ')

print(linedefn.parseString(data))


linedefn = OneOrMore(locatedExpr(value))('values')
# do not expand tabs before parsing
linedefn.parseWithTabs()
print(linedefn.parseString(data).dump())
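The effect of tab expansion on the string from the question's update can be seen with str.expandtabs alone, no pyparsing needed. This is just an illustration of why the first tab looked like a single space; it assumes the default tab size of 8.

```python
source = "Library\tsshclient\t\t\twith name    s1"

# Default tab stops are every 8 columns. "Library" is 7 characters long,
# so the first tab expands to just 1 space.
expanded = source.expandtabs()

print(repr(expanded[:17]))
print(len(source), len(expanded))
```

Because "Library" and "sshclient" end up separated by one space, the value expression treats them as a single value unless tab expansion is turned off.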