I tried using pyparsing to parse robotframework, which is a text DSL. The syntax is similar to the following (sorry, I find it hard to describe in BNF). One line in robotframework might look like this:
Library\tSSHClient  with name\tnode
\t is a tab, and in robotframework it is transparently converted to 2 spaces (in fact, it just calls str.replace('\t', '  ') to replace the tab, but this changes the length of each line: len('\t') is 1, but len('  ') is 2). In robot, 2 or more spaces and '\t' are used to separate tokens. If there is only 1 space between words, the words are considered a single token group.
Library\tSSHClient  with name\tnode
is actually split into the following tokens when parsed correctly:
['Library', 'SSHClient', 'with name', 'node']
Since there is only 1 space between "with" and "name", the parser treats them as a single token group.
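To pin down the rule, here is a minimal sketch of the splitting I expect, written with the standard re module rather than pyparsing. The function name split_robot_line is made up for illustration, and the sample line assumes two spaces between "SSHClient" and "with". It mimics the tab replacement described above, so it is not what I ultimately want (the line length changes), but it shows the expected tokens.

```python
import re

def split_robot_line(line):
    """Split one line into tokens the way described above (hypothetical helper)."""
    # Mimic robotframework: every tab becomes two spaces.
    expanded = line.replace("\t", "  ")
    # Two or more spaces separate tokens; a single space stays inside a token.
    return [tok for tok in re.split(r" {2,}", expanded.strip()) if tok]

print(split_robot_line("Library\tSSHClient  with name\tnode"))
# → ['Library', 'SSHClient', 'with name', 'node']
```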
Here is my code:
from pyparsing import *

ParserElement.setDefaultWhitespaceChars('\r\n\t ')
source = "Library\tSSHClient  with name\tnode"
EACH_LINE = Optional(Word(" ")).leaveWhitespace().suppress() + \
    CaselessKeyword("library").suppress() + \
    OneOrMore((Word(alphas)) + White(max=1).setResultsName('myValue')) + \
    SkipTo(LineEnd())
res = EACH_LINE.parseString(source)
print(res.myValue)
Questions:
1) I have already set the default whitespace characters. If I want to match exactly 2 or more spaces OR one or more tabs, I thought the code would look like: White(ws=' ', min=2) | White(ws='\t', min=1), but this does not work. So can I not specify which whitespace characters to match?
2) Is there a way to get the index of each matched result? I tried setParseAction, but it seems I could not get the index in that callback. I need both the start and the end index to highlight the word.
3) What do LineStart and LineEnd mean? When I print these values, it seems that they are just a normal line. Do I need to write something at the start of the line, for example: LineStart() + balabala... + LineEnd()?
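To make question 2 concrete, this is the kind of result I am after, sketched with the standard re module instead of pyparsing. The helper name token_spans and the regular expression are my own illustration, not anything from robotframework or pyparsing. The offsets refer to the original line, with the tabs left in place, so they can be used directly for highlighting.

```python
import re

# A token is a run of non-whitespace "words" joined by single spaces.
TOKEN = re.compile(r"\S+(?: \S+)*")

def token_spans(line):
    """Return (text, start, end) per token; end is exclusive (hypothetical helper)."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(line)]

line = "Library\tSSHClient  with name\tnode"
for text, start, end in token_spans(line):
    # line[start:end] == text, so the span can be highlighted as-is.
    print(text, start, end)
```

For the sample line this gives 'with name' the span 19 to 28, counting the tab as one character.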
, , , '\ t' ''
from pyparsing import *
source = "Library\tsshclient\t\t\twith name s1"
value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))
linedefn = OneOrMore(value)
res = linedefn.parseString(source)
print(res)
['Library sshclient', 'with name', 's1']
['Library', 'sshclient', ' ', 's1']