Optional string segment in pyparsing

I am working with pyparsing and trying to define two elements as follows:

    identifier = Word(alphas, alphanums).setName('identifier')
    database_name = Optional(identifier.setResultsName('user') + Suppress('.')) + identifier.setResultsName('database')
    table_name = database_name + Suppress('.') + identifier.setResultsName('table')

The idea is that table_name should match a string with two or three segments and produce the following:

    mark.foo.bar =>
        tokens.user = 'mark'
        tokens.database = 'foo'
        tokens.table = 'bar'

Or, if the first segment is missing:

    foo.bar =>
        tokens.user = ''  # anything is acceptable: None, empty string, or just plain missing
        tokens.database = 'foo'
        tokens.table = 'bar'

table_name must always have either two segments (one dot) or three segments (two dots), as shown above. A single segment is not acceptable.

database_name must have either one segment (database) or two (user.database).

Uses of database_name on its own work fine: it matches one or two segments. However, table_name fails in some cases:

    # Works for three segments
    mark.foo.bar =>
        tokens.user = 'mark'
        tokens.database = 'foo'
        tokens.table = 'bar'

    # Fails for two
    foo.bar =>
        Expected "." (at char 7), (line:1, col:8)

I see what it is doing: foo.bar has been matched as user.database, and the parser now expects a third segment for the table name. However, this is not what I want.

1 answer

The problem is that when you match the leading identifier, you don't know enough to say whether it is the "user" field or not until you have looked ahead at all of the fields of the table name. Unfortunately, this means that you cannot define database_name with its leading optional "user" field on its own; you must define a comprehensive table_name expression that has two or three fields.
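To make the failure concrete, here is a minimal sketch that reproduces the grammar from the question. The try_parse helper is hypothetical and exists only for this illustration. Once Optional has matched "foo." as the user prefix, pyparsing does not go back and retry without it, so the two-segment input fails.

```python
from pyparsing import Word, Optional, Suppress, ParseException, alphas, alphanums

identifier = Word(alphas, alphanums)
DOT = Suppress('.')

# The grammar from the question: an optional leading "user." prefix
database_name = Optional(identifier('user') + DOT) + identifier('database')
table_name = database_name + DOT + identifier('table')

def try_parse(text):
    """Hypothetical helper: return the named results as a dict, or None on failure."""
    try:
        return table_name.parseString(text).asDict()
    except ParseException:
        return None

three = try_parse("mark.foo.bar")  # Optional consumes "mark.", the rest matches
two = try_parse("foo.bar")         # Optional consumes "foo.", then no second dot is found
print(three)
print(two)
```

The three-segment input parses, while the two-segment input returns None because the Optional prefix has already claimed "foo" as the user.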

The following code shows 3 options for resolving the ambiguity of a leading optional identifier:

  • first try matching the full three-field form, and if that fails, try matching the two-field form

  • explicitly look ahead when matching the optional leading "user" field, using FollowedBy to match the "user" only if it is followed by 2*(DOT+identifier)

  • match any dot-delimited list of identifiers, of any length, and use a parse action to verify that only 2 or 3 identifiers were found and to assign the results names

See the comments in the code for how each option is implemented. (Note that to simplify the code, I have also replaced the full expr.setResultsName('something') form with just expr('something'), which I think is generally easier to read.)

    from pyparsing import *

    identifier = Word(alphas, alphanums).setName('identifier')
    DOT = Suppress('.')

    # Option 1 - fully specified options
    full_database_name = identifier('user') + DOT + identifier('database')
    just_database_name = identifier('database')
    table_name = (full_database_name + DOT + identifier('table')
                  | just_database_name + DOT + identifier('table'))

    # Option 2 - use FollowedBy to explicitly lookahead when checking for leading user
    table_name = (Optional(identifier('user') + FollowedBy(2*(DOT+identifier)) + DOT)
                  + identifier('database') + DOT + identifier('table'))

    # Option 3 - use liberally matching expression, with a parse action to assign fields
    def assignTableFields(fields):
        if len(fields) == 2:
            fields['database'], fields['table'] = fields
        elif len(fields) == 3:
            fields['user'], fields['database'], fields['table'] = fields
        else:
            raise ParseException("wrong number of fields")
    table_name = delimitedList(identifier, delim='.').setParseAction(assignTableFields)

    for test in ("a.b.c", "b.c"):
        print(test)
        print(table_name.parseString(test).dump())
        print()
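As a quick check, the sketch below runs Options 1 and 2 side by side on the inputs from the question. The names option1, option2, and named_fields are hypothetical, chosen here so that both expressions can coexist in one script; they are not part of pyparsing.

```python
from pyparsing import (Word, Optional, Suppress, FollowedBy,
                       ParseException, alphas, alphanums)

identifier = Word(alphas, alphanums)
DOT = Suppress('.')

# Option 1: try the three-field form first, then fall back to the two-field form
option1 = (identifier('user') + DOT + identifier('database') + DOT + identifier('table')
           | identifier('database') + DOT + identifier('table'))

# Option 2: accept the "user" prefix only when two more ".identifier" groups follow
option2 = (Optional(identifier('user') + FollowedBy(2 * (DOT + identifier)) + DOT)
           + identifier('database') + DOT + identifier('table'))

def named_fields(expr, text):
    """Hypothetical helper: named results as a dict, or None if parsing fails."""
    try:
        return expr.parseString(text).asDict()
    except ParseException:
        return None

for expr in (option1, option2):
    print(named_fields(expr, "mark.foo.bar"))
    print(named_fields(expr, "foo.bar"))
    print(named_fields(expr, "foo"))
```

Both expressions should give the same named results for two and three segments, and both should reject a single segment.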

You may also find this match too liberal, since it also allows whitespace between the fields, so that "a . b" will also qualify as a valid table name. You can define another parse action and add it to table_name:

    def noWhitespace(source, locn, tokens):
        if not source[locn:].startswith('.'.join(tokens)):
            raise ParseException("found whitespace between fields")

    table_name.addParseAction(noWhitespace)

Note that for this parse action I called addParseAction instead of setParseAction, so any existing parse actions are preserved (as in Option 3), and this new one is appended to the chain of parse actions to run.
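A short sketch of the whitespace check in use, assuming the Option 1 style of table_name. The accepts helper is hypothetical, and the ParseException here is given the source string and location explicitly, which is a small variation on the code above.

```python
from pyparsing import Word, Suppress, ParseException, alphas, alphanums

identifier = Word(alphas, alphanums)
DOT = Suppress('.')

table_name = (identifier('user') + DOT + identifier('database') + DOT + identifier('table')
              | identifier('database') + DOT + identifier('table'))

def noWhitespace(source, locn, tokens):
    # Rebuild the dotted name from the tokens and compare it with the raw input
    if not source[locn:].startswith('.'.join(tokens)):
        raise ParseException(source, locn, "found whitespace between fields")

# addParseAction appends to any parse actions already attached to table_name
table_name.addParseAction(noWhitespace)

def accepts(text):
    """Hypothetical helper: True if table_name parses text, False otherwise."""
    try:
        table_name.parseString(text)
        return True
    except ParseException:
        return False

print(accepts("foo.bar"))
print(accepts("mark.foo.bar"))
print(accepts("foo . bar"))
```

Names written without spaces are accepted, while "foo . bar" is rejected by the added parse action.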
