I am trying to parse a string using pyparsing. Using the code below
import pyparsing as pyp
aString = "C((H2)(C(H3))) C((H1)(Cl1)) C(((C(H3))3))"
aSub = '(('+ pyp.Word('()'+pyp.srange('[A-Za-z0-9]'))+'))'
substituent = aSub('sub')
for t,s,e in substituent.scanString(aString):
    print(t.sub)
I get no output. However, in the line aString = "C((H2)(C(H3))) C((H1)(Cl1)) C(((C(H3))3))" there are several occurrences of ((stuff)), in particular ((H2)(C(H3))), C((H1)(Cl1)) and C(((C(H3))3)).
My understanding of Word() was that its argument (in the case of a single argument, as in mine) is the set of characters allowed in the word, so that any combination of those characters would successfully return a match.
Running the code
import pyparsing as pyp
aString = "C((H2)(C(H3))) C((H1)(Cl1)) C(((C(H3))3))"
aSub = '(' + pyp.Word(pyp.srange('[A-Za-z0-9]'))+')'
substituent = aSub('sub')
for t,s,e in substituent.scanString(aString):
    print(t.sub)
gives the result
['(', 'H2', ')']
['(', 'H3', ')']
['(', 'H1', ')']
['(', 'Cl1', ')']
['(', 'H3', ')']
All I changed was the extra outer set of parentheses, as well as the parentheses inside the string of characters passed to Word(). I'm not sure why the first program gives me nothing while the second gives me (part of) what I want.
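One possible explanation, sketched in plain Python rather than with pyparsing itself. It assumes that Word() greedily consumes the longest run of allowed characters and never gives any back; under that assumption, once '(' and ')' are among the allowed characters, the run swallows the closing '))' and the literal that follows has nothing left to match. The helpers greedy_word, match_sub and scan are hypothetical names made up for this illustration, not pyparsing functions, and pyparsing's whitespace skipping is left out.

```python
import string


def greedy_word(text, pos, allowed):
    """Return the index just past the longest run of allowed characters at pos."""
    end = pos
    while end < len(text) and text[end] in allowed:
        end += 1
    return end


def match_sub(text, pos, opener, closer, allowed):
    """Try opener + greedy word + closer at pos, with no backtracking."""
    if not text.startswith(opener, pos):
        return None
    start = pos + len(opener)
    end = greedy_word(text, start, allowed)
    if end == start:  # a word needs at least one character
        return None
    if not text.startswith(closer, end):
        return None  # the word already ate everything the closer needed
    return text[pos:end + len(closer)]


def scan(text, opener, closer, allowed):
    """Collect non-overlapping matches from left to right."""
    results, pos = [], 0
    while pos < len(text):
        hit = match_sub(text, pos, opener, closer, allowed)
        if hit is None:
            pos += 1
        else:
            results.append(hit)
            pos += len(hit)
    return results


a_string = "C((H2)(C(H3))) C((H1)(Cl1)) C(((C(H3))3))"
alnum = string.ascii_letters + string.digits

# First attempt: parentheses are allowed inside the word.
first = scan(a_string, "((", "))", "()" + alnum)
# Second attempt: only letters and digits are allowed inside the word.
second = scan(a_string, "(", ")", alnum)

print(first)   # []
print(second)  # ['(H2)', '(H3)', '(H1)', '(Cl1)', '(H3)']
```

If the assumption holds, this reproduces both observations: the first pattern never matches, and the second finds the same five innermost groups as the output shown above.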