Python: splitting a complex string that contains parentheses and |

In a test file, I have entries of the form

DATA(VALUE1|VALUE2||VALUE4) 

etc.

I would like to split this line in two passes: the first gives me "DATA", and the second gives me what is inside the parentheses, split on "|". The second part seems trivial, but my attempts at the first have been ugly so far.

I would rather use a regex than a full parser, because the strings are pretty simple in the end.

3 answers

You can do this in one go with re.split:

    In [10]: import re

    In [11]: line = 'DATA(VALUE1|VALUE2||VALUE4)'

    In [12]: re.split(r'[(|)]', line)
    Out[12]: ['DATA', 'VALUE1', 'VALUE2', '', 'VALUE4', '']

And extract the data and values as follows:

    In [13]: parts = re.split(r'[(|)]', line)

    In [14]: data = parts[0]

    In [15]: values = parts[1:-1]

    In [16]: values
    Out[16]: ['VALUE1', 'VALUE2', '', 'VALUE4']
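As a minimal sketch, the same re.split idea can be wrapped in a helper. The function name `split_entry` and the error handling are assumptions added for illustration, not part of the answer above. The last element is checked because splitting on the closing parenthesis always leaves an empty string at the end.

```python
import re

def split_entry(line):
    """Split 'NAME(A|B||D)' into ('NAME', ['A', 'B', '', 'D'])."""
    # Splitting on '(', '|' and ')' leaves an empty string after the final ')'.
    # Unpacking raises ValueError if the line contains none of the delimiters.
    name, *values, trailing = re.split(r"[(|)]", line.strip())
    if trailing != "":
        raise ValueError(f"unexpected text after ')': {trailing!r}")
    return name, values

name, values = split_entry("DATA(VALUE1|VALUE2||VALUE4)")
print(name)    # DATA
print(values)  # ['VALUE1', 'VALUE2', '', 'VALUE4']
```

Note that empty fields (the `||` in the input) are kept as empty strings, which preserves the position of each value.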

Another suggestion:

    >>> s = "DATA(VALUE1|VALUE2||VALUE4)"
    >>> import re
    >>> matches = re.findall("[^()]+", s)
    >>> matches
    ['DATA', 'VALUE1|VALUE2||VALUE4']
    >>> result = {matches[0]: matches[1].split("|")}
    >>> result
    {'DATA': ['VALUE1', 'VALUE2', '', 'VALUE4']}
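Since the question mentions a file with many such entries, here is a rough sketch of applying the findall approach line by line. The helper name `parse_entries`, the sample entry `OTHER(A|B)`, and the handling of blank lines and empty parentheses are assumptions for illustration only.

```python
import re

def parse_entries(lines):
    """Map each entry name to its list of values."""
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = re.findall(r"[^()]+", line)
        name = parts[0]
        # An entry such as 'DATA()' yields no second part, so fall back to "".
        body = parts[1] if len(parts) > 1 else ""
        result[name] = body.split("|")
    return result

# Any iterable of strings works here, including an open file object.
sample = ["DATA(VALUE1|VALUE2||VALUE4)", "", "OTHER(A|B)"]
entries = parse_entries(sample)
print(entries["DATA"])   # ['VALUE1', 'VALUE2', '', 'VALUE4']
print(entries["OTHER"])  # ['A', 'B']
```

With a dictionary, a name that appears on several lines keeps only its last set of values; collect into a list of pairs instead if duplicates matter.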

Alternatively, with re.search and two capture groups:

    import re
    s = 'DATA(VALUE1|VALUE2|VALUE4)'

then

 re.search(r"(.*)\((.*)\)", s).group(2).split("|") 

gives you

 ['VALUE1', 'VALUE2', 'VALUE4'] 

and

 re.search(r"(.*)\((.*)\)", s).group(1) 

gives you

 'DATA' 
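One caveat with this approach: re.search returns None when the line does not match, so calling .group() on the result raises AttributeError. A possible sketch that matches once and checks the result is shown here. The names `ENTRY` and `parse_entry`, the stricter `[^()]*` pattern, and the choice to return None on failure are assumptions, not part of the answer above.

```python
import re

# Name before the parentheses, then the body; neither may contain parentheses.
ENTRY = re.compile(r"([^()]*)\(([^()]*)\)")

def parse_entry(line):
    """Return (name, values) for 'NAME(A|B|C)', or None if the line does not match."""
    match = ENTRY.fullmatch(line.strip())
    if match is None:
        return None
    name, body = match.groups()
    return name, body.split("|")

print(parse_entry("DATA(VALUE1|VALUE2|VALUE4)"))  # ('DATA', ['VALUE1', 'VALUE2', 'VALUE4'])
print(parse_entry("no parentheses here"))         # None
```

Using fullmatch instead of search means stray text before or after the entry is rejected rather than silently absorbed by `.*`.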