Python: regex findall

I am using Python regex to extract specific values from a given string. This is my string:

mystring.txt

sometext somemore text here some other text
course: course1
Id Name marks
____________________________________________________
1 student1 65
2 student2 75
3 MyName 69
4 student4 43
course: course2
Id Name marks
____________________________________________________
1 student1 84
2 student2 73
8 student7 99
4 student4 32
course: course4
Id Name marks
____________________________________________________
1 student1 97
3 MyName 60
8 student6 82

I need to extract the course name and the corresponding marks for a particular student. For example, I need the course and marks for MyName from the string above.

I tried:

 re.findall(".*?course: (\w+).*?MyName\s+(\d+).*?",buff,re.DOTALL) 

This works only if MyName is present in every course. It fails when MyName is absent from some courses, as in my example.

Here I get the output as: [('course1', '69'), ('course2', '60')]

but actually I want to achieve: [('course1', '69'), ('course4', '60')]

What is the correct regular expression for this?

 #!/usr/bin/python
 import re
 buffer_fp = open("mystring.txt", "r+")
 buff = buffer_fp.read()
 buffer_fp.close()
 print re.findall(".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL)
+5
2 answers
 .*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?
                 ^^^^^^^^^^^^^^^^^^^^

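You can try this. See the demo. It uses a lookahead-based quantifier: the marked group consumes one character at a time only while the word course does not start at that position, so MyName is always paired with the course immediately before it and the match never runs over into the next course.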

https://regex101.com/r/pG1kU1/26
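As a rough check of the pattern outside the demo, here is a minimal Python sketch. It assumes the sample data from the question pasted into a string (the name SAMPLE is made up for illustration), and it drops the leading and trailing .*? from the pattern, since findall scans for matches anyway; that simplification is mine, not part of the answer above.

```python
import re

# Sample data from the question, one record per line (assumed layout).
SAMPLE = """\
sometext somemore text here some other text
course: course1
Id Name marks
1 student1 65
2 student2 75
3 MyName 69
4 student4 43
course: course2
Id Name marks
1 student1 84
2 student2 73
8 student7 99
4 student4 32
course: course4
Id Name marks
1 student1 97
3 MyName 60
8 student6 82
"""

# (?:(?!\bcourse\b).)* consumes characters only while the whole word
# "course" does not start at the current position, so a match cannot
# cross from one course block into the next.
PATTERN = re.compile(
    r"course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+)",
    re.DOTALL,
)

result = PATTERN.findall(SAMPLE)
print(result)  # [('course1', '69'), ('course4', '60')]
```

Note that \bcourse\b does not match inside course1 or course2, because the digit that follows is a word character; only the course: header stops the scan.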

+5

I suspect that this cannot be done in one regex. They are not omnipotent.

Even if you find a way, do not do it. Your non-working regex is already close to unreadable, and a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudo-code solution:

 for line in buff:
     if it is a course line:
         set the course variable
     if it is a MyName line:
         add (course, marks) to the list of matches

Note that this can (and probably should) use regular expressions inside each of the if blocks. This is not a case of hammer versus screwdriver, one to the exclusion of the other, but rather of using each tool for what it does best.
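The pseudo-code could be sketched in Python roughly as follows. This is only an illustration under assumptions: the input has one record per line, and the names marks_for_student, COURSE_RE and SAMPLE are invented here, not taken from the question.

```python
import re

# Hypothetical sample with one record per line (assumed layout).
SAMPLE = """\
sometext somemore text here some other text
course: course1
Id Name marks
1 student1 65
3 MyName 69
course: course2
Id Name marks
1 student1 84
8 student7 99
course: course4
Id Name marks
1 student1 97
3 MyName 60
"""

# Matches a course header line and captures the course name.
COURSE_RE = re.compile(r"course:\s*(\w+)")


def marks_for_student(text, student="MyName"):
    """Return (course, marks) pairs for one student, in file order."""
    # Matches a table row for the student and captures the marks.
    row_re = re.compile(r"\b" + re.escape(student) + r"\s+(\d+)")
    course = None
    matches = []
    for line in text.splitlines():
        header = COURSE_RE.search(line)
        if header:
            # Remember the course that the following rows belong to.
            course = header.group(1)
            continue
        row = row_re.search(line)
        if row and course is not None:
            matches.append((course, row.group(1)))
    return matches


print(marks_for_student(SAMPLE))  # [('course1', '69'), ('course4', '60')]
```

Iterating over text.splitlines() matters here: looping directly over a string, as in for line in buff, would yield single characters rather than lines.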

+2