Matching a string arbitrarily split into multiple lines

Is there a way for a regular expression to match a string that has been arbitrarily split across several lines? Say we have the following format in a file:

    msgid "This is "
    "an example string"
    msgstr "..."

    msgid "This is an example string"
    msgstr "..."

    msgid ""
    "This is an "
    "example"
    " string"
    msgstr "..."

    msgid "This is "
    "an unmatching string"
    msgstr "..."

So, we would like a pattern that matches all of these variants, that is, one that matches the string regardless of how it is split across lines. Note that we are looking for one specific string, as shown in the example, not just any string. In this case, we would like to match the string "This is an example string".

Of course, we could simply concatenate the strings and then apply the match, but I wondered whether this is possible with a regular expression alone. I am using Python, but a general answer is fine.

2 answers

Do you want to match a series of words? If so, you can search for the words with whitespace (\s) between them, since \s matches newlines as well as spaces.

    import re

    search_for = "This is an example string"
    # Allow any run of whitespace (spaces or newlines) between the words.
    search_for_re = r"\b" + r"\s+".join(search_for.split()) + r"\b"
    pattern = re.compile(search_for_re)

    def match(s):
        return pattern.match(s) is not None

    s = "This is an example string"
    print(match(s), ":", repr(s))
    s = "This is an \n example string"
    print(match(s), ":", repr(s))
    s = "This is \n an unmatching string"
    print(match(s), ":", repr(s))

This prints:

    True : 'This is an example string'
    True : 'This is an \n example string'
    False : 'This is \n an unmatching string'
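If the search string may contain regex metacharacters such as "." or "(", each word probably needs escaping before it is joined. The following is only a sketch of that idea: `words_pattern` is a hypothetical helper name, and it drops the `\b` anchors because they do not behave well next to punctuation.

```python
import re

def words_pattern(text):
    # Hypothetical helper: escape each word so characters such as "." or "?"
    # are matched literally, then allow any whitespace run between words.
    words = [re.escape(word) for word in text.split()]
    return re.compile(r"\s+".join(words))

pattern = words_pattern("Is this (really) an example?")
# search() finds the phrase anywhere in the text, not only at the start.
found = pattern.search("prefix Is this\n(really)   an\texample? suffix")
print(found is not None)
```

Using `search` instead of `match` is a design choice here, so the phrase can be found inside a larger block of text.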

This is a bit more complicated because of the quotes required on each line and because empty "" lines are valid. Here is a regular expression that correctly matches the file you posted:

 '(""\n)*"This(( "\n(""\n)*")|("\n(""\n)*" )| )is(( "\n(""\n)*")|("\n(""\n)*" )| )an(( "\n(""\n)*")|("\n(""\n)*" )| )example(( "\n(""\n)*")|("\n(""\n)*" )| )string"' 

It looks confusing, but it is just the string you want to match, except that it starts with:

 (""\n)*" 

and each space between words is replaced with:

 (( "\n(""\n)*")|("\n(""\n)*" )| ) 

which checks three possibilities after each word: a space, a closing quote, a newline, any number of empty "" lines, and an opening quote; the same sequence with the space at the end instead of the beginning; or just a plain space.

An easier way to get this working is to write a small function that takes the string you are trying to match and returns a regular expression that matches it:
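To make the three alternatives concrete, here is a small sketch that tests the separator on its own between the words "This" and "is". The name `SEP` and the test strings are made up for illustration; the pattern itself is the one from the answer, written as a raw string.

```python
import re

# The separator from the answer, written as a raw string so that \n is
# interpreted by the regex engine (it still matches a newline character).
SEP = r'(( "\n(""\n)*")|("\n(""\n)*" )| )'

cases = [
    'This is',            # plain space
    'This "\n"is',        # space before the line break
    'This"\n" is',        # space after the line break
    'This "\n""\n"is',    # an empty "" line in between
    'This"\n"is',         # no space at all: should not match
]
results = [bool(re.fullmatch("This" + SEP + "is", text)) for text in cases]
for text, ok in zip(cases, results):
    print(repr(text), ok)
```

The first four cases should match and the last one should not, since every alternative requires exactly one space somewhere around the break.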

    def getregex(string):
        return '(""\n)*"' + string.replace(" ", '(( "\n(""\n)*")|("\n(""\n)*" )| )') + '"'

So, if the contents of the file you posted are in a string called "filestring", you get the matches like this:

    import re

    def getregex(string):
        # Build the pattern: optional empty "" lines, then the words joined
        # by the separator that allows a quoted line break around each space.
        return '(""\n)*"' + string.replace(" ", '(( "\n(""\n)*")|("\n(""\n)*" )| )') + '"'

    matcher = re.compile(getregex("This is an example string"))

    for i in matcher.finditer(filestring):
        print(i.group(0), "\n")

    >>>
    "This is "
    "an example string"

    "This is an example string"

    ""
    "This is an "
    "example"
    " string"

This regular expression does not account for the space that you have after "example" in the third entry, but I assume that the file is machine-generated and that this space is a mistake.
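For comparison, the concatenation approach mentioned in the question can be sketched as follows. This is only an illustration under some assumptions: the line layout of `filestring` is reconstructed from the question, `msgids` is a hypothetical helper name, and escaped quotes inside a chunk are not handled.

```python
import re

# Sample data from the question; the exact line breaks are an assumption.
filestring = '''msgid "This is "
"an example string"
msgstr "..."

msgid "This is an example string"
msgstr "..."

msgid ""
"This is an "
"example"
" string"
msgstr "..."

msgid "This is "
"an unmatching string"
msgstr "..."
'''

def msgids(text):
    # Yield each msgid with its quoted chunks joined into one string.
    # Simplification: escaped quotes (\") inside a chunk are not handled.
    for entry in re.finditer(r'^msgid((?:\s*"[^"\n]*")+)', text, re.MULTILINE):
        yield "".join(re.findall(r'"([^"\n]*)"', entry.group(1)))

matches = [m for m in msgids(filestring) if m == "This is an example string"]
print(len(matches))
```

With the sample above, three of the four entries should compare equal to the target string. The trade-off is that this joins first and compares afterwards, so it does not give the position of the match in the original file the way `finditer` does.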


Source: https://habr.com/ru/post/1410816/

