This is a little tricky because every line needs its own quotes and empty lines are allowed. Here is a regular expression that correctly matches the file you posted:
'(""\n)*"This(( "\n(""\n)*")|("\n(""\n)*" )| )is(( "\n(""\n)*")|("\n(""\n)*" )| )an(( "\n(""\n)*")|("\n(""\n)*" )| )example(( "\n(""\n)*")|("\n(""\n)*" )| )string"'
It looks confusing, but it is just the line you want to match, except that it starts with:
(""\n)*"
and every space between words is replaced with:
(( "\n(""\n)*")|("\n(""\n)*" )| )
which checks three possibilities after each word: a space, a quote, a newline, any number of empty lines (""), and a quote; the same sequence but with the space at the end instead of the start; or just a plain space.
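To make the three alternatives concrete, here is a minimal sketch that tests the separator on its own between two words. The names SEP and matches are only for illustration and are not part of the answer's code:

```python
import re

# The separator that replaces each space in the target sentence.
SEP = '(( "\n(""\n)*")|("\n(""\n)*" )| )'

# Illustrative pattern: two words joined by the separator.
pattern = re.compile('an' + SEP + 'example')

def matches(text):
    # True only if the whole text fits the pattern.
    return pattern.fullmatch(text) is not None

print(matches('an example'))                # plain space
print(matches('an "\n"example'))            # space before the line break
print(matches('an"\n" example'))            # space after the line break
print(matches('an "\n""\n""\n"example'))    # empty "" lines in between
print(matches('an"\n"example'))             # no space at all
```

The first four calls print True and the last one prints False, because every alternative requires exactly one space somewhere in the separator.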
An easier way to set this up is to write a small function that takes the line you are trying to match and returns a regular expression that matches it:
def getregex(string):
    return '(""\n)*"' + string.replace(" ", '(( "\n(""\n)*")|("\n(""\n)*" )| )') + '"'
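One caveat: the helper pastes the sentence into the pattern unchanged, so characters such as . or ? in the sentence would be read as regex operators. A minimal sketch of a variant that escapes each word first; getregex_escaped is a hypothetical name, not something from the original code:

```python
import re

# Separator used in place of each space (same as in the answer).
SEP = '(( "\n(""\n)*")|("\n(""\n)*" )| )'

def getregex_escaped(string):
    # Escape each word so regex metacharacters match literally,
    # then join the words with the separator pattern.
    words = [re.escape(word) for word in string.split(" ")]
    return '(""\n)*"' + SEP.join(words) + '"'

pattern = re.compile(getregex_escaped("Is this 1.5 MB?"))
print(bool(pattern.search('"Is this "\n"1.5 MB?"')))   # True
print(bool(pattern.search('"Is this 1x5 MB?"')))       # False
```

For sentences made only of letters and spaces, like the example here, this should behave the same as the plain version.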
So, if you have read the file into a string called filestring, you can get the matches like this:
import re

def getregex(string):
    return '(""\n)*"' + string.replace(" ", '(( "\n(""\n)*")|("\n(""\n)*" )| )') + '"'

matcher = re.compile(getregex("This is an example string"))

for i in matcher.finditer(filestring):
    print(i.group(0), "\n")

Output:

"This is "
"an example string"

"This is an example string"

""
"This is an "
"example"
" string"
This regular expression does not handle the extra space you have after "example" in the third block, but I assume the file is machine-generated and that the space is a mistake.
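To check this without a real file, here is a minimal sketch. The contents of filestring and broken are made up for illustration; broken reproduces the stray space after "example":

```python
import re

def getregex(string):
    sep = '(( "\n(""\n)*")|("\n(""\n)*" )| )'
    return '(""\n)*"' + string.replace(" ", sep) + '"'

matcher = re.compile(getregex("This is an example string"))

# Made-up file contents: three ways of splitting the same sentence.
filestring = (
    '"This is "\n"an example string"\n'
    '"This is an example string"\n'
    '""\n"This is an "\n"example"\n" string"\n'
)
matches = [m.group(0) for m in matcher.finditer(filestring)]
print(len(matches))  # 3

# A space on both sides of the line break is not covered by the pattern.
broken = '"This is an "\n"example "\n" string"'
print(matcher.search(broken))  # None
```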