With a regular expression:
```python
import re

# Sample data: each entry starts with a '>Entry<number>' header line
ss = '''
>Entry1.1
#size=1688
704 1 1 1 4
979 2 2 2 0
1220 1 1 1 4
1309 1 1 1 4
1316 1 1 1 4
1372 1 1 1 4
1374 1 1 1 4
1576 1 1 1 4
>Entry2.1
#size=6251
6110 3 1.5 0 2
6129 2 2 2 2
6136 1 1 1 4
6142 3 3 3 2
6143 4 4 4 1
6150 1 1 1 4
6152 1 1 1 4
>Entry3.2
#size=1777
AND SO ON-----------
'''

# %s is replaced by the entry number the user types
patbase = r'(>Entry *%s(?![^\n]+?\d).+?)(?=>|(?:\s*\Z))'

while True:  # keeps asking; stop with Ctrl+C
    x = input('Which entry do you want? : ')
    # re.escape makes the dot in a number like 2.1 match only a literal dot
    found = re.findall(patbase % re.escape(x), ss, re.DOTALL)
    if found:
        print('found ==', found)
        for each_entry in found:
            print('\n%s\n' % each_entry)
    else:
        print('\n ** There is no such entry **\n')
```
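If you want to reuse the search outside the interactive loop (for example in a script or a test), the same pattern can be wrapped in a small function. This is only a sketch for Python 3: the names find_entry and SAMPLE are made up for illustration, SAMPLE is a shortened copy of the data above, and escaping the number with re.escape is an extra step that keeps the dot in 2.1 literal.

```python
import re

# Same pattern as in the answer; %s is the entry number.
PATBASE = r'(>Entry *%s(?![^\n]+?\d).+?)(?=>|(?:\s*\Z))'

# Hypothetical, shortened sample data.
SAMPLE = '''
>Entry1.1
#size=1688
704 1 1 1 4
>Entry2.1
#size=6251
6110 3 1.5 0 2
>Entry3.2
#size=1777
AND SO ON-----------
'''


def find_entry(text, number):
    """Hypothetical helper: return every entry block whose number is exactly `number`."""
    # re.escape keeps the '.' in numbers like "2.1" literal
    pattern = PATBASE % re.escape(number)
    return re.findall(pattern, text, re.DOTALL)


if __name__ == '__main__':
    for wanted in ('1.1', '2.1', '2', '3.2'):
        print(repr(wanted), '->', find_entry(SAMPLE, wanted))
```

Typing 2 returns an empty list here, because the only entry whose number starts with 2 is Entry2.1.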
Explanation of the pattern '(>Entry *%s(?![^\n]+?\d).+?)(?=>|(?:\s*\Z))':
1)
%s is replaced by the entry number the user types: 1.1, 2, 2.1, etc.
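The substitution is plain string formatting, so whatever the user types is pasted into the pattern as is. The short check below (Python 3, using a made-up header >Entry1x1) shows one consequence: the . in 1.1 is a regex wildcard, so the unescaped pattern also accepts 1x1. Running the number through re.escape first is an optional extra, not part of the pattern above, that makes the dot literal.

```python
import re

patbase = r'(>Entry *%s(?![^\n]+?\d).+?)(?=>|(?:\s*\Z))'

# The typed number pasted in as-is ...
raw = patbase % '1.1'
# ... or escaped first, so that '.' matches only a literal dot.
escaped = patbase % re.escape('1.1')

print(raw)
print(escaped)

# Made-up header with another character where the dot should be.
text = '>Entry1x1\ndata 7\n'
print(re.findall(raw, text, re.DOTALL))      # matches: the wildcard '.' accepts 'x'
print(re.findall(escaped, text, re.DOTALL))  # no match
```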
2)
The (?![^\n]+?\d) part makes sure that the typed number is the complete entry number.
(?![^\n]+?\d) is a negative lookahead assertion: it says that right after %s there must not be [^\n]+?\d, that is, one or more characters [^\n]+? followed by a digit \d.
I write [^\n] to mean "any character except the newline \n".
I have to write it instead of just .+? because I use the re.DOTALL flag, under which . also matches \n. A .+?\d inside the lookahead would then run on into the following lines of the entry (#size=..., the data rows), find a digit there, and reject every entry.
What I want to check is only the rest of the header line after the number the user entered (represented by %s in the pattern): it must not contain any further digits.
All of this is needed because if the text contains Entry2.1 but no Entry2, and the user enters only 2 because they want Entry2 and nothing else, a regular expression without this check would find Entry2.1 and return it, even though the user actually asked for Entry2.
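The following comparison, on a shortened copy of the sample data, shows what the check does. The helper search and the two guard strings are hypothetical names used only for this illustration; DOT_GUARD is the .+? variant discussed above and is included only to show why it fails under re.DOTALL.

```python
import re

# Shortened copy of the sample data.
text = '''
>Entry1.1
#size=1688
704 1 1 1 4
>Entry2.1
#size=6251
6110 3 1.5 0 2
'''

GUARD = r'(?![^\n]+?\d)'  # the check from the answer
DOT_GUARD = r'(?!.+?\d)'  # same idea written with .+?, for comparison only


def search(number, guard):
    """Hypothetical helper: the answer's pattern with a chosen guard."""
    # re.escape keeps the dot in numbers like 2.1 literal
    pattern = r'(>Entry *' + re.escape(number) + guard + r'.+?)(?=>|(?:\s*\Z))'
    return re.findall(pattern, text, re.DOTALL)


print(search('2', GUARD))        # empty: "2" does not pick up Entry2.1
print(search('2', ''))           # no check: Entry2.1 is returned for "2"
print(search('2.1', GUARD))      # Entry2.1 is found
print(search('2.1', DOT_GUARD))  # empty: .+? reached the digits on later lines
```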
3)
At the end of '(>Entry *%s(?![^\n]+?\d).+?)', the part .+? catches the whole entry block, because with the re.DOTALL flag the dot matches any character, including the newline \n.
That is why I pass re.DOTALL: without it the dot stops at \n, and this final .+? could not continue past the header line to the end of the entry.
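Running the same pattern with and without the flag shows the difference (a sketch on shortened sample data). Without re.DOTALL nothing is found at all: the number in the header line is followed directly by \n, so .+? cannot even take its first character.

```python
import re

# Shortened copy of the sample data.
text = '''
>Entry1.1
#size=1688
704 1 1 1 4
>Entry2.1
#size=6251
6110 3 1.5 0 2
'''

pattern = r'(>Entry *1\.1(?![^\n]+?\d).+?)(?=>|(?:\s*\Z))'

# Without re.DOTALL, '.' does not match '\n': no match at all.
print(re.findall(pattern, text))

# With re.DOTALL, .+? runs over the newlines and stops just before the next '>'.
print(re.findall(pattern, text, re.DOTALL))
```

With the flag, the Entry1.1 block keeps its trailing \n, because the match ends right before the > of the next entry.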
4)
I want the match to stop at the end of the requested entry, not somewhere inside the next one, so that the group in parentheses (>Entry *%s(?![^\n]+?\d).+?) captures exactly what we want. Because .+? is lazy (non-greedy), it would stop after a single character if nothing told it where to end. That is why I put the positive lookahead assertion (?=>|(?:\s*\Z)) at the end: the lazy .+? stops at the first position that is followed either by > (the start of the next entry) or by the end of the string \Z.
Since the last entry may be followed by some whitespace instead of ending exactly at the end of the whole string, I put \s* before \Z, meaning "possibly some whitespace up to the very end".
Thus \s*\Z means that there may be whitespace before the end of the string. Whitespace here means the space character, \f, \n, \r, \t and \v (for str patterns in Python 3, \s also matches other Unicode whitespace).
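To see what each piece of the closing lookahead contributes, the sketch below runs the pattern with different endings on a made-up sample whose last entry is followed by blank lines and spaces. The helper grab and the sample text are hypothetical, chosen only for this illustration.

```python
import re

# Made-up sample: the last entry is followed by blank lines and spaces.
text = '>Entry2.1\n#size=6251\n6110 3 1.5 0 2\n>Entry3.2\n#size=1777\nAND SO ON---\n\n  \n'

# Start of the answer's pattern; %s is the entry number.
head = r'(>Entry *%s(?![^\n]+?\d).+?)'


def grab(number, end):
    """Hypothetical helper: search with `end` appended after the lazy .+?."""
    pattern = (head % re.escape(number)) + end
    return re.findall(pattern, text, re.DOTALL)


# No stop condition: the lazy .+? takes a single character and stops.
print(grab('3.2', ''))
# Stopping only at \Z: the trailing blank lines and spaces end up in the match.
print(grab('3.2', r'(?=>|\Z)'))
# The answer's lookahead: \s* lets the match end before the trailing whitespace.
print(grab('3.2', r'(?=>|(?:\s*\Z))'))
# An entry in the middle: the match stops right before the next '>'.
print(grab('2.1', r'(?=>|(?:\s*\Z))'))
```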