Separate tags in Python

I have a file that contains this:

<html> <head> <title> Hello! - {{ today }}</title> </head> <body> {{ runner_up }} avasd {{ blabla }} sdvas {{ oooo }} </body> </html> 

What is the best or most Pythonic way to extract {{today}}, {{runner_up}}, etc.?

I know this can be done with splits / regular expressions, but I wondered if there is another way.

PS: assume the data is loaded into a variable called thedata.

Edit: I think the HTML example was a bad choice because it sent some commenters toward BeautifulSoup. So here is a new input:

 Fix grammatical or {{spelling}} errors. Clarify meaning without changing it. Correct minor {{mistakes}}. Add related resources or links. Always respect the original {{author}}. 

Output:

 spelling mistakes author 
+4
5 answers

Mmkay, well here is a generator solution that seems to work well for me. You can also supply different opening and closing delimiters if you wish.

    def get_tags(s, open_delim='{{', close_delim='}}'):
        while True:
            # Search for the next two delimiters in the source text
            start = s.find(open_delim)
            end = s.find(close_delim)

            # We found a non-empty match
            if -1 < start < end:
                # Skip the length of the open delimiter
                start += len(open_delim)

                # Spit out the tag
                yield s[start:end].strip()

                # Truncate string to start from last match
                s = s[end + len(close_delim):]
            else:
                return

Run against your target input as follows:

    # prints: today, runner_up, blabla, oooo (one per line)
    for tag in get_tags(html):
        print(tag)

Edit: it also works against your new example :). In my admittedly quick testing, it also seemed to handle malformed tags in a reasonable way, although I cannot guarantee its robustness!
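To make the remark about malformed tags concrete, here is a small sketch of how the generator behaves on unbalanced input. The function is repeated so the snippet runs on its own, and the sample strings are made up for illustration:

```python
def get_tags(s, open_delim='{{', close_delim='}}'):
    while True:
        start = s.find(open_delim)
        end = s.find(close_delim)
        if -1 < start < end:
            start += len(open_delim)
            yield s[start:end].strip()
            s = s[end + len(close_delim):]
        else:
            return

# Well-formed input: every tag is reported
well_formed = list(get_tags('Fix {{spelling}} and {{mistakes}}.'))

# An unclosed tag at the end is silently ignored
unclosed = list(get_tags('{{a}} then {{b'))

# A stray closing delimiter that appears before the next opener
# stops the scan early, so the later tag 'b' is never reported
stray_close = list(get_tags('{{a}} oops }} {{b}}'))

print(well_formed, unclosed, stray_close)
```

So the generator never raises on bad input, but it gives up at the first stray closing delimiter instead of skipping past it. Whether that counts as reasonable depends on how messy the data is expected to be.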

+8

Try templatemaker, a reverse-template maker. It can actually learn templates automatically from examples!

+3

I know you said no regex / split, but I could not help trying for a one-liner solution:

    import re
    for s in re.findall(r"\{\{.*\}\}", thedata):
        print(s.replace("{", "").replace("}", ""))

EDIT: JFS

For comparison:

    >>> re.findall('\{\{.*\}\}', '{{a}}b{{c}}')
    ['{{a}}b{{c}}']
    >>> re.findall('{{(.+?)}}', '{{a}}b{{c}}')
    ['a', 'c']
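The same comparison can be run against input shaped like the question's. This is only a sketch: thedata is shortened from the question's second sample, and the third pattern (with \s*) is an assumption about how one might trim the padding seen in the HTML sample, not something the answers above propose.

```python
import re

thedata = ("Fix grammatical or {{spelling}} errors. "
           "Correct minor {{mistakes}}. "
           "Always respect the original {{author}}.")

# Greedy .* runs from the first '{{' to the last '}}': one big match
greedy = re.findall(r'\{\{.*\}\}', thedata)

# Lazy .+? stops at the nearest '}}', and the group drops the braces
lazy = re.findall(r'\{\{(.+?)\}\}', thedata)

# Optional whitespace inside the braces, as in '{{ today }}'
html = '<title> Hello! - {{ today }}</title> {{ runner_up }} avasd'
spaced = re.findall(r'\{\{\s*(.+?)\s*\}\}', html)

print(len(greedy), lazy, spaced)
```

Note that . does not match a newline by default, so a tag split across lines would be missed unless re.DOTALL is passed.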
+2

If the data is simple, a simple regex will do the trick.

+1

J.F. Sebastian wrote this in a comment, but I thought it was good enough to deserve its own answer:

 re.findall(r'{{(.+?)}}', thestring) 

I know that the OP asked for a method that did not include splits or regular expressions - so perhaps this will not quite answer the question as indicated. But this one line of code definitely gets my vote as the most Pythonic way to complete the task.
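If configurable delimiters are wanted, as in the generator answer, the one-liner can be wrapped in a small function. This is a sketch under assumptions: find_tags is a hypothetical name, re.escape is used so the delimiters are matched literally, and re.DOTALL is added so a tag may span lines.

```python
import re

def find_tags(text, open_delim='{{', close_delim='}}'):
    # Escape the delimiters so characters such as '{' or '%' are literal
    pattern = (re.escape(open_delim)
               + r'\s*(.+?)\s*'
               + re.escape(close_delim))
    # DOTALL lets '.' match newlines, so tags may span several lines
    return re.findall(pattern, text, flags=re.DOTALL)

default_tags = find_tags('Correct minor {{mistakes}}. Respect the {{ author }}.')
custom_tags = find_tags('<% name %> and <%other%>', '<%', '%>')

print(default_tags, custom_tags)
```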

+1
