I am trying to match key-value pairs that appear at the end of (long) lines. The lines look like this (I replaced each "\n" with an actual line break):
my_str = "lots of blah
key1: val1-words
key2: val2-words
key3: val3-words"
So I expect to match "key1: val1-words", "key2: val2-words" and "key3: val3-words".
- A set of possible key names is known.
- Not every possible key appears on each line.
- Each line contains at least two keys (if that makes matching easier).
- Values can consist of multiple words.
- key-value pairs should only be matched at the end of the line.
- I am using the Python re module.
I thought something like
re.compile('(?:tag1|tag2|tag3):')
plus some lookahead assertion would be a solution, but I can't get it right. How do I do this?
Thanks.
/ David
A line from a real example:
my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'
EDIT:
Following Mikel's answer, I am now using:
import re

my_tags = [r'\S+']  # gets all tags
my_tags = ['Tags', 'Author', 'Posted']  # selected tags (overrides the line above)

regex = re.compile(r'''
    \n           # all key-value pairs are on separate lines
    (            # start group to return
    (?:{0}):     # placeholder for tags to detect; '\S+' == all
    \s           # the space between ':' and value
    .*           # the value
    )            # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)

regex.sub('', my_str)   # returns my_str without the matching key-value lines
regex.findall(my_str)   # returns the matched key-value lines
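
Note that the pattern above matches key-value lines anywhere in the text, not only at the end. Below is a minimal sketch of one way to restrict the match to the trailing block, as asked in the list of requirements. The names `sample`, `known_keys` and `split_trailing_pairs` are hypothetical and only for illustration, and the sketch assumes the text does not end with a trailing newline.

```python
import re

# Hypothetical sample mirroring the short example from the question.
sample = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"
known_keys = ["key1", "key2", "key3"]  # assumed set of known key names

# Escape the keys so that special characters in a key name are matched literally.
key_alt = "|".join(map(re.escape, known_keys))

# One "key: value" line, introduced by a newline; '.' does not cross newlines.
pair = r"\n(?:{0}):[ ].*".format(key_alt)

# One or more such lines that run right up to the end of the string (\Z).
trailing_block = re.compile(r"(?:{0})+\Z".format(pair))


def split_trailing_pairs(text):
    """Return (body, pairs), where pairs are the key-value lines ending the text."""
    match = trailing_block.search(text)
    if match is None:
        return text, []
    pairs = match.group(0).strip("\n").split("\n")
    return text[:match.start()], pairs


body, pairs = split_trailing_pairs(sample)
print(body)
print(pairs)
```

Because the whole block is anchored with `\Z`, a "key: value" line in the middle of the text that is followed by ordinary text stays in the body and is not returned as a pair.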