Regex: how to match a sequence of key-value pairs at the end of a line

I am trying to match key-value pairs that appear at the end of (long) lines. The lines look like this (I replaced the "\n" characters with real line breaks):

my_str = "lots of blah
          key1: val1-words
          key2: val2-words
          key3: val3-words"

so I expect matches for "key1: val1-words", "key2: val2-words" and "key3: val3-words".

  • A set of possible key names is known.
  • Not every possible key appears on each line.
  • Each line contains at least two keys (if that makes matching easier).
  • Values can consist of multiple words.
  • key-value pairs should only be matched at the end of the line.
  • I am using the Python re module.

I thought that

re.compile('(?:tag1|tag2|tag3):')

plus some kind of lookahead assertion would be a solution, but I can't get it right. How do I do this?
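A minimal sketch of that idea (an alternation of the known keys plus a lookahead), assuming the known key names are the placeholders key1, key2 and key3. Each match starts at a known key and lazily extends the value until a lookahead sees whitespace followed by another known key, or the end of the string. It does not by itself check that the pairs form the trailing block of the string; a known key appearing earlier in the text would also be matched.

```python
import re

# Hypothetical key names standing in for the known set of keys.
KEYS = ['key1', 'key2', 'key3']
key_alt = '|'.join(re.escape(k) for k in KEYS)

# A pair = a known key, a colon, then the shortest run of text that is
# followed either by whitespace + another known key, or by the end of the string.
pair_re = re.compile(
    r'(?:{0}):.*?(?=\s+(?:{0}):|\s*\Z)'.format(key_alt),
    re.DOTALL,
)

my_str = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"
print(pair_re.findall(my_str))
# ['key1: val1-words', 'key2: val2-words', 'key3: val3-words']
```

re.DOTALL lets a value span line breaks, so the lookahead alone decides where each value ends.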

Thanks.

/ David

A real example:

my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'

EDIT:

Thanks to Mikel, I am now using the following:


import re

my_tags = [r'\S+']                       # gets all tags
my_tags = ['Tags', 'Author', 'Posted']   # selected tags
regex = re.compile(r'''
    \n                     # all key-value pairs are on separate lines
    (                      # start group to return
       (?:{0}):            # placeholder for the tags to detect; r'\S+' matches all tags
        \s                 # the space between ':' and the value
       .*                  # the value (up to the end of the line)
    )                      # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)

regex.sub('', my_str)    # returns my_str without the matching key-value lines
regex.findall(my_str)    # returns the matched key-value lines
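To check the edited regex end to end, here is a self-contained run on the real example string. make_regex is a hypothetical helper introduced only so the same pattern can be compiled for two different tag lists. The first line of the string is never matched, because the pattern requires a preceding \n.

```python
import re

# The real example from the question, split across source lines for readability.
my_str = (
    'ucourt métrage pour kino session volume 18\n'
    'Thème: O sombres héros\n'
    'Contraintes: sous titrés\n'
    'Author: nicoalabdou\n'
    'Tags: wakatanka productions court métrage kino session humour cantat '
    'bertrand noir désir sombres héros mer medine marie trintignant femme '
    'droit des femmes nicoalabdou pute soumise\n'
    'Posted: 06 June 2009\n'
    'Rating: 1.3\n'
    'Votes: 3'
)

def make_regex(tags):
    """Hypothetical helper: compiles the regex from the edit for a list of tags."""
    return re.compile(r'''
        \n              # every key-value pair starts on its own line
        (               # start group to return
          (?:{0}):      # one of the requested tags, then a colon
          \s            # the space between ':' and the value
          .*            # the value, up to the end of the line
        )               # end group to return
        '''.format('|'.join(tags)), re.VERBOSE)

selected = make_regex(['Tags', 'Author', 'Posted'])
print(selected.findall(my_str))   # the Author, Tags and Posted lines
print(selected.sub('', my_str))   # my_str with those three lines removed

all_tags = make_regex([r'\S+'])
print(len(all_tags.findall(my_str)))  # 7
```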


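Use a negative lookahead assertion, (?!pattern).

It is documented in the re module:

(?!...)

    Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.

So, to match a value word (any word that is not a key, i.e. not followed by a colon), use (?!\S+:)\S+.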
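A quick way to see the lookahead at work is to try both patterns on a few short strings. The first part repeats the example from the re documentation; the second shows that (?!\S+:)\S+ accepts a value word but rejects a key such as key2:.

```python
import re

# Docs example: 'Isaac ' matches only when it is not followed by 'Asimov'.
isaac = re.compile(r'Isaac (?!Asimov)')
print(isaac.match('Isaac Newton'))    # a match object for 'Isaac '
print(isaac.match('Isaac Asimov'))    # None

# A value word: a run of non-space characters that is not a key,
# i.e. the lookahead fails if the run ends in a colon.
value_word = re.compile(r'(?!\S+:)\S+')
print(value_word.match('val1-words key2:').group())  # val1-words
print(value_word.match('key2: val2-words'))          # None
```

Note that the lookahead only inspects the text, it consumes nothing; \S+ then does the actual matching.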

Putting it all together:

import re

regex = re.compile(r'''
    \S+:                  # a key (any word followed by a colon)
    (?:
    \s                    # then whitespace in between
        (?!\S+:)\S+       # then a value word (any word not followed by a colon)
    )+                    # match multiple value words if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)

['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

Or, to print them one per line:

for match in matches:
    print(match)

which prints:

key1: val1-words
key2: val2-words
key3: val3-words

With your real example, it prints:

Thème: O sombres héros
Contraintes: sous titrés
Author: nicoalabdou
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise
Posted: 06 June 2009
Rating: 1.3
Votes: 3
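The sample string in the question is not a valid Python literal (a double-quoted string cannot span several lines), so here is a self-contained version of the toy example with explicit \n characters. Each match ends on the last value word, so no trailing whitespace is included.

```python
import re

# The sample from the question, written as a valid Python string.
my_str = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"

regex = re.compile(r'''
    \S+:                  # a key (any word followed by a colon)
    (?:
        \s                # then whitespace in between
        (?!\S+:)\S+       # then a value word (any word not followed by a colon)
    )+                    # match multiple value words if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)
for match in matches:
    print(match)
```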

To turn the matches into a dictionary of key-value pairs:

pairs = dict([match.split(':', 1) for match in matches])
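split(':', 1) keeps the whitespace that follows the colon, so every value in that dictionary starts with a space. One option is to strip it while building the dictionary; the three matches below are taken from the real-example output above.

```python
matches = ['Author: nicoalabdou', 'Posted: 06 June 2009', 'Rating: 1.3']

# Plain split: values keep the leading space after the colon.
raw = dict(m.split(':', 1) for m in matches)

# Strip each value so that raw[' nicoalabdou'] style surprises go away.
pairs = {key: value.strip() for key, value in (m.split(':', 1) for m in matches)}
print(pairs)
```

Splitting only on the first colon (maxsplit=1) keeps any later colons inside the value.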
