How to keep regex output length stable when data is missing (Python 2.7)

The following regex matches up to 3 words before and after the keyword, if they exist:

((?:\w+\s+){0,3}My_WORD_HERE(?:\s+\w+){0,3})

The output will be like this:

word1 word2 word3 My_WORD_HERE word1 word2 word3

or

word1 word2 My_WORD_HERE word1, which leads to empty attributes.
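To make the problem concrete, here is a small sketch that runs the pattern above on one input with plenty of context and one with too little. The sample strings and variable names are made up for illustration; the point is only that the number of captured words varies.

```python
import re

# Pattern from the question: up to 3 words on each side of the keyword.
pattern = re.compile(r'((?:\w+\s+){0,3}My_WORD_HERE(?:\s+\w+){0,3})')

full = 'a b c d My_WORD_HERE e f g h'      # more than 3 words on each side
short = 'word1 word2 My_WORD_HERE word1'   # fewer than 3 words on each side

full_match = pattern.search(full).group(1)
short_match = pattern.search(short).group(1)

# The number of whitespace-separated tokens differs between the two matches.
print(len(full_match.split()))
print(len(short_match.split()))
```

The first match yields seven tokens and the second only four, so the rows fed to a downstream tool would have different widths.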

How can we fill in the missing words with a placeholder such as '?' or any other character?

The desired result would look like this:

word1 word2 ? My_WORD_HERE word1 ? ?

I will use this output for Weka ML

Thanks everybody

2 answers

You can do the replacement by passing a lambda to re.sub:

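import re

s = 'word1 word2 My_WORD_HERE word1'
word = 'My_WORD_HERE'
wnb = 3  # number of context words wanted on each side

# Group 1 captures up to wnb words before the keyword, group 2 up to wnb after.
pat = r'((?:\w+\s+){{0,{0}}}){1}((?:\s+\w+){{0,{0}}})'.format(wnb, re.escape(word))

# Rebuild each match, padding whichever side has fewer than wnb words.
res = re.sub(pat, lambda m:
    m.group(1) +
    '? '*(wnb-len(m.group(1).split())) +
    word + m.group(2) +
    ' ?'*(wnb-len(m.group(2).split())), s)

print(res)  # word1 word2 ? My_WORD_HERE word1 ? ?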
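If the goal is a fixed number of attributes per row, it may be more convenient to get the tokens back as a list instead of a string. The following is only a sketch along the lines of the code above; the function name `pad_context` and its parameters are hypothetical, not part of the answer.

```python
import re

def pad_context(text, word, n=3, placeholder='?'):
    """Return a list of 2*n + 1 tokens around `word`, or None if it is absent."""
    # Same idea as the pattern above: group 1 = words before, group 2 = words after.
    pat = r'((?:\w+\s+){{0,{n}}}){w}((?:\s+\w+){{0,{n}}})'.format(
        n=n, w=re.escape(word))
    m = re.search(pat, text)
    if m is None:
        return None
    before = m.group(1).split()
    after = m.group(2).split()
    # Pad each side up to n tokens; the padding on the left goes next to the
    # keyword, as in the layout requested in the question.
    before = before + [placeholder] * (n - len(before))
    after = after + [placeholder] * (n - len(after))
    return before + [word] + after

row = pad_context('word1 word2 My_WORD_HERE word1', 'My_WORD_HERE')
print(','.join(row))
```

Every returned list has the same length, so joining it with commas gives rows of equal width regardless of how much context was available.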

Not a clean regex-only replacement, but it should do the trick:

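import re

def replaceMissingWords(text, word, placeholder):
    # Three optional words on each side of the keyword; a group that does not
    # participate in the match is None. The \s* before the keyword lets the
    # pattern also match when all three preceding words are present.
    match = re.match(r'(\w+)?\s*(\w+)?\s*(\w+)?\s*({0})\s*(\w+)?\s*(\w+)?\s*(\w+)?$'.format(re.escape(word)), text)
    if match is None:
        return text
    # Substitute the placeholder for every group that did not match.
    return ' '.join(placeholder if x is None else x for x in match.groups())

print(replaceMissingWords('word1 word2 My_WORD_HERE word1', 'My_WORD_HERE', '?'))
# output: word1 word2 ? My_WORD_HERE word1 ? ?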
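The pattern above hard-codes three optional groups per side. As a rough sketch, the same kind of pattern can be built for any window size; the name `pad_fixed_window` and the parameter `n` are hypothetical. Like the function above, it assumes the text contains only the keyword and its neighbours, and it returns the text unchanged when the pattern does not match.

```python
import re

def pad_fixed_window(text, word, placeholder='?', n=3):
    # One optional (\w+)? group per slot, separated by optional whitespace.
    side = r'\s*'.join([r'(\w+)?'] * n)
    # The same `side` fragment is used before and after the keyword.
    pattern = r'^{0}\s*({1})\s*{0}$'.format(side, re.escape(word))
    match = re.match(pattern, text)
    if match is None:
        return text
    # Groups that did not take part in the match are None; replace them.
    return ' '.join(placeholder if g is None else g for g in match.groups())

print(pad_fixed_window('word1 word2 My_WORD_HERE word1', 'My_WORD_HERE'))
print(pad_fixed_window('a My_WORD_HERE', 'My_WORD_HERE', n=2))
```

Because each slot is its own capture group, the result always has 2*n + 1 tokens whenever the pattern matches.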

AFAIK python regex engine , .
