Here is an improved version of my previous answer. It uses regular expressions to make a fuzzy match on the verb. All of these work:
    Steve loves Denise
    Bears love honey
    Maria interested Anders
    Maria interests Anders
The regular expression pattern "loves?" matches "love" plus an optional "s". The pattern "interest.*" matches "interest" plus anything that follows. A pattern with multiple alternatives separated by vertical bars matches if any one of the alternatives matches.
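To make the matching rules above concrete, here is a small sketch you can run on its own. The `matches` helper is just a name I made up for the demonstration. Note that `match()` only anchors at the start of the word, so a pattern like "loves?" will also accept "loved".

```python
import re

# Patterns taken from the examples described above.
love = re.compile(r"loves?")
interest = re.compile(r"interest.*")
either = re.compile(r"likes?|loves?|interest.*")


def matches(pat, word):
    """Hypothetical helper: True if pat matches at the start of word."""
    return pat.match(word) is not None


print(matches(love, "love"))            # True
print(matches(love, "loves"))           # True
print(matches(love, "loved"))           # True, because match() only anchors the start
print(matches(interest, "interested"))  # True
print(matches(either, "likes"))         # True, the first alternative matches
print(matches(either, "hates"))         # False, no alternative matches
```

If the prefix behavior is too loose for your data, `pat.fullmatch(word)` (Python 3.4+) requires the whole word to match.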
    import re

    re_map = [
        ("likes?|loves?|interest.*", "red"),
        ("dislikes?|hates?", "blue"),
        ("knows?|tolerates?|ignores?", "black"),
    ]

    # Compile the regular expressions one time, then use them many times.
    pat_map = [(re.compile(s), color) for s, color in re_map]

    # We don't use is_verb() in this version, but here it is.
    # A word is a verb if any of the patterns match.
    def is_verb(word):
        return any(pat.match(word) for pat, color in pat_map)

    # Return the color for a matched verb, or None if there is no match.
    # This detects whether a word is a verb and looks up the color at the same time.
    def color_from_verb(word):
        for pat, color in pat_map:
            if pat.match(word):
                return color
        return None

    def make_noun(lst):
        if not lst:
            return "--NONE--"
        elif len(lst) == 1:
            return lst[0]
        else:
            return "_".join(lst)

    for line in open("filename"):
        words = line.split()
        # The subject could be one or two words.
        color = color_from_verb(words[1])
        if color:
            # The subject was one word.
            s = words[0]
            o = make_noun(words[2:])
        else:
            # The subject was two words, so the verb is the third word.
            color = color_from_verb(words[2])
            assert color
            s = make_noun(words[0:2])
            o = make_noun(words[3:])
        print("%s -> %s %s;" % (s, o, color))
I hope it is clear how to take this answer and extend it. You can easily add more patterns to match more verbs. You could add logic to detect "is" and "in" and discard them, so that "Anders is interested in Mary" would match. And so on.
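As a rough sketch of the "is"/"in" idea, one option is to filter those words out before looking for the verb. The names `FILLER_WORDS` and `drop_fillers` are made up for this example, and the word list is only my guess at what you would want to discard.

```python
import re

# Hypothetical set of words to discard; extend it to suit your data.
FILLER_WORDS = {"is", "in"}

# One verb pattern from the answer, enough for this demonstration.
VERB_PAT = re.compile(r"likes?|loves?|interest.*")


def drop_fillers(words):
    """Return the words with filler words removed (case-insensitive)."""
    return [w for w in words if w.lower() not in FILLER_WORDS]


words = drop_fillers("Anders is interested in Mary".split())
print(words)                            # ['Anders', 'interested', 'Mary']
print(bool(VERB_PAT.match(words[1])))   # True, the verb is now the second word
```

You would call `drop_fillers()` on the result of `line.split()` before the verb lookup. Be aware that this simple filter also removes "in" or "is" from inside a name, so a subject like "Men in Black" would need extra handling.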
If you have any questions, I would be happy to explain this further. Good luck.