Put a string at the end of each line that includes foo

Question

I have a list with a lot of lines, each of which takes the form subject-verb-object, for example:

    Jane likes fred
    Chris dislikes joe
    Nate knows jill

To build a network graph that expresses the relationships between nodes as directed, color-coded edges, I need to replace the verb with an arrow and put a color code at the end of each line. Simplified a bit, the result looks like this:

    Jane -> Fred red;
    Chris -> Joe blue;
    Nate -> Jill black;

There are only a small number of verbs, so replacing them with an arrow takes just a few search-and-replace commands. Before that, however, I need to put the color code that corresponds to each line's verb at the end of that line. I would like to do this using Python.

These are my baby steps in programming, so please be explicit and include the code that reads in the text file.

Thank you for your help!

+4

python text-processing

Karasu Oct 05 '09 at 21:04

7 answers

It sounds like you will want to explore dictionaries and string formatting. In general, if you need help with programming, break whatever problem you have into extremely small, discrete pieces, search for those pieces yourself, and then assemble what you find into a larger answer. An excellent resource for this type of search.

Also, if you are generally curious about Python, search or browse the official Python documentation. If you don't know where to start, read the Python tutorial or find a book to work through. A week or two invested in getting a good basic knowledge of what you are doing will pay for itself many times over.

    verb_color_map = {
        'likes': 'red',
        'dislikes': 'blue',
        'knows': 'black',
    }

    with open('infile.txt') as infile:  # assuming you've stored your data in 'infile.txt'
        for line in infile:
            # Python uses the name object, so I use object_
            subject, verb, object_ = line.split()
            print "%s -> %s %s;" % (subject, object_, verb_color_map[verb])

+5

Dan Passaro Oct 05 '09 at 21:27

Simple enough; assuming the list of verbs is fixed and small, this is easy to do with a dictionary and a for loop:

    VERBS = {
        "likes": "red",
        "dislikes": "blue",
        "knows": "black",
    }

    def replace_verb(line):
        line = line.strip()  # drop the trailing newline so the color stays on the same line
        words = line.split()
        for verb, color in VERBS.items():
            # compare whole words: a substring test would find "likes" inside "dislikes"
            if verb in words:
                return "%s %s;" % (line.replace(verb, "->"), color)
        return line

    def main():
        filename = "my_file.txt"
        with open(filename, "r") as fp:
            for line in fp:
                print replace_verb(line)

    # Allow the module to be executed directly on the command line
    if __name__ == "__main__":
        main()

+3

John Millikin Oct 05 '09 at 21:19

Are you sure this isn't a bit of homework? :) If so, it's fine to say so. Without going into details, think about the tasks you are trying to do:

For each line:

read it
split it into words (on whitespace, with .split())
convert the middle word to a color (based on a mapping; cf. Python dict())
print the first word, an arrow, the third word, and the color
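
The steps above can be sketched as follows, written for Python 3 rather than the Python 2 used in the answers. The names `VERB_COLORS` and `convert_line` are illustrative and not part of the original answer, and the sketch assumes every line has exactly three whitespace-separated words.

```python
# Hypothetical mapping from verb to edge color (values taken from the question's example).
VERB_COLORS = {"likes": "red", "dislikes": "blue", "knows": "black"}


def convert_line(line):
    """Turn 'Jane likes fred' into 'Jane -> fred red;'.

    Assumes exactly three whitespace-separated words per line.
    """
    subject, verb, obj = line.split()   # split the line into words on whitespace
    color = VERB_COLORS[verb]           # convert the middle word to a color
    return "%s -> %s %s;" % (subject, obj, color)


if __name__ == "__main__":
    for line in ["Jane likes fred", "Chris dislikes joe", "Nate knows jill"]:
        print(convert_line(line))
```

A verb missing from the mapping raises `KeyError` here, which is probably what you want while the list of verbs is still being worked out.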

Code using NetworkX (networkx.lanl.gov/)

    ''' plot relationships in a social network '''

    import networkx

    ## make a fake file 'ex.txt' in this directory
    ## then write fake relationships to it.
    example_relationships = file('ex.txt','w')
    print >> example_relationships, '''\
    Jane Doe likes Fred
    Chris dislikes Joe
    Nate knows Jill
    '''
    example_relationships.close()

    rel_colors = {
        'likes': 'blue',
        'dislikes' : 'black',
        'knows' : 'green',
    }

    def split_on_verb(sentence):
        ''' we know the verb is the only lower cased word

        >>> split_on_verb("Jane Doe likes Fred")
        ('Jane Doe', 'Fred', 'likes')
        '''
        words = sentence.strip().split()  # take off any outside whitespace, then split on whitespace
        if not words:
            return None  # if there aren't any words, just return nothing
        verbs = [x for x in words if x.islower()]
        verb = verbs[0]  # we want the '1st' one (python numbers from 0,1,2...)
        verb_index = words.index(verb)  # where is the verb?
        subject = ' '.join(words[:verb_index])
        obj = ' '.join(words[(verb_index+1):])  # 'object' is already used in python
        return (subject, obj, verb)

    def graph_from_relationships(fh,color_dict):
        ''' fh: a filehandle, ie, an opened file, from which we can
        read lines and loop over
        '''
        G = networkx.DiGraph()
        for line in fh:
            if not line.strip():
                continue  # move on to the next line, if our line is empty-ish
            (subj,obj,verb) = split_on_verb(line)
            color = color_dict[verb]
            # cf: python 'string templates', there are other solutions here
            print "'%s' -> '%s' [color='%s'];" % (subj,obj,color)
            # store the color as an edge attribute (keyword form, networkx 1.0 and later)
            G.add_edge(subj,obj,color=color)
        return G

    G = graph_from_relationships(file('ex.txt'),rel_colors)
    print G.edges()
    # from here you can use the various networkx plotting tools on G, as you're inclined.

+1

Gregg Lind Oct 05 '09 at 21:21

Python 2.5:

    import sys
    from collections import defaultdict

    codes = defaultdict(lambda: ("---", "Missing action!"))
    codes["likes"] = ("-->", "red")
    codes["dislikes"] = ("-/>", "green")
    codes["loves"] = ("==>", "blue")

    for line in sys.stdin:
        subject, verb, object_ = line.strip().split(" ")
        arrow, color = codes[verb]
        print subject, arrow, object_, color, ";"

0

Georg Schölly Oct 05 '09 at 21:24

In addition to the question, Karasu also said (in a comment on one answer): "In the actual input, both subjects and objects vary unpredictably between one and two words."

Well, here is how I would solve it.

    color_map = \
    {
        "likes" : "red",
        "dislikes" : "blue",
        "knows" : "black",
    }

    def is_verb(word):
        return word in color_map

    def make_noun(lst):
        if not lst:
            return "--NONE--"
        elif len(lst) == 1:
            return lst[0]
        else:
            return "_".join(lst)

    for line in open("filename").readlines():
        words = line.split()
        # subject could be one or two words
        if is_verb(words[1]):
            # subject was one word
            s = words[0]
            v = words[1]
            o = make_noun(words[2:])
        else:
            # subject was two words
            assert is_verb(words[2])
            s = make_noun(words[0:2])
            v = words[2]
            o = make_noun(words[3:])
        color = color_map[v]
        print "%s -> %s %s;" % (s, o, color)

Some notes:

0) We really don't need "with" for this problem, and writing it this way makes the program more portable to older versions of Python. This should work on Python 2.2 and newer, I think (I tested only on Python 2.6).

1) You can change make_noun() to use whatever strategy you find useful for handling multiple words. I showed simply joining them with underscores, but you could keep a dictionary of adjectives and throw those out, or keep a dictionary of nouns and pick those, or whatever.

2) You could also use regular expressions for fuzzier matching. Instead of a plain dictionary for color_map, you could have a list of tuples, each pairing a regular expression with a color, and use the color of whichever regular expression matches.

0

steveha Oct 05 '09 at 23:10

Here is an improved version of my previous answer. It uses regular expressions to make a fuzzy match on the verb. All of these work:

    Steve loves Denise
    Bears love honey
    Maria interested Anders
    Maria interests Anders

The regular expression pattern "loves?" matches "love" plus an optional "s". The pattern "interest.*" matches "interest" plus anything. A pattern with multiple alternatives separated by vertical bars matches if any one of the alternatives matches.
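
To see how those patterns behave on individual words, here is a small sketch written for Python 3. The variable names are illustrative; the pattern strings are the ones quoted in this answer.

```python
import re

# Patterns quoted in the text above.
loves = re.compile(r"loves?")
interest = re.compile(r"interest.*")
positive = re.compile(r"likes?|loves?|interest.*")

for word in ["love", "loves", "interested", "interests", "dislikes"]:
    # match() anchors at the start of the word, so "dislikes" does not match "likes?"
    print(word, bool(positive.match(word)))
```

Note that `match()` only anchors the start of the string, so a word such as "lovely" would also match "loves?". If that is too loose, `re.fullmatch()` (Python 3.4 and later) requires the whole word to match.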

    import re

    re_map = \
    [
        ("likes?|loves?|interest.*", "red"),
        ("dislikes?|hates?", "blue"),
        ("knows?|tolerates?|ignores?", "black"),
    ]

    # compile the regular expressions one time, then use many times
    pat_map = [(re.compile(s), color) for s, color in re_map]

    # We don't use is_verb() in this version, but here it is.
    # A word is a verb if any of the patterns match.
    def is_verb(word):
        return any(pat.match(word) for pat, color in pat_map)

    # Return color from matched verb, or None if no match.
    # This detects whether a word is a verb, and looks up the color, at the same time.
    def color_from_verb(word):
        for pat, color in pat_map:
            if pat.match(word):
                return color
        return None

    def make_noun(lst):
        if not lst:
            return "--NONE--"
        elif len(lst) == 1:
            return lst[0]
        else:
            return "_".join(lst)

    for line in open("filename"):
        words = line.split()
        # subject could be one or two words
        color = color_from_verb(words[1])
        if color:
            # subject was one word
            s = words[0]
            o = make_noun(words[2:])
        else:
            # subject was two words, so the verb is the third word
            color = color_from_verb(words[2])
            assert color
            s = make_noun(words[0:2])
            o = make_noun(words[3:])
        print "%s -> %s %s;" % (s, o, color)

I hope it's clear how to take this answer and extend it. You can easily add more patterns to match more verbs. You could add logic to detect "is" and "in" and discard them, so that "Anders is interested in Mary" would match. And so on.
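
One way the "is" and "in" idea might look, as a minimal Python 3 sketch: the set `FILLER_WORDS` and the helper `drop_fillers` are hypothetical names, not part of the answer's code, and the word list is only an example.

```python
# Hypothetical set of filler words to discard before looking for the verb.
FILLER_WORDS = {"is", "in"}


def drop_fillers(line):
    """Return the words of a line with the filler words removed."""
    return [word for word in line.split() if word.lower() not in FILLER_WORDS]


print(drop_fillers("Anders is interested in Mary"))
```

The resulting word list has the subject, verb, and object in the positions the main loop expects, so it could stand in for the plain `line.split()` call.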

If you have any questions, I would be happy to explain this further. Good luck.

0

steveha Oct 6 '09 at 4:51

leonm · Accepted Answer · 2009-10-05T21:34:23+0000

    verbs = {"dislikes": "blue", "knows": "black", "likes": "red"}
    for s in open("/tmp/infile"):
        s = s.strip()
        # try longer verbs first, so "dislikes" is checked before "likes"
        for verb in sorted(verbs, key=len, reverse=True):
            if (s.count(verb) > 0):
                print s.replace(verb, "->") + " " + verbs[verb] + ";"
                break

Edit: Better to use "for s in open".
