Python Parsing Log File for IP Address and Protocol

This is my first question here on Stack Overflow, and I'm really looking forward to participating in this community. I am new to programming, and Python was the first language most people recommended.

Anyway, I have a log file that looks like this:

    "No.","Time","Source","Destination","Protocol","Info"
    "1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
    "2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
    "3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
    "4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
    "5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."

I want to parse the log file with Python so that the output looks like this:

From IP 135.13.216.191 Number of protocols: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)

I would really like some help figuring out how to approach this. Should I use lists and iterate through them, or dictionaries/tuples?

Thanks in advance for your help!

3 answers

First, you will want to read in the text file:

    # Open the file
    file = open('log_file.csv')
    # readlines() will return the data as a list of strings, one for each line
    log_data = file.readlines()
    # close the log file
    file.close()
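A slightly more robust way to read the file, sketched below, uses a `with` statement so the file is closed even if an error occurs, and calls `next()` on the file object to skip the header row. The sample lines come from the question's log; writing them to a temporary file first (the `path` location is only for this demo) keeps the example self-contained.

```python
import os
import tempfile

# Sample lines from the question's log, written to a temporary file for this demo
sample = (
    '"No.","Time","Source","Destination","Protocol","Info"\n'
    '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n'
    '"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."\n'
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n'
)
path = os.path.join(tempfile.mkdtemp(), 'log_file.csv')
with open(path, 'w') as out:
    out.write(sample)

# The with statement closes the file automatically, even if an exception is raised
with open(path) as log_file:
    header = next(log_file)          # first line: the column names
    log_data = log_file.readlines()  # remaining lines, one string per record
```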

Set up a dictionary to store the results:

 results = {} 

Now iterate over the data one line at a time, recording each protocol in the dictionary:

    for entry in log_data[1:]:  # skip the header row
        # Split on commas and strip the surrounding quotes from each field
        entry_data = [field.strip('"') for field in entry.strip().split(',')]
        # We are going to have a separate entry for each source ip
        # If we haven't already seen this ip, we need to make an entry for it
        if entry_data[2] not in results:
            results[entry_data[2]] = {'total': 0}
        # Now check to see if we've seen the protocol for this ip before
        # If we haven't, add a new entry set to 0
        if entry_data[4] not in results[entry_data[2]]:
            results[entry_data[2]][entry_data[4]] = 0
        # Now we increment the count for this protocol
        results[entry_data[2]][entry_data[4]] += 1
        # And we increment the total count
        results[entry_data[2]]['total'] += 1
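The two membership checks can also be folded into `dict.setdefault` and `dict.get`, a common idiom for nested counters. A minimal sketch, run on lines copied from the question's log (the `log_data` list stands in for the lines read from the file):

```python
# A few lines copied from the question's log; stands in for file.readlines()
log_data = [
    '"No.","Time","Source","Destination","Protocol","Info"\n',
    '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n',
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n',
    '"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."\n',
]

results = {}
for entry in log_data[1:]:  # skip the header row
    fields = [field.strip('"') for field in entry.strip().split(',')]
    source, protocol = fields[2], fields[4]
    # setdefault returns the existing dict for this ip, or inserts and returns a new one
    counts = results.setdefault(source, {'total': 0})
    # get returns 0 the first time a protocol is seen for this ip
    counts[protocol] = counts.get(protocol, 0) + 1
    counts['total'] += 1
```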

Once you have counted everything, loop over the results and print them:

    for ip in results:
        # Here we're printing a string with placeholders. The {0} and {1} will be
        # filled in by the call to format
        print("From IP {0} Protocol Count: {1}".format(
            ip,
            # And finally create the value for the protocol counts with another format call
            # The square braces with the for statement inside create a list with one entry
            # for each protocol seen for this ip
            # We use ' '.join to join the counts with a space between each one
            ' '.join(["({0}: {1})".format(protocol, results[ip][protocol])
                      for protocol in results[ip]])))
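To match the exact format asked for in the question (protocols in alphabetical order, with the total last), the line can be built in a small helper. `format_line` is a hypothetical name, and it assumes each per-IP dictionary uses the key `'total'` as in the loop above; the counts in the usage line are the ones from the question's desired output.

```python
def format_line(ip, counts):
    """Build one output line from a per-IP dict of protocol counts.

    Assumes `counts` holds one entry per protocol plus a 'total' entry.
    """
    # Sort protocol names so the output is stable, and leave 'total' for the end
    parts = ["({0} {1})".format(protocol, counts[protocol])
             for protocol in sorted(counts) if protocol != 'total']
    parts.append("(Total: {0})".format(counts['total']))
    return "From IP {0} Number of protocols: {1}".format(ip, ' '.join(parts))


line = format_line('135.13.216.191', {'IMF': 1, 'SMTP': 38, 'TCP': 24, 'total': 63})
print(line)
```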

You can parse the file using the csv module:

    import csv

    with open('logfile.txt') as logfile:
        for row in csv.reader(logfile):
            no, time, source, dest, protocol, info = row
            # do stuff with these
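`csv.reader` removes the surrounding quotes for you, and calling `next()` on the reader skips the header row. A small self-contained sketch, using `io.StringIO` with lines from the question's log in place of the real file:

```python
import csv
import io

# Stands in for open('logfile.txt'); lines copied from the question's log
logfile = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Info"\n'
    '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n'
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n'
)

reader = csv.reader(logfile)
next(reader)  # skip the header row
pairs = []
for no, time, source, dest, protocol, info in reader:
    # The csv module has already stripped the quotes from each field
    pairs.append((source, protocol))
```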

I can't tell exactly what you are asking, but I think you want something like this:

    import csv
    from collections import defaultdict

    # A dictionary whose values are by default (a
    # dictionary whose values are by default 0)
    bySource = defaultdict(lambda: defaultdict(lambda: 0))

    with open('logfile.txt') as logfile:
        for row in csv.DictReader(logfile):
            bySource[row["Source"]][row["Protocol"]] += 1

    for source, protocols in bySource.items():
        protocols['Total'] = sum(protocols.values())
        print("From IP %s Protocol Count: %s" % (
            source,
            ' '.join("(%s: %d)" % item for item in protocols.items())
        ))
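The nested `defaultdict` can also be written as `defaultdict(Counter)`: `collections.Counter` is a dict subclass that returns 0 for missing keys and provides `most_common()` for ordering by frequency. A sketch, again on lines copied from the question's log via `io.StringIO`:

```python
import csv
import io
from collections import Counter, defaultdict

# Stands in for open('logfile.txt'); lines copied from the question's log
logfile = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Info"\n'
    '"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."\n'
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n'
    '"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."\n'
    '"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."\n'
)

# One Counter of protocols per source IP
bySource = defaultdict(Counter)
for row in csv.DictReader(logfile):
    bySource[row["Source"]][row["Protocol"]] += 1

for source, protocols in bySource.items():
    total = sum(protocols.values())
    # most_common() lists protocols from most to least frequent
    counts = ' '.join("(%s: %d)" % item for item in protocols.most_common())
    print("From IP %s Protocol Count: %s (Total: %d)" % (source, counts, total))
```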

I would start by reading the file into a list:

    contents = []
    with open("file_path") as f:
        contents = f.readlines()

Then you can break each line into its own list:

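    # strip() drops the trailing newline, [1:-1] drops the outer quotes,
    # and contents[1:] skips the header row
    ips = [l.strip()[1:-1].split('","') for l in contents[1:] if l.strip()]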
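To see what the slicing does, here it is applied to a single line from the question's log, including the trailing newline that `readlines()` keeps. This assumes every field is quoted and that no field contains the sequence `","` itself.

```python
line = '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n'

# strip() removes the trailing newline so that [1:-1] drops exactly the outer quotes
fields = line.strip()[1:-1].split('","')

# Without strip(), [1:-1] removes the newline instead, and the last field keeps a quote
unstripped = line[1:-1].split('","')
```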

Then we can group them by source IP in a dict:

    sourceIps = {}
    for ip in ips:
        try:
            sourceIps[ip[2]].append(ip)
        except KeyError:
            # First row for this source IP: start a new list
            sourceIps[ip[2]] = [ip]
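The try/except can be avoided with `collections.defaultdict(list)`, which creates an empty list the first time a key is looked up. A sketch, using rows from the question's log in the shape the split step produces:

```python
from collections import defaultdict

# Rows in the shape produced by the split step, taken from the question's log
ips = [
    ['1', '0.000000', '120.107.103.180', '172.16.112.50', 'TELNET', 'Telnet Data ...'],
    ['2', '0.000426', '172.16.112.50', '172.16.113.168', 'TELNET', 'Telnet Data ...'],
    ['5', '0.530634', '172.16.112.50', '172.16.113.168', 'TELNET', 'Telnet Data ...'],
]

# A missing source IP starts out as an empty list, so append always works
sourceIps = defaultdict(list)
for ip in ips:
    sourceIps[ip[2]].append(ip)
```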

And finally print the result:

    for ip, stuff in sourceIps.iteritems():
        print "From {0} ... ".format(ip, ...)
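One possible way to fill in that last step, sketched under the assumption that `sourceIps` maps each source IP to its list of split rows (protocol at index 4), is to count the protocols in each group with `collections.Counter`:

```python
from collections import Counter

# Grouped rows in the shape built above, taken from the question's log
sourceIps = {
    '172.16.113.168': [
        ['3', '0.019849', '172.16.113.168', '172.16.112.50', 'TCP', '21582 > telnet [ACK]'],
        ['4', '0.530125', '172.16.113.168', '172.16.112.50', 'TELNET', 'Telnet Data ...'],
    ],
}

lines = []
for ip, stuff in sourceIps.items():
    # Count how often each protocol (index 4) appears in this IP's rows
    protocols = Counter(row[4] for row in stuff)
    counts = ' '.join("({0} {1})".format(p, n) for p, n in sorted(protocols.items()))
    lines.append("From IP {0} Number of protocols: {1} (Total: {2})".format(ip, counts, len(stuff)))

for line in lines:
    print(line)
```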
