Python Parsing Log File for IP Address and Protocol

Question


This is my first question here on Stack Overflow, and I'm really looking forward to participating in this community. I am new to programming, and Python was the language most people recommended to start with.

In any case, I have a log file that looks like this:

    "No.","Time","Source","Destination","Protocol","Info"
    "1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
    "2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
    "3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
    "4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
    "5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."

I want to parse the log file with Python so that the output looks like this:

From IP 135.13.216.191 Number of protocols: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)
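A minimal sketch of a helper that renders one line in the target format, assuming the per-protocol counts for a single IP are already in a plain dict. The function name format_summary is hypothetical; protocols are listed alphabetically and the total is appended last.

```python
def format_summary(ip, counts):
    """Render one line of the requested output for a single source IP.

    `counts` maps protocol name -> number of packets (an assumed input shape).
    """
    # One "(PROTO n)" group per protocol, in alphabetical order
    parts = ["({0} {1})".format(proto, n) for proto, n in sorted(counts.items())]
    # The total is simply the sum of all per-protocol counts
    parts.append("(Total: {0})".format(sum(counts.values())))
    return "From IP {0} Number of protocols: {1}".format(ip, " ".join(parts))


print(format_summary("135.13.216.191", {"SMTP": 38, "TCP": 24, "IMF": 1}))
```

With the counts from the example line, this prints exactly the line shown above.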

I would really appreciate some help on how to approach this problem. Should I use lists and loop through them, or dictionaries/tuples?

Thanks in advance for your help!


python loops parsing hash

John smith Oct 24 '12 at 20:05


3 answers

You can parse the file using the csv module:

    import csv

    with open('logfile.txt', newline='') as logfile:
        reader = csv.reader(logfile)
        next(reader)  # skip the header row ("No.","Time",...)
        for row in reader:
            no, time, source, dest, protocol, info = row
            # do stuff with these
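A self-contained sketch of the same idea that runs without a file: io.StringIO holding a few of the question's sample rows stands in for 'logfile.txt' (an assumption for illustration). Calling next() on the reader consumes the header, so "No.","Time",... is not treated as a packet.

```python
import csv
import io

# A few rows from the question's sample, standing in for logfile.txt
SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
'''

rows = []
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # consume the header row so it is not treated as data
for row in reader:
    # csv.reader has already removed the surrounding quotes from each field
    no, time, source, dest, protocol, info = row
    rows.append((source, protocol))

print(header)
print(rows)
```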

I'm not sure exactly what you're asking, but I think you want:

    import csv
    from collections import defaultdict

    # A dictionary whose values are by default (a
    # dictionary whose values are by default 0)
    bySource = defaultdict(lambda: defaultdict(lambda: 0))

    with open('logfile.txt', newline='') as logfile:
        # DictReader uses the header row as keys, so it is not counted as data
        for row in csv.DictReader(logfile):
            bySource[row["Source"]][row["Protocol"]] += 1

    for source, protocols in bySource.items():
        protocols['Total'] = sum(protocols.values())
        print("From IP %s Protocol Count: %s" % (
            source,
            ' '.join("(%s: %d)" % item for item in protocols.items())
        ))
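Since the question also asks about tuples, here is a variant (not from the answer above) that counts with a single collections.Counter keyed by (source, protocol) tuples and only regroups per IP when printing. The sample data is inlined via io.StringIO as an assumption so the sketch runs standalone.

```python
import csv
import io
from collections import Counter, defaultdict

SAMPLE = '''"No.","Time","Source","Destination","Protocol","Info"
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."
'''

# One flat counter: the key is a (source IP, protocol) tuple, the value a packet count
pair_counts = Counter(
    (row["Source"], row["Protocol"]) for row in csv.DictReader(io.StringIO(SAMPLE))
)

# Regroup into {source IP: {protocol: count}} for printing
by_source = defaultdict(dict)
for (source, protocol), n in pair_counts.items():
    by_source[source][protocol] = n

for source in sorted(by_source):
    protocols = by_source[source]
    parts = " ".join("({0} {1})".format(p, n) for p, n in sorted(protocols.items()))
    print("From IP {0} Number of protocols: {1} (Total: {2})".format(
        source, parts, sum(protocols.values())))
```

The IPs are sorted as plain strings here; passing ipaddress.ip_address as the sort key would order them numerically instead.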


Eric Oct 24 '12 at 20:12


I would start by reading the file into a list:

    with open("file_path") as f:
        contents = f.readlines()

Then you can break each line into its own list:

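    # Skip the header line, and strip the trailing newline before
    # removing the outer quotes; otherwise the last field keeps a stray quote
    ips = [l.strip()[1:-1].split('","') for l in contents[1:]]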

Then we can group them by source IP in a dict:

    sourceIps = {}
    for ip in ips:
        try:
            sourceIps[ip[2]].append(ip)  # ip[2] is the Source column
        except KeyError:
            # first time we see this source IP
            sourceIps[ip[2]] = [ip]
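An alternative to the try/except pattern: dict.setdefault does the "create the list if missing" step in one call (collections.defaultdict(list) is another option). The rows below are taken from the question's sample and pre-split by hand so the sketch runs on its own.

```python
# Rows from the question's sample, already split into fields:
# No., Time, Source, Destination, Protocol, Info
ips = [
    ["1", "0.000000", "120.107.103.180", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ["2", "0.000426", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
    ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
]

sourceIps = {}
for ip in ips:
    # setdefault stores an empty list the first time a source IP is seen,
    # then returns the list kept under that key so we can append to it
    sourceIps.setdefault(ip[2], []).append(ip)

print({src: len(rows) for src, rows in sourceIps.items()})
```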

And finally print the result:

    for ip, stuff in sourceIps.items():
        print("From {0} ... ".format(ip, ...))
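One possible way to fill in the elided part of that print loop, assuming sourceIps maps each source IP to its list of split rows with the protocol at index 4. This completion uses collections.Counter and is a sketch, not part of the original answer; the small sourceIps dict is hand-built from the sample.

```python
from collections import Counter

# Assumed shape: source IP -> list of split rows (protocol at index 4)
sourceIps = {
    "172.16.113.168": [
        ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
        ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ],
}

lines = []
for ip, stuff in sourceIps.items():
    counts = Counter(row[4] for row in stuff)  # protocol -> number of packets
    summary = " ".join("({0} {1})".format(p, n) for p, n in sorted(counts.items()))
    # every row in `stuff` is one packet, so the total is just its length
    lines.append("From IP {0} Number of protocols: {1} (Total: {2})".format(
        ip, summary, len(stuff)))

print("\n".join(lines))
```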


Will Oct 24 '12 at 20:13


Skunkwaffle · Accepted Answer · 2012-10-24T20:33:01+0000

First, read in the text file:

    # Open the file; the with block closes it automatically when it ends
    with open('log_file.csv') as file:
        # readlines() will return the data as a list of strings, one for each line
        log_data = file.readlines()

Set up a dictionary to store the results:

 results = {}

Now iterate over your data one line at a time, counting each protocol in the dictionary:

    for entry in log_data[1:]:  # skip the header line
        # Strip the newline, split on commas, and remove the quotes around each field
        entry_data = [field.strip('"') for field in entry.strip().split(',')]
        # We are going to have a separate entry for each source ip
        # If we haven't already seen this ip, we need to make an entry for it
        if entry_data[2] not in results:
            results[entry_data[2]] = {'total': 0}
        # Now check to see if we've seen the protocol for this ip before
        # If we haven't, add a new entry set to 0
        if entry_data[4] not in results[entry_data[2]]:
            results[entry_data[2]][entry_data[4]] = 0
        # Now we increment the count for this protocol
        results[entry_data[2]][entry_data[4]] += 1
        # And we increment the total count
        results[entry_data[2]]['total'] += 1
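Splitting on ',' by hand works for the sample, but it breaks as soon as a quoted field such as Info contains a comma. A small comparison on a made-up record (the Info text is hypothetical) showing how csv.reader handles the quoting instead.

```python
import csv

# Hypothetical record whose Info field contains a comma
line = '"6","0.600000","172.16.112.50","172.16.113.168","TELNET","Telnet Data, continued"\n'

naive = line.strip().split(",")
parsed = next(csv.reader([line]))  # csv.reader accepts any iterable of lines

print(len(naive), naive[-2:])    # the comma inside Info produces an extra field
print(len(parsed), parsed[-2:])  # csv keeps the quoted field intact, without quotes
```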

Once everything has been counted, loop over the results and print them:

    for ip in results:
        # Here we're printing a string with placeholders. The {0} and {1} will be
        # filled in by the call to format
        print("From IP {0} Protocol Count: {1}".format(
            ip,
            # And finally create the value for the protocol counts with another format call.
            # The square brackets with the for statement inside create a list with one
            # entry for each protocol (including 'total')
            # We use ' '.join to join the counts into a single string
            ' '.join(["({0}: {1})".format(protocol, results[ip][protocol])
                      for protocol in results[ip]])))
