Recently, I needed to convert the text output of "tcpdump -i eth0 -neXXs0" to a pcap file. So I wrote a Python script that converts the output into an intermediate format that text2pcap understands. Since this is my first Python program, there is obviously room for improvement. I would like knowledgeable people to point out any problems and suggest improvements.
Input
Tcpdump output has the following format:
    20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 (0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30
        0x0000:  0011 5c78 cac0 0016 767f 2bb1 0800 4500  ..\x....v.+...E.
        0x0010:  003a 0000 4000 4011 812d 7bec bc8c 5e3b  .:..@.@..-{...^;
        0x0020:  22d2 a31c b36b 0026 b9bd 2033 6890 ad33  "....k.&...3h..3
        0x0030:  e845 4b8d 2ba1 0685 0cb3 70dd 9b98 76d8  .EK.+.....p...v.
        0x0040:  8fc6 8293 bf33 325a                      .....32Z
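To tell the two kinds of input line apart, a small helper can look for the leading timestamp. This is only a sketch: the names `TIMESTAMP` and `packet_timestamp` are made up for illustration, and it assumes every packet header line starts with `HH:MM:SS.` followed by the fractional seconds, as in the sample above.

```python
import re

# Assumption: a packet header line begins with HH:MM:SS.<fraction> and a space.
# Hex dump lines begin with whitespace and "0x", so they never match.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s")


def packet_timestamp(line):
    """Return the timestamp if `line` is a packet header line, else None."""
    match = TIMESTAMP.match(line)
    return match.group(1) if match else None
```

Calling `packet_timestamp` on the header line of the sample returns `'20:11:32.001190'`; on any of the hex dump lines it returns `None`.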
Output
The format understood by text2pcap is:
    20:11:32.001190
    0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00  ..\x....v.+...E.
    0010: 00 3a 00 00 40 00 40 11 81 2d 7b ec bc 8c 5e 3b  .:..@.@..-{...^;
    0020: 22 d2 a3 1c b3 6b 00 26 b9 bd 20 33 68 90 ad 33  "....k.&...3h..3
    0030: e8 45 4b 8d 2b a1 06 85 0c b3 70 dd 9b 98 76 d8  .EK.+.....p...v.
    0040: 8f c6 82 93 bf 33 32 5a  .....32Z
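The per-line transformation between the two formats might be sketched as follows. The names `HEX_LINE` and `convert_hex_line` are hypothetical, and the sketch assumes the hex column is separated from the character column by at least two spaces, as tcpdump prints it. It drops the trailing character column rather than copying it, which is a simplification compared with the sample above.

```python
import re

# Assumption: a dump line looks like "<ws>0x0010:  003a 0000 ...  <chars>",
# with single spaces between hex groups and two or more before the characters.
HEX_LINE = re.compile(r"^\s*0x([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2,4} ?)+)")


def convert_hex_line(line):
    """Return "offset: b0 b1 ..." for one tcpdump hex line, or None."""
    match = HEX_LINE.match(line)
    if match is None:
        return None
    offset, hex_groups = match.groups()
    # Join the 2-byte groups, then cut the digit string into single bytes.
    digits = hex_groups.replace(" ", "")
    byte_list = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "{}: {}".format(offset, " ".join(byte_list))
```

Matching stops at the first double space, so characters in the last column that happen to look like hex digits are not mistaken for packet bytes.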
Below is my code.
    import re

    # Identify time of the current packet.
    time = re.compile ('(..:..:..\.[\w]*) ')
    # Get individual elements from the packet, i.e. offset, hexdump, chars
    all = re.compile('[ |\t]+0x([\w]+:) +(.+) +(.*)')
    # Regex for two spaces
    twoSpaces = re.compile('  +')
    # Regex for single space
    singleSpace = re.compile(' ')
    # Single byte pattern.
    singleBytePattern = re.compile(r'([\w][\w])')

    # Open files.
    f = open ('pcap.txt', 'r')
    outfile = open ('ashu.txt', 'w')

    for line in f:
        result = time.match (line)
        if result:
            # If current line contains time format, dump only the time
            print result.group()
            outfile.write (result.group() + '\n')
        else:
            print line,
            # Split line containing hex dump and tokenize into list elements.
            result = all.split (line)
            if result:
                i = 0
                for values in result:
                    if (i == 2):
                        # Strip off additional spaces in hex dump.
                        # Useful when hex dump does not end on a 16-byte boundary.
                        val = twoSpaces.sub ('', values)
                        # Tokenize individual elements separated by single space.
                        byteResult = singleSpace.split (val)
                        for twoByte in byteResult:
                            # Identify individual byte
                            singleByte = singleBytePattern.split(twoByte)
                            byteOffset = 0
                            for oneByte in singleByte:
                                if ((byteOffset == 1) or (byteOffset == 3)):
                                    # Write out individual byte with a space char appended
                                    print oneByte,
                                    outfile.write (oneByte+ ' ')
                                byteOffset = byteOffset + 1
                    elif (i == 3):
                        # Write out the char format of the hex dump
                        print " "+values,
                        outfile.write (' ' + values+ ' ')
                    elif (i == 4):
                        outfile.write (values)
                    else:
                        print values,
                        outfile.write (values + ' ')
                    i=i+1
            else:
                print "could not split"

    f.close ()
    outfile.close ()