How to take an item after re.compile?

Question

Using "re", I compile the handshake data as follows:

    piece_request_handshake = re.compile('13426974546f7272656e742070726f746f636f6c(?P<reserved>\w{16})(?P<info_hash>\w{40})(?P<peer_id>\w{40})')
    handshake = piece_request_handshake.findall(hex_data)

Then I print the result.

I cannot add an image because I am a beginner, so this is the output:

    root@debian:/home/florian/Téléchargements# python script.py
    [('0000000000100005', '606d4759c464c8fd0d4a5d8fc7a223ed70d31d7b', '2d5452323532302d746d6e6a657a307a6d687932')]

My question is: how can I take only the second part of this data, that is, the "info_hash" ("606d47...")?

I already tried using group() with the following line:

  print handshake.group('info_hash')

But the result is an error (sorry, I can not show the screen ...):

    root@debian:/home/florian/Téléchargements# python script.py
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
        self.run()
      File "script.py", line 122, in run
        self.p.dispatch(0, PieceRequestSniffer.cb)
      File "script.py", line 82, in cb
        print handshake.group('info_hash')
    AttributeError: 'list' object has no attribute 'group'

This is the beginning of my complete code for the curious:

    import pcapy
    import dpkt
    from threading import Thread
    import re
    import binascii
    import socket
    import time

    liste=[]
    prefix = '13426974546f7272656e742070726f746f636f6c'
    hash_code = re.compile('%s(?P<reserved>\w{16})(?P<info_hash>\w{40})(?P<peer_id>\w{40})' % prefix)
    match = hash_code.match()
    piece_request_handshake = re.compile('13426974546f7272656e742070726f746f636f6c(?P<aaa>\w{16})(?P<bbb>\w{40})(?P<ccc>\w{40})')
    piece_request_tcpclose = re.compile('(?P<start>\w{12})5011')

    #-----------------------------------------------------------------INIT------------------------------------------------------------
    class PieceRequestSniffer(Thread):
        def __init__(self, dev='eth0'):
            Thread.__init__(self)
            self.expr = 'udp or tcp'
            self.maxlen = 65535        # max size of packet to capture
            self.promiscuous = 1       # promiscuous mode?
            self.read_timeout = 100    # in milliseconds
            self.max_pkts = -1         # number of packets to capture; -1 => no limit
            self.active = True
            self.p = pcapy.open_live(dev, self.maxlen, self.promiscuous, self.read_timeout)
            self.p.setfilter(self.expr)

        @staticmethod
        def cb(hdr, data):
            eth = dpkt.ethernet.Ethernet(str(data))
            ip = eth.data

            #------------------------------------------------------IPV4 AND TCP PACKETS ONLY---------------------------------------------------
            #Select Ipv4 packets because of problem with the .p in Ipv6
            if eth.type == dpkt.ethernet.ETH_TYPE_IP6:
                return
            else:
                #Select only TCP protocols
                if ip.p == dpkt.ip.IP_PROTO_TCP:
                    tcp = ip.data
                    src_ip = socket.inet_ntoa(ip.src)
                    dst_ip = socket.inet_ntoa(ip.dst)
                    fin_flag = ( tcp.flags & dpkt.tcp.TH_FIN ) != 0
                    #if fin_flag:
                        #print "TH_FIN src:%s dst:%s" % (src_ip,dst_ip)
                    try:
                        #Return hexadecimal representation
                        hex_data = binascii.hexlify(tcp.data)
                    except:
                        return

                    #-----------------------------------------------------------HANDSHAKE-------------------------------------------------------------
                    handshake = piece_request_handshake.findall(hex_data)
                    if handshake and (src_ip+" "+dst_ip) not in liste and (dst_ip+" "+src_ip) not in liste and handshake != '':
                        liste.append(src_ip+" "+dst_ip)
                        print match.group('info_hash')

python filter hash

flo May 19 '15 at 11:34

2 answers

re.findall returns a list of tuples when the pattern contains more than one group. group() is a method of match objects, which are returned by other functions in re such as re.match, re.search, and re.finditer:

    for match in re.finditer(needle, haystack):
        print match.group('info_hash')

Also, you might not need findall at all if you only expect to match one handshake.
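
For the single-handshake case, a minimal sketch might look like the following. It reuses the pattern and the sample values from the question; the names `HANDSHAKE_RE` and `extract_info_hash` are hypothetical, and `search()` is an assumption made here so that the handshake does not have to sit at the very start of the data. It is written with `print()` calls so it also runs on Python 3.

```python
import re

# Hex prefix and group layout taken from the question.
PREFIX = '13426974546f7272656e742070726f746f636f6c'
HANDSHAKE_RE = re.compile(
    PREFIX + r'(?P<reserved>\w{16})(?P<info_hash>\w{40})(?P<peer_id>\w{40})'
)


def extract_info_hash(hex_data):
    """Return the info_hash of the first handshake in hex_data, or None."""
    match = HANDSHAKE_RE.search(hex_data)
    if match is None:
        # No handshake found: there is no match object to call group() on.
        return None
    return match.group('info_hash')


# Sample built from the values printed in the question.
sample = (PREFIX
          + '0000000000100005'
          + '606d4759c464c8fd0d4a5d8fc7a223ed70d31d7b'
          + '2d5452323532302d746d6e6a657a307a6d687932')

print(extract_info_hash(sample))
print(extract_info_hash('deadbeef'))
```

The `None` check matters because `search()` and `match()` return `None` when nothing matches, and calling `group()` on `None` raises an `AttributeError` much like the one in the question.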

akaIDIOT May 19 '15 at 11:37

mhawke · Accepted Answer · 2015-05-19T12:04:12+0000

re.findall() returns a list of tuples, each of which contains the strings matched by the groups in the pattern. This example (using a simplified pattern) shows that you can access the required item by indexing:

    import re
    prefix = 'prefix'
    pattern = re.compile('%s(?P<reserved>\w{4})(?P<info_hash>\w{10})(?P<peer_id>\w{10})' % prefix)
    handshake = 'prefix12341234567890ABCDEF1234'    # sniffed data
    match = pattern.findall(handshake)

    >>> print match
    [('1234', '1234567890', 'ABCDEF1234')]
    >>> info_hash = match[0][1]
    >>> print info_hash
    1234567890

But the whole point of named groups is to let you access the matched values by name. For that you need a match object, which you can get with re.match():

    import re
    prefix = 'prefix'
    pattern = re.compile('%s(?P<reserved>\w{4})(?P<info_hash>\w{10})(?P<peer_id>\w{10})' % prefix)
    handshake = 'prefix12341234567890ABCDEF1234'    # sniffed data
    match = pattern.match(handshake)

    >>> print match
    <_sre.SRE_Match object at 0x7fc201efe918>
    >>> print match.group('reserved')
    1234
    >>> print match.group('info_hash')
    1234567890
    >>> print match.group('peer_id')
    ABCDEF1234

Values are also available using dictionary access:

    >>> d = match.groupdict()
    >>> d
    {'peer_id': 'ABCDEF1234', 'reserved': '1234', 'info_hash': '1234567890'}
    >>> d['info_hash']
    '1234567890'
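
One caveat about match(): it only tries the pattern at the very start of the string, whereas search() scans forward for the first occurrence. The short sketch below illustrates the difference with the same simplified pattern; the leading `blah` in the input is made up for illustration, and `print()` is used so the snippet also runs on Python 3.

```python
import re

pattern = re.compile(r'prefix(?P<reserved>\w{4})(?P<info_hash>\w{10})(?P<peer_id>\w{10})')
data = 'blahprefix12341234567890ABCDEF1234'    # handshake preceded by other characters

# match() is anchored at position 0, so the leading 'blah' makes it fail.
anchored = pattern.match(data)
print(anchored)

# search() finds the first occurrence anywhere in the string.
found = pattern.search(data)
print(found.group('info_hash'))
```

If the handshake is always expected at the start of the payload, match() is the stricter choice; search() is the more forgiving one.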

Finally, if there are several handshake sequences in the input, you can use re.finditer():

    import re
    prefix = 'prefix'
    pattern = re.compile('%s(?P<reserved>\w{4})(?P<info_hash>\w{10})(?P<peer_id>\w{10})' % prefix)
    handshake = 'blahprefix12341234567890ABCDEF1234|randomjunkprefix12349876543210ABCDEF1234,more random junkprefix1234hellothereABCDEF1234...'    # sniffed data

    for match in pattern.finditer(handshake):
        print match.group('info_hash')

Output:

    1234567890
    9876543210
    hellothere
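
Applied to the real handshake pattern from the question, the finditer() approach could be sketched as follows. The helper name `find_handshakes` is hypothetical, and the payload is rebuilt from the values printed in the question rather than captured from the network. The `.decode('ascii')` step is an assumption added so the sketch also works on Python 3, where `binascii.hexlify()` returns bytes.

```python
import binascii
import re

# Hex prefix and group layout taken from the question.
PREFIX = '13426974546f7272656e742070726f746f636f6c'
HANDSHAKE_RE = re.compile(
    PREFIX + r'(?P<reserved>\w{16})(?P<info_hash>\w{40})(?P<peer_id>\w{40})'
)


def find_handshakes(payload):
    """Return one dict of named groups per handshake found in a raw payload."""
    # hexlify() returns bytes on Python 3, so decode before matching a str pattern.
    hex_data = binascii.hexlify(payload).decode('ascii')
    return [m.groupdict() for m in HANDSHAKE_RE.finditer(hex_data)]


# Stand-in for tcp.data, reconstructed from the output shown in the question.
payload = binascii.unhexlify(
    PREFIX
    + '0000000000100005'
    + '606d4759c464c8fd0d4a5d8fc7a223ed70d31d7b'
    + '2d5452323532302d746d6e6a657a307a6d687932'
)

for fields in find_handshakes(payload):
    print(fields['info_hash'])
```

Returning dictionaries keeps the caller independent of the group order, which is the main advantage of named groups over indexing into the tuples that findall() produces.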
