Python regex: use a capture group to determine the length of another group ({})

I am parsing a stream of hex data using Python regex. I have the following packet structure that I am trying to extract from a packet stream:

'\xaa\x01\xFF\x44'
  • \xaa - start of packet
  • \x01 - data length [value can vary from 00-FF]
  • \xFF - data
  • \x44 - end of packet

I want to use a Python regex to specify how much of the packet's data portion should match, like this:

r = re.compile('\xaa(?P<length>[\x00-\xFF]{1})(.*){?P<length>}\x44')

This compiles without errors, but it does not work. I suspect that is because the regex engine cannot convert the byte captured by the named group <length> into an integer for use inside the {} quantifier. Is there a way to do this in Python without resorting to picking apart the matched groups afterwards?

Background: I have been using Erlang to unpack packets, and I was looking for something similar in Python.
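
For reference, here is a small check of what the pattern in the question actually matches. It is a sketch that assumes Python 3 and a raw bytes pattern (the question uses Python 2 str literals); the variable names are illustrative only. Python's re module does not treat `{?P<length>}` as a quantifier: the `{` becomes an optional literal brace and `P<length>}` is matched as plain text.

```python
import re

# The pattern from the question, rewritten as a raw bytes pattern for
# Python 3 (an assumption; the original is a Python 2 str literal).
pattern = re.compile(rb'\xaa(?P<length>[\x00-\xff]{1})(.*){?P<length>}\x44')

# The real packet does not match, because no quantifier was created.
packet = b'\xaa\x01\xff\x44'
no_match = pattern.search(packet)

# Input that contains the literal text "P<length>}" does match, which
# shows the braces were parsed as ordinary characters.
literal = b'\xaa\x01\xffP<length>}\x44'
literal_match = pattern.search(literal)

print(no_match)
print(literal_match is not None)
```

So the length byte is captured, but nothing in the pattern can turn it into a repeat count; that step has to happen outside the regex.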

2 answers

I ended up doing something like this:

self.packet_regex = re.compile(
    '(\xaa)([\x04-\xFF]{1})([\x00-\xFF]{1})([\x10-\xFF]{1})([\x00-\xFF]*)([\x00-\xFF]{1})(\x44)')

match = self.packet_regex.search(self.buffer)
if match and match.groups():
    groups = match.groups()
    if (ord(groups[1]) - 4) == len(groups[4]) + len(groups[5]) + len(groups[6]):
        ...
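
The same two-step idea (match loosely, then check the length in ordinary code) can be sketched for the simpler four-field packet from the question. This is a minimal sketch, assuming Python 3 bytes and the framing 0xAA, one length byte, payload, 0x44; `HEADER`, `END_BYTE` and `extract_payloads` are hypothetical names, not part of the code above.

```python
import re

# Assumed framing from the question: 0xAA, one length byte, payload, 0x44.
HEADER = re.compile(rb'\xaa([\x00-\xff])')
END_BYTE = 0x44


def extract_payloads(buffer):
    """Return the payload of every well-formed packet found in buffer."""
    payloads = []
    pos = 0
    while True:
        m = HEADER.search(buffer, pos)
        if m is None:
            break
        length = m.group(1)[0]      # length byte as an int
        start = m.end()             # first payload byte
        end = start + length        # index where the end marker should be
        if end < len(buffer) and buffer[end] == END_BYTE:
            payloads.append(buffer[start:end])
            pos = end + 1           # continue after this packet
        else:
            pos = m.start() + 1     # false start byte, keep scanning
    return payloads


print(extract_payloads(b'\xaa\x01\xff\x44'))
```

The regex only locates a candidate start byte and captures the length; slicing with that length does the part a `{}` quantifier cannot.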

This does roughly what you asked for. Take a look:

import re
orig_str = '\xaa\x01\xFF\x44'
print orig_str
# convert the original hex data to its repr() form
st = repr(orig_str)
print st
# build the regex from the repr() of the pattern, stripping the leading and trailing single quotes
reg = re.compile(repr("(\\xaa)")[1:-1])
p = reg.search(st)

# rebuild a repr() string from the match by adding the leading and trailing single quotes back
extracted_repr = "\'"+p.group(1)+"\'"
print extracted_repr

# evaluate the repr() string to recover the original hex data
extracted_str = eval(extracted_repr)
print extracted_str

>>>
      D
    '\xaa\x01\xffD'
    '\xaa'
     
