Determine if a regular expression matches only fixed-length strings

Is there a way to determine whether a regular expression matches only fixed-length strings? My idea would be to scan for *, + and ?. Some more intelligent logic would then be needed to look for {m,n} where m != n. The | operator does not need to be taken into account.
A small example: ^\d{4} is fixed-length; ^\d{4,5} and ^\d+ are variable-length.

I am using PCRE.

Thanks.

Paul Prat

3 answers

Well, you could make use of the fact that Python's regex engine only allows fixed-length regular expressions in lookbehind assertions:

import re
regexes = [r".x{2}(abc|def)", # fixed
           r"a|bc",           # variable/finite
           r"(.)\1",          # fixed
           r".{0,3}",         # variable/finite
           r".*"]             # variable/infinite

for regex in regexes:
    try:
        # The lookbehind only compiles if its body has a fixed width
        re.compile("(?<=" + regex + ")")
    except re.error:
        print("Not fixed length: {}".format(regex))
    else:
        print("Fixed length: {}".format(regex))

Fixed length: .x{2}(abc|def)
Not fixed length: a|bc
Fixed length: (.)\1
Not fixed length: .{0,3}
Not fixed length: .*
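
The same check can be wrapped in a small helper. This is only a sketch: the function name `fixed_by_lookbehind` is made up for illustration, and it compiles the pattern on its own first so that an invalid regex raises instead of being reported as "not fixed length". Any other reason the engine has for rejecting a lookbehind is still reported as "not fixed length".

```python
import re


def fixed_by_lookbehind(regex):
    """Return True if Python accepts `regex` as the body of a lookbehind.

    Hypothetical helper name; it relies only on re.compile() and re.error.
    """
    # Let a syntactically invalid pattern raise re.error right away,
    # so it is not mistaken for a variable-length one.
    re.compile(regex)
    try:
        re.compile("(?<=" + regex + ")")
    except re.error:
        return False
    return True


for regex in [r".x{2}(abc|def)", r"a|bc", r".{0,3}", r".*"]:
    label = "Fixed length" if fixed_by_lookbehind(regex) else "Not fixed length"
    print("{}: {}".format(label, regex))
```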

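So the regex engine itself must have this capability built in.

How does Python do it? In sre_parse.py there is a getwidth() method that returns the minimum and maximum width of a subpattern. For a lookbehind, re.compile() checks whether the two values returned by getwidth() are equal: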

def getwidth(self):
    # determine the width (min, max) for this subpattern
    if self.width:
        return self.width
    lo = hi = 0
    UNITCODES = (ANY, RANGE, IN, LITERAL, NOT_LITERAL, CATEGORY)
    REPEATCODES = (MIN_REPEAT, MAX_REPEAT)
    for op, av in self.data:
        if op is BRANCH:
            i = sys.maxsize
            j = 0
            for av in av[1]:
                l, h = av.getwidth()
                i = min(i, l)
                j = max(j, h)
            lo = lo + i
            hi = hi + j
        elif op is CALL:
            i, j = av.getwidth()
            lo = lo + i
            hi = hi + j
        elif op is SUBPATTERN:
            i, j = av[1].getwidth()
            lo = lo + i
            hi = hi + j
        elif op in REPEATCODES:
            i, j = av[2].getwidth()
            lo = lo + int(i) * av[0]
            hi = hi + int(j) * av[1]
        elif op in UNITCODES:
            lo = lo + 1
            hi = hi + 1
        elif op == SUCCESS:
            break
    self.width = int(min(lo, sys.maxsize)), int(min(hi, sys.maxsize))
    return self.width
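
The width can also be read directly from the parser instead of going through a lookbehind. The following is a sketch under assumptions: it uses the parser module behind `re`, which is an undocumented implementation detail (`re._parser` in Python 3.11 and later, `sre_parse` before that) and may change between versions. The helper names `pattern_width` and `fixed_by_width` are made up for illustration.

```python
try:
    from re import _parser as sre_parser  # private module, Python 3.11+
except ImportError:
    import sre_parse as sre_parser        # older Python versions


def pattern_width(regex):
    """Return the (min, max) width the parser computes for `regex`."""
    lo, hi = sre_parser.parse(regex).getwidth()
    return int(lo), int(hi)


def fixed_by_width(regex):
    """A pattern is fixed-length when its minimum and maximum width agree."""
    lo, hi = pattern_width(regex)
    return lo == hi


for regex in [r".x{2}(abc|def)", r"a|bc", r".{0,3}", r".*"]:
    print(regex, pattern_width(regex), fixed_by_width(regex))
```

Unlike the lookbehind test, this also reports the minimum and maximum widths, which can help tell finite from unbounded variable-length patterns.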

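Assuming the regex being tested uses only +, *, ?, {m,n}, {n} and [...] (ignoring odd syntax such as []] and [^]]), it is fixed-length only if it matches the following grammar: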

 REGEX     -> ELEMENT *
 ELEMENT   -> CHARACTER ( '{' ( \d+ ) ( ',' \1 )? '}' )?
 CHARACTER -> [^+*?\\\[{] | '\\' . | '[' ( '\\' . | [^\\\]] )+ ']'

As a PCRE pattern:

^(?:(?:[^+*?\\\[{]|\\.|\[(?:\\.|[^\\\]])+\])(?:\{(\d+)(?:,\1)?\})?)*$
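
To see how this pattern behaves, here is a small sketch that runs it against the examples from the question. It uses Python's re module as a stand-in for PCRE, on the assumption that the pattern uses only syntax both engines handle the same way. The names `FIXED_ONLY` and `looks_fixed_length` are made up for illustration.

```python
import re

# The pattern above, unchanged. Group 1 captures m in {m} or {m,n},
# and the backreference \1 forces n to be the same digits as m.
FIXED_ONLY = re.compile(
    r"^(?:(?:[^+*?\\\[{]|\\.|\[(?:\\.|[^\\\]])+\])(?:\{(\d+)(?:,\1)?\})?)*$"
)


def looks_fixed_length(regex):
    """Return True if `regex` contains only constructs the grammar allows."""
    return FIXED_ONLY.match(regex) is not None


for regex in [r"\d{4}", r"\d{4,5}", r"\d+", r"[a-z]{3,3}", r"a*"]:
    print(regex, looks_fixed_length(regex))
```

This is a purely textual check, so it only holds for regexes limited to the constructs listed above; for example, it knows nothing about alternation with |.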

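According to regular-expressions.info, PCRE restricts lookbehinds to fixed-length patterns.

So you could wrap your regex in (?<= ) and see whether it still compiles. If it does, it should be fixed-length; if it does not, it is variable-length.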

I'm not sure about something like a(b|cd)e: it is definitely not fixed-length, but it may still compile. You would need to try it (I don't have C / PCRE at hand).
