Python regex

I have a string like this that I need to parse into a 2D array:

 str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

The equivalent array would be:

arr[0][0] = 813702104
arr[0][1] = 813702106
arr[1][0] = 813702141
arr[1][1] = 813702143
#... etc ...

I am trying to do this with a regex. The string above is embedded in an HTML page, but I can be sure it is the only string matching this pattern on the page. I'm not sure this is the best approach, but it is all I have right now.

imgRegex = re.compile(r"(?:'(?P<main>\d+)\[(?P<thumb>\d+)\]',?)+")

If I run imgRegex.match(str).groups(), I get only one pair of values back. How can I get multiple matches, or a 2D match object (if such a thing exists)?

Note: contrary to what it might look like, this is not homework.

Note, part deux: the real string is embedded in a large HTML file, so splitting is not an option.

, , , . , , HTML . , .

HTML ( \d+\[\d+\] ), . - .


Use findall or finditer instead of match.

Edit: yes, findall works, but the regex had to be simplified to:

r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?"
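As a minimal sketch of how the findall result could be turned into the array of integers described in the question (the names `s`, `pattern`, `pairs`, and `arr` are chosen here for illustration only):

```python
import re

s = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

# Without the outer repeated group, each match is one (main, thumb) pair.
# findall returns a list of tuples of strings, one tuple per match.
pattern = re.compile(r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?")
pairs = pattern.findall(s)

# Convert every captured string to int to get a list of lists.
arr = [[int(main), int(thumb)] for main, thumb in pairs]

print(arr[0][0])  # 813702104
print(arr[1][1])  # 813702143
```

The repeated outer group in the original pattern is the likely reason only one pair came back: a capturing group that repeats keeps only a single capture, so matching pair by pair with findall avoids the problem.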

I would not use a regex for this task. A Python list comprehension is enough:

In [27]: s = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

In [28]: d=[[int(each1.strip(']\'')) for each1 in each.split('[')] for each in s.split(',')]

In [29]: d[0][1]
Out[29]: 813702106

In [30]: d[1][0]
Out[30]: 813702141

In [31]: d
Out[31]: [[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]

Modifying your regex slightly:

>>> import re
>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
>>> imgRegex = re.compile(r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?")
>>> print(imgRegex.findall(str))
[('813702104', '813702106'), ('813702141', '813702143'), ('813702172', '813702174')]

What you call a "2D array" is, in Python, a "list of 2-tuples".


Something like this:

In [19]: str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
In [20]: ptr = re.compile( r"'(?P<one>\d+)\[(?P<two>\d+)\]'" )
In [21]: ptr.findall( str )
Out[21]:
[('813702104', '813702106'),
 ('813702141', '813702143'),
 ('813702172', '813702174')]
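Since the question notes that the real string sits inside a larger HTML file, here is a hedged sketch of the same pattern applied with finditer to a surrounding document. The `html` fragment is invented for illustration; only the quoted list inside it comes from the question.

```python
import re

# Hypothetical HTML fragment; only the quoted list is taken from the question.
html = """<script>
var imgs = ['813702104[813702106]','813702141[813702143]','813702172[813702174]'];
</script>"""

ptr = re.compile(r"'(?P<one>\d+)\[(?P<two>\d+)\]'")

# finditer yields match objects, so the named groups stay accessible by name.
arr = [[int(m.group("one")), int(m.group("two"))] for m in ptr.finditer(html)]

print(arr)
```

Because the pattern is not anchored, the text around the list does not need to be removed first. This relies on the question's assumption that nothing else on the page matches the same pattern.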

Alternatively, you can use Python's [statement for item in list] syntax (the "listmaker") to build the list without a regex.

For example:

>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
>>> arr = [pair for pair in str.split(",")]
>>> arr
["'813702104[813702106]'", "'813702141[813702143]'", "'813702172[813702174]'"]

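So far this is no different from str.split(","), but the statement half of the listmaker can transform each item.

Next, strip the leading quote and the trailing "]'" from each piece, then split it on "[":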

>>> arr = [pair[1:-2].split("[") for pair in str.split(",")]
>>> arr
[['813702104', '813702106'], ['813702141', '813702143'], ['813702172', '813702174']]

This returns a two-dimensional array, as you describe, but the elements are all strings, not integers. If you are only going to use them as strings, this is good enough. If you need them to be real integers, use an "inner" listmaker as the statement of the "outer" listmaker:

>>> arr = [[int(x) for x in pair[1:-2].split("[")] for pair in str.split(",")]
>>> arr
[[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]

This returns a two-dimensional array of integers from a string like the one you provided, without needing to load the regex engine.
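As a quick sanity check, and assuming the string has already been isolated from any surrounding HTML, the split-based result can be compared with a regex-based parse of the same string (the names `by_split` and `by_regex` are illustrative):

```python
import re

s = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"

# Split-based parsing: drop the leading quote and the trailing "]'" from each piece.
by_split = [[int(x) for x in pair[1:-2].split("[")] for pair in s.split(",")]

# Regex-based parsing of the same string, for comparison.
by_regex = [[int(a), int(b)] for a, b in re.findall(r"'(\d+)\[(\d+)\]'", s)]

print(by_split == by_regex)  # True
```

The slicing in the split-based version depends on the exact layout of each piece (one leading quote, a trailing "]'", no extra whitespace), so it is best suited to input whose format is known in advance.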
