re.split() with special cases

Question

I am new to regex and have a problem with re.split functionality.

In my case, the split has to respect "special escapes".

The text should be split at ; except where the ; is preceded by a ?.

Edit: In that case, the two parts should not be split and the ? must be removed.

Here is an example and the result I want:

 import re
 txt = 'abc;vwx?;yz;123'
 re.split(r'magical pattern', txt)
 ['abc', 'vwx;yz', '123']

So far I have tried this:

 re.split(r'(?<!\?);', txt)

and received:

 ['abc', 'vwx?;yz', '123']

Sadly, this leaves the unconsumed ? in the result, and the following list comprehension is too slow for performance-critical code:

 [part.replace('?;', ';') for part in re.split(r'(?<!\?);', txt)]
 ['abc', 'vwx;yz', '123']

Is there a "fast" way to get this behavior with re?

Could re.findall be the way to go?

For example, an extended version of this code:

 re.findall(r'[^;]+', txt)

I am using Python 2.7.3.

Thanks in advance!

+4

python regex

MaM Mar 22 '13 at 16:33

4 answers

Regex is not the right tool for this job. Use the csv module instead:

 >>> import csv
 >>> txt = 'abc;vwx?;yz;123'
 >>> r = csv.reader([txt], delimiter=';', escapechar='?')
 >>> next(r)
 ['abc', 'vwx;yz', '123']
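The call above could be wrapped in a small helper. The following is only a sketch: the function name split_with_csv and its default arguments are made up for illustration, and passing quoting=csv.QUOTE_NONE is an added assumption (not part of the answer) so that double quotes in the input are treated as ordinary characters rather than as CSV quoting.

```python
import csv


def split_with_csv(txt, sep=';', esc='?'):
    """Split txt at sep, treating esc + sep as a literal sep (hypothetical helper)."""
    # QUOTE_NONE is an assumption added here: it stops the reader from
    # interpreting double quotes, so only the escape character is special.
    reader = csv.reader([txt], delimiter=sep, escapechar=esc,
                        quoting=csv.QUOTE_NONE)
    return next(reader)


print(split_with_csv('abc;vwx?;yz;123'))
```

With the default quoting, a field such as "a;b" wrapped in double quotes would be read as one field, which may or may not be what you want for plain delimited text.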

+5

Janne karila Mar 22 '13 at 17:03

I would do it like this:

 re.sub(r'(?<!\?);', '|', txt).replace('?;', ';').split('|')
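One caveat with this chain is that it gives wrong results if the text itself contains a | character. A minimal sketch of the same idea with a less likely placeholder follows; the names SENTINEL and split_via_sentinel are hypothetical, and the choice of a NUL character is an assumption about what never occurs in the input.

```python
import re

# Assumption: this character never appears in the text being split.
SENTINEL = '\x00'


def split_via_sentinel(txt):
    """Mark unescaped separators, unescape the rest, then split on the marker."""
    if SENTINEL in txt:
        raise ValueError('input contains the sentinel character')
    # Replace every ; that is not preceded by ? with the sentinel.
    marked = re.sub(r'(?<!\?);', SENTINEL, txt)
    # The remaining ?; pairs are escaped separators: drop the ? and split.
    return marked.replace('?;', ';').split(SENTINEL)


print(split_via_sentinel('abc;vwx?;yz;123'))
```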

0

Julien grenier Mar 22 '13 at 17:06

Try the following :-)

 def split(txt, sep, esc, escape_chars):
     '''Split a string.

     txt - string to split
     sep - separator, one character
     esc - escape character
     escape_chars - list of characters allowed to be escaped
     '''
     l = []
     tmp = []
     i = 0
     while i < len(txt):
         if len(txt) > i + 1 and txt[i] == esc and txt[i+1] in escape_chars:
             # Skip the escape character and keep the escaped one.
             i += 1
             tmp.append(txt[i])
         elif txt[i] == sep:
             # Unescaped separator: close the current field.
             l.append("".join(tmp))
             tmp = []
         elif txt[i] == esc:
             print('Escape Error')
         else:
             tmp.append(txt[i])
         i += 1
     l.append("".join(tmp))
     return l

 if __name__ == "__main__":
     txt = 'abc;vwx?;yz;123'
     print(split(txt, ';', '?', [';', '\\', '?']))

Output:

 ['abc', 'vwx;yz', '123']

0

Thm Mar 25 '13 at 18:14

Martijn pieters · Accepted Answer · 2013-03-22T16:54:29+0000

You cannot do what you want with one regex. Unescaping ?; after splitting is a separate task, not something you can get the re module to do for you while splitting.

Just do that task separately; you could use a generator to do the unescaping:

 def unescape(iterable):
     for item in iterable:
         yield item.replace('?;', ';')

 for elem in unescape(re.split(r'(?<!\?);', txt)):
     print(elem)

but it will not be faster than your list comprehension.
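As for the re.findall idea raised in the question, a rough sketch of how the [^;]+ pattern might be extended is shown below. It still needs a second step to remove the escape character, in line with the point above. The function name split_escaped is hypothetical, and the behavior differs from the re.split version in two ways that are assumptions of this sketch: empty fields are dropped, and ? escapes any following character, not only ;.

```python
import re


def split_escaped(txt):
    """Split at unescaped ; using re.findall (hypothetical helper)."""
    # A field is a run of escaped pairs (? plus any character) or of
    # characters that are neither the separator nor the escape character.
    fields = re.findall(r'(?:\?.|[^;?])+', txt)
    # Second step: drop each ? and keep the character it escaped.
    return [re.sub(r'\?(.)', r'\1', field) for field in fields]


print(split_escaped('abc;vwx?;yz;123'))
```

Because the pattern consumes ? together with the character after it, a doubled ?? followed by ; is read as a literal ? and then a real separator, which the lookbehind pattern (?<!\?); does not handle.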
