re.split() with special cases

I am new to regex and have a problem with re.split functionality.

In my case, the split has to handle escaped separators.

The text should be split at ; except where the ; is preceded by a ? .

Edit: In that case, the two parts should not be separated and the ? must be removed.

Here is an example and the result I want:

    import re
    txt = 'abc;vwx?;yz;123'
    re.split(r'magical pattern', txt)
    ['abc', 'vwx;yz', '123']

So far I have tried:

 re.split(r'(?<!\?);', txt) 

and received:

 ['abc', 'vwx?;yz', '123'] 

Unfortunately, this leaves the ? in the result, and the following list comprehension, which removes it, is too slow for my performance-critical code:

    [part.replace('?;', ';') for part in re.split(r'(?<!\?);', txt)]
    ['abc', 'vwx;yz', '123']

Is there a “fast” way to reproduce this behavior with re?

Could re.findall be the way to go?

For example, an extended version of this code:

 re.findall(r'[^;]+', txt) 

I am using Python 2.7.3.

Thanks in advance!

+4
4 answers

You cannot do what you want with a single regex. Unescaping ?; after splitting is a separate task, not something the re module can do for you while splitting.

Just do that task separately; you can use a generator to do the unescaping:

    def unescape(iterable):
        for item in iterable:
            yield item.replace('?;', ';')

    for elem in unescape(re.split(r'(?<!\?);', txt)):
        print elem

but it will not be faster than your list comprehension.
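For reference, the re.findall idea from the question can be extended to match whole fields, but the unescaping still remains a second step. The following is only a minimal sketch: the names FIELD, UNESCAPE and split_fields are made up for illustration, and it assumes that ? escapes whatever single character follows it.

```python
import re

# A field is a run of escaped characters ("?" plus any one character)
# and characters that are neither the separator nor the escape character.
FIELD = re.compile(r'(?:\?.|[^;?])+')
# Drops the escape character and keeps the character it protected.
UNESCAPE = re.compile(r'\?(.)')

def split_fields(txt):
    """Hypothetical helper: find the fields first, then unescape each one."""
    return [UNESCAPE.sub(r'\1', field) for field in FIELD.findall(txt)]

print(split_fields('abc;vwx?;yz;123'))  # ['abc', 'vwx;yz', '123']
```

Unlike re.split, this silently drops empty fields: 'a;;b' yields two items, not three.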

0

Regex is not the right tool for this job. Use the csv module instead:

    >>> import csv
    >>> txt = 'abc;vwx?;yz;123'
    >>> r = csv.reader([txt], delimiter=';', escapechar='?')
    >>> next(r)
    ['abc', 'vwx;yz', '123']
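The same idea can be wrapped in a small function. This is a sketch rather than a definitive implementation: the name split_escaped and its defaults are assumptions, and quoting=csv.QUOTE_NONE is added here on the assumption that double quotes in the input should be treated as ordinary characters.

```python
import csv

def split_escaped(txt, sep=';', esc='?'):
    """Hypothetical helper: split txt on sep, honouring esc as an escape character."""
    reader = csv.reader(
        [txt],                   # csv.reader expects an iterable of lines
        delimiter=sep,
        escapechar=esc,
        quoting=csv.QUOTE_NONE,  # assumption: quotes carry no special meaning
    )
    return next(reader)

print(split_escaped('abc;vwx?;yz;123'))  # ['abc', 'vwx;yz', '123']
```

The reader does the splitting and the unescaping in one pass, which is what the question asked for.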
+5

I would do it like this (this assumes that | never occurs in the text):

    re.sub(r'(?<!\?);', '|', txt).replace('?;', ';').split('|')
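This one-liner only works if the temporary separator never appears in the input. A hedged variant that makes the assumption explicit might look like the sketch below; SENTINEL and split_with_sentinel are names invented for this example, and the NUL character is just one possible choice of marker.

```python
import re

SENTINEL = '\x00'  # assumption: this character never appears in real input

def split_with_sentinel(txt):
    """Hypothetical helper: mark real separators, unescape, then split."""
    if SENTINEL in txt:
        raise ValueError('input already contains the sentinel character')
    # Replace every ';' that is not preceded by '?' with the sentinel.
    marked = re.sub(r'(?<!\?);', SENTINEL, txt)
    # Only escaped separators are left, so they can be unescaped safely.
    return marked.replace('?;', ';').split(SENTINEL)

print(split_with_sentinel('abc;vwx?;yz;123'))  # ['abc', 'vwx;yz', '123']
```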
0

Try the following :-)

    def split(txt, sep, esc, escape_chars):
        '''
        Split a string
        txt - string to split
        sep - separator, one character
        esc - escape character
        escape_chars - List of characters allowed to be escaped
        '''
        l = []
        tmp = []
        i = 0
        while i < len(txt):
            if len(txt) > i + 1 and txt[i] == esc and txt[i+1] in escape_chars:
                i += 1
                tmp.append(txt[i])
            elif txt[i] == sep:
                l.append("".join(tmp))
                tmp = []
            elif txt[i] == esc:
                print('Escape Error')
            else:
                tmp.append(txt[i])
            i += 1
        l.append("".join(tmp))
        return l

    if __name__ == "__main__":
        txt = 'abc;vwx?;yz;123'
        print split(txt, ';', '?', [';','\\','?'])

Output:

 ['abc', 'vwx;yz', '123'] 
0
