Matching empty strings with regular expressions

Question

I have a string that I am trying to split into pieces on blank lines.

Given the string s, I thought I could do this:

 re.split('(?m)^\s*$', s)

This works in some cases:

 >>> s = 'foo\nbar\n \nbaz'
 >>> re.split('(?m)^\s*$', s)
 ['foo\nbar\n', '\nbaz']

But this does not work if the line is completely empty:

 >>> s = 'foo\nbar\n\nbaz'
 >>> re.split('(?m)^\s*$', s)
 ['foo\nbar\n\nbaz']

What am I doing wrong?

[python 2.5; it makes no difference if I compile '^\s*$' with re.MULTILINE and use the compiled expression instead]

+6

python regex

John fouhy Jul 29 '09 at 1:12

5 answers

The re library can split on one or more blank lines! A blank line is a run of zero or more whitespace characters that starts at the beginning of a line and ends at the end of a line. The special character '$' matches at the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline (excerpt from the docs). Because '$' does not consume the newline itself, we need to add a trailing '\s*' to swallow the line break. Everything is possible :-)

 >>> import re
 >>> text = "foo\n \n \n \nbar\n"
 >>> re.split("(?m)^\s*$\s*", text)
 ['foo\n', 'bar\n']

The same regular expression works with Windows-style line breaks.

 >>> import re
 >>> text = "foo\r\n \r\n \r\n \r\nbar\r\n"
 >>> re.split("(?m)^\s*$\s*", text)
 ['foo\r\n', 'bar\r\n']
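The behaviour of '$' quoted from the docs can be checked directly. The following is only a small sketch, not part of the original answer; the helper name dollar_positions is made up for illustration. It lists the offsets where '$' matches with and without MULTILINE:

```python
import re

def dollar_positions(text, flags=0):
    """Return the offsets at which '$' matches in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(r'$', text, flags)]

text = 'a\nb\n'
# Default mode: just before the trailing newline, and at the very end.
default_mode = dollar_positions(text)
# MULTILINE mode: additionally before every other newline.
multiline_mode = dollar_positions(text, re.MULTILINE)
print(default_mode, multiline_mode)
```

In every case the match is zero-width: '$' marks a position and never consumes the newline, which is why the answer appends '\s*' to the pattern.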

+3

Sascha gottfried Apr 05 '13 at 10:46

Is this what you want?

 >>> s = 'foo\nbar\n\nbaz'
 >>> re.split('\n\s*\n',s)
 ['foo\nbar', 'baz']
 >>> s = 'foo\nbar\n \nbaz'
 >>> re.split('\n\s*\n',s)
 ['foo\nbar', 'baz']
 >>> s = 'foo\nbar\n\t\nbaz'
 >>> re.split('\n\s*\n',s)
 ['foo\nbar', 'baz']

0

Sinan Ünür Jul 29 '09 at 1:31

Try the following:

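 # Read the whole file and report which lines are empty.
 with open('fu.txt') as txt:
     lines = txt.read().split('\n')
 for line in lines:
     if line == '':  # compare by value; 'is' tests identity and is unreliable here
         print('blank')
     else:
         print(line)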

0

Lerooy scandal Oct 12 '15 at 1:34

What you are doing wrong is using regular expressions. What is wrong with ('Some\ntext.').split('\n')?

-2

Instance hunter Jul 29 '09 at 1:29

Glenn maynard · Accepted Answer · 2009-07-29T01:29:46+0000

Try this instead:

 re.split('\n\s*\n', s)

The problem is that '^\s*$' only matches the whitespace (if any) that is alone on a line, not the newlines themselves. When the line is completely empty, this leaves a zero-length delimiter, which makes no sense as a split point.
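To make the empty delimiter visible, one can look at the spans the question's pattern actually matches. This is an illustrative sketch rather than part of the original answer; match_spans is a hypothetical helper name:

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r'(?m)^\s*$')

def match_spans(s):
    """Return the (start, end) span of every match of the question's pattern."""
    return [m.span() for m in pattern.finditer(s)]

# Line containing one space: the delimiter is that space, one character wide.
spans_with_space = match_spans('foo\nbar\n \nbaz')
# Completely empty line: the delimiter is zero characters wide.
spans_empty_line = match_spans('foo\nbar\n\nbaz')
print(spans_with_space, spans_empty_line)
```

The first string yields a one-character match and the second a zero-width one. Python versions before 3.7 (including the 2.5 used in the question) do not split on zero-width matches in re.split, so the second string comes back unsplit; from 3.7 on, re.split also splits on empty matches.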

This version also gets rid of the delimiters themselves, which you probably want. Otherwise, you will have new lines attached to the beginning and end of each divided part.

Treating multiple consecutive blank lines as defining an empty block ("abc\n\n\ndef" → ["abc", "", "def"]) is more complicated...
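One possible way to get that behaviour is to skip the regular expression, walk the lines, and start a new block at every blank line. This is a sketch of one interpretation of the example above, not the answerer's code; split_blocks is a hypothetical helper name, and it assumes '\n' line endings:

```python
def split_blocks(text):
    """Split text into blocks, treating every blank line as a separator.

    Two blank lines in a row therefore enclose an empty block.
    """
    blocks = [[]]
    for line in text.split('\n'):
        if line.strip() == '':
            # A blank (empty or whitespace-only) line closes the current block.
            blocks.append([])
        else:
            blocks[-1].append(line)
    return ['\n'.join(block) for block in blocks]

print(split_blocks('abc\n\n\ndef'))
print(split_blocks('foo\nbar\n \nbaz'))
```

Note that under this reading a trailing newline at the end of the text also produces a final empty block, which may or may not be what is wanted.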
