Find and replace double newlines with perl?

I am cleaning up some web pages that, for some reason, have about 8 line breaks between tags. I wanted to remove most of them, and I tried this:

perl -pi -w -e "s/\n\n//g" *.html 

No luck. For good measure, I also tried:

 perl -pi -w -e "s/\n//g" *.html 

and it deleted all my line breaks. What am I doing wrong?

Edit: I also tried \r\n\r\n, with the same result. It works for single line breaks but does nothing for two consecutive ones.


Use -0:

 perl -pi -0 -w -e "s/\n\n//g" *.html 

The problem is that, by default, -p reads the file one line at a time. There is no such thing as a line containing two newlines, so your pattern never matched. The -0 switch changes the input record separator ($/) to "\0", which probably does not occur in your file, so Perl reads the entire file as a single record. (Even if the file does contain NUL characters, you are looking for consecutive newlines, so processing it in NUL-delimited chunks will not be a problem.)

You probably want to adjust your regex as well, but it is hard to say exactly what you want. Try s/\n\n+/\n/g, which replaces any run of consecutive newlines with a single newline.

If the file is very large, you may not have enough memory to load it in one piece. The workaround is to pick a character that is common enough to split the file into manageable chunks and tell Perl to use it as the record separator. It must also be a character that does not appear inside the matches you are trying to replace. For example, -0x2e will split the file on "." (ASCII 0x2E).


I tried to replace double newlines with single ones using the above recommendation on a large file (2.3 GB). With a file that large, it crashed when trying to read the entire file at once. So instead of looking for a double newline, just find the lines whose only character is the newline:

 perl -pi -w -e 's/^\n$//' file.txt 
