Python .replace() regex

I am trying to grab everything after the </html> tag and remove it, but my code does not seem to do anything. Does .replace() not support regex?

Python

z.write(article.replace('</html>.+', '</html>')) 
+73
python regex
Jul 13 2018-12-18T00:
4 answers

No. Regular expressions in Python are handled by the re module.

 article = re.sub(r'(?is)</html>.+', '</html>', article) 
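To see why the original call did nothing, here is a minimal sketch. The sample `article` string is made up for illustration. `str.replace` searches for the literal text `</html>.+`, which never occurs in the input, while `re.sub` interprets it as a pattern. The `(?is)` prefix turns on case-insensitive matching and lets `.` match newlines as well.

```python
import re

# Hypothetical sample input; everything after the closing tag should go.
article = "<html>\nbody\n</HTML>\ntrailing junk\nmore junk\n"

# str.replace treats its first argument as literal text, so nothing matches.
unchanged = article.replace('</html>.+', '</html>')

# re.sub treats it as a pattern: (?i) ignores case, (?s) lets . match newlines.
cleaned = re.sub(r'(?is)</html>.+', '</html>', article)

print(unchanged == article)  # True
print(repr(cleaned))         # '<html>\nbody\n</html>'
```

Note that `.+` needs at least one character after the tag, so input that already ends in `</html>` is returned unchanged.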
+118
Jul 13 '12 at 18:05

You can use the re module for regular expressions, but regular expressions are probably overkill for what you want. I would try something like

 z.write(article[:article.index("</html>") + 7])

This is much cleaner and should be much faster than a regex based solution.
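One caveat: `str.index` raises `ValueError` when the tag is missing. A small sketch of a more defensive variant follows; the helper name `truncate_after` is hypothetical and not part of the original answer. It uses `str.find`, which returns -1 instead of raising, and `len(tag)` instead of the hard-coded 7.

```python
def truncate_after(text, tag="</html>"):
    """Return text up to and including the first tag; unchanged if tag is absent."""
    pos = text.find(tag)  # find returns -1 where index would raise ValueError
    if pos == -1:
        return text
    return text[:pos + len(tag)]


print(truncate_after("<html>body</html>junk"))  # <html>body</html>
print(truncate_after("no closing tag here"))    # no closing tag here
```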

+6
Jul 13 '12 at 19:01

@Ignacio is right, +1; I'll just give more examples.

To replace text using a regular expression, use the re.sub function:

sub(pattern, repl, string[, count, flags])

It replaces the non-overlapping occurrences of pattern in string with repl. If you need to inspect each match, for instance to extract information about specific group captures, you can pass a function as the repl argument. More details are in the re module documentation.

Examples

 >>> import re
 >>> re.sub(r'a', 'b', 'banana')
 'bbnbnb'
 >>> re.sub(r'/\d+', '/{id}', '/andre/23/abobora/43435')
 '/andre/{id}/abobora/{id}'
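Two more features of re.sub are worth a short sketch: its second argument (repl) may be a function that receives each match object, and count limits how many replacements are made. The function name `double_id` is made up for this illustration.

```python
import re

def double_id(match):
    # match.group(1) is the run of digits captured by (\d+)
    return '/' + str(int(match.group(1)) * 2)

# A function as repl: called once per match, its return value is substituted.
doubled = re.sub(r'/(\d+)', double_id, '/andre/23/abobora/43435')
print(doubled)  # /andre/46/abobora/86870

# count=1 replaces only the first occurrence.
first_only = re.sub(r'a', 'b', 'banana', count=1)
print(first_only)  # bbnana
```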
+4
Jan 03 '17 at 16:02

In this particular case, if using the re module is overkill, how about using the split (or rsplit) method, like

 se='</html>'
 z.write(article.split(se)[0]+se)

For example,

 #!/usr/bin/python
 article='''<html>Larala Ponta Monta </html>Kurimon Waff Moff '''
 z=open('out.txt','w')
 se='</html>'
 z.write(article.split(se)[0]+se)

writes out.txt as

 <html>Larala Ponta Monta </html> 
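As a rough sketch of how the choice of method matters when the tag appears more than once (the sample string is invented): split cuts at the first occurrence, rsplit with a limit of 1 cuts at the last, and partition avoids appending a tag that was never there.

```python
article = "<html>a</html>b</html>c"
se = "</html>"

# Cut at the first occurrence of the tag.
first = article.split(se, 1)[0] + se

# Cut at the last occurrence of the tag.
last = article.rsplit(se, 1)[0] + se

# partition returns an empty separator when the tag is absent,
# so head + sep never adds a tag that was not in the input.
head, sep, _ = article.partition(se)
safe = head + sep

print(first)  # <html>a</html>
print(last)   # <html>a</html>b</html>
print(safe)   # <html>a</html>
```

With split, an input that lacks the tag comes back with `</html>` appended, which may or may not be what you want.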
0
Jun 24 '17 at 20:08


