How to remove text between <script> and </script> using python?

Question

How can I remove the text between <script> and </script> using Python?

+5

javascript python

Niloy Jun 08 '09 at 11:30

9 answers

tgray · Answer 1 · 2009-06-08T11:38:55+0000

You can do this with BeautifulSoup (among other methods):

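from bs4 import BeautifulSoup  # assumes the bs4 package (Beautiful Soup 4)

# source is the HTML string to clean
soup = BeautifulSoup(source, "html.parser")
to_extract = soup.find_all('script')
for item in to_extract:
    item.extract()  # detach the <script> node from the tree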

This actually removes the nodes from the HTML. If you want to leave empty <script></script> tags behind, you will have to work with the attributes and contents of each item rather than just extracting it from the soup.

user27478 · Answer 2 · 2009-06-08T14:45:21+0000

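Are you trying to prevent XSS? Just removing the <script> tags will not stop every attack! For a list of the many ways you could be vulnerable (some of them quite creative), see http://ha.ckers.org/xss.html. After reading it, you should see why removing <script> tags alone is not robust enough. The Python library lxml can clean HTML so that it is safe to display.

If you are sure that you only want to remove the <script> tags, this lxml code should work: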

from lxml.html import parse

root = parse(filename_or_url).getroot()
for element in root.iter("script"):
    element.drop_tree()

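Note: regular expressions are not a robust way to process HTML. See the related question on parsing HTML with regular expressions.

Note 2: another SO question collects HTML that is hard to handle with a regular expression: can you provide some examples of why it is hard to parse XML and HTML with a regex?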
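As a small illustration of why a bare <script>...</script> regular expression is fragile, the sketch below runs one naive pattern over a few hand-written inputs. The sample strings and the names NAIVE, samples and removed are made up for this example; it demonstrates the failure mode and is not a sanitizer.

```python
import re

# A naive pattern: exact lowercase tags, no attributes allowed.
NAIVE = re.compile(r"<script>.*?</script>", re.DOTALL)

# Hypothetical inputs; each should reduce to "ab" if the script is removed.
samples = [
    "a<script>x()</script>b",                         # plain tag
    'a<script type="text/javascript">x()</script>b',  # attribute on the tag
    "a<SCRIPT>x()</SCRIPT>b",                         # upper-case tag
    "a<script>x()</script >b",                        # space before '>'
]

# True where the naive pattern removed the whole script block.
removed = [NAIVE.sub("", s) == "ab" for s in samples]
print(removed)  # [True, False, False, False]
```

Only the first input is cleaned. Each of the other three is a variation that browsers accept, which is the usual argument for handing the job to an HTML parser.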

wr. · Answer 3 · 2009-06-08T11:35:43+0000

You can do this with the HTMLParser module (more complicated) or with regular expressions:

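import re

content = "asdf <script> bla </script> end"
# Non-greedy match; re.DOTALL lets '.' span newlines inside the script body
x = re.search(r"<script>.*?</script>", content, re.DOTALL)
span = x.span()  # gives (5, 27)

# Keep everything before and after the matched block
stripped_content = content[:span[0]] + content[span[1]:]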

EDIT: added re.DOTALL, thanks to tgray.
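For the HTMLParser route, here is a minimal sketch using the standard library's html.parser module in Python 3. The class name ScriptStripper and the helper strip_scripts are hypothetical names chosen for this example. It rebuilds the markup while skipping everything inside <script> elements, and it assumes reasonably well-formed input; end tags are re-emitted in lowercase.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emits the parsed markup, skipping <script> elements entirely."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if tag != "script":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Script bodies arrive here as plain data; drop them.
        if not self.in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def strip_scripts(html):
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_scripts('<p>Hi</p><script type="text/javascript">go();</script><p>Bye</p>'))
# <p>Hi</p><p>Bye</p>
```

Because the parser lowercases tag names, <SCRIPT> blocks and script tags with attributes are handled without extra work. Comments and doctype declarations are not copied through in this sketch.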

annakata · Answer 4 · 2009-06-08T11:39:43+0000

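If you are removing everything between <script> and </script>, why not just remove the whole node?

And are you expecting a Resig-style script with both a src and a body?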

uolot · Answer 5 · 2009-06-08T12:41:20+0000

Building on the answers from Pev and wr, why not improve the regular expression, for example:

import re

pattern = r"(?is)<script[^>]*>(.*?)</script>"
text = """<script>foo bar  
baz bar foo  </script>"""
re.sub(pattern, '', text)  # returns ''

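(?is) sets two flags: i ignores case and s lets the dot match newlines in the text. The [^>]* part means this version also supports script tags with attributes.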

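EDIT: This will not cover every case. For anything beyond simple input, an HTML parser such as lxml or Beautiful Soup is the safer choice.

If you want to keep the tags themselves and remove only the text between them: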

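import re

pattern = r"(?is)(<script[^>]*>)(.*?)(</script>)"
text = """<script>foo bar  
baz bar foo  </script>"""
# The replacement must be a raw string so \1 and \3 are group references
re.sub(pattern, r'\1\3', text)  # returns '<script></script>'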

Lakshman Prasad · Answer 6 · 2009-06-08T16:45:19+0000

Element Tree is a simple and pleasant package for this. Yes, there are other ways to do it, but I would not recommend them!

ujh · Answer 7 · 2009-06-08T11:37:14+0000

I don't know Python well enough to give you a solution. But if you want to use this to sanitize user input, you have to be very careful: removing what is between <script> and </script> does not catch everything. Perhaps you can take a look at existing solutions (I suppose Django includes something like this).
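To show what "does not catch everything" can look like, the sketch below strips script blocks from a made-up piece of user input and checks what is left. The names SCRIPT_BLOCK, user_input, cleaned and escaped are hypothetical; the event-handler attribute and the javascript: URL are two well-known ways to run script without a <script> tag.

```python
import html
import re

# Case-insensitive, dot matches newlines, attributes allowed on the tag.
SCRIPT_BLOCK = re.compile(r"(?is)<script[^>]*>.*?</script>")

# Hypothetical user-supplied markup.
user_input = (
    '<p>hello</p>'
    '<script>alert(1)</script>'
    '<img src="x" onerror="alert(2)">'
    '<a href="javascript:alert(3)">click</a>'
)

cleaned = SCRIPT_BLOCK.sub("", user_input)
print(cleaned)
# The <script> block is gone, but onerror= and javascript: are still there.

# If the input only needs to be shown as text, escaping it is one option.
escaped = html.escape(user_input)
print(escaped)
```

Escaping turns the whole input into inert text, which only suits cases where no user markup should be rendered. When some markup must survive, a dedicated sanitizing library is the usual route.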

Simon peverett · Answer 8 · 2009-06-08T11:48:16+0000

import re

example_text = "This is some text <script> blah blah blah </script> this is some more text."

myre = re.compile("(^.*)<script>(.*)</script>(.*$)")
result = myre.match(example_text)
result.groups()
# ('This is some text ', ' blah blah blah ', ' this is some more text.')

# Text between <script> .. </script>
result.group(2)
# ' blah blah blah '

# Text outside of <script> .. </script>
result.group(1) + result.group(3)
# 'This is some text  this is some more text.'

sqram · Answer 9 · 2009-06-08T12:34:37+0000

If you do not want to import any modules:

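string = "<script> this is some js. begone! </script>"

# Split on spaces and keep every token that is not a script tag.
# This avoids deleting from the list while iterating over it.
# Note: it drops only the tags themselves, and only when they are
# separated from the surrounding text by spaces.
words = [s for s in string.split(' ') if s not in ('<script>', '</script>')]

print(' '.join(words))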
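A module-free variant that also removes the text between the tags can be sketched with str.find. The function name remove_script_blocks and its parameters are hypothetical. It matches the exact tag strings only, so tags with attributes or different casing are not detected.

```python
def remove_script_blocks(text, open_tag="<script>", close_tag="</script>"):
    """Return text with every open_tag ... close_tag block removed."""
    out = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            # No more blocks: keep the rest of the text.
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            # Unclosed block: drop everything from the opening tag on.
            break
        pos = end + len(close_tag)
    return "".join(out)


print(remove_script_blocks("a <script> js </script> b <script>x</script>c"))
# a  b c
```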
