Getting attribute value using BeautifulSoup

I am writing a Python script that parses a web page and retrieves the locations of its scripts. Suppose there are two scenarios:

<script type="text/javascript" src="http://example.com/something.js"></script> 

and

 <script>some JS</script> 

I can already get the JS in the second case, where the code is written inside the tags.

But is there a way to get the src value in the first case, that is, to retrieve the src attribute value of every script tag (e.g. http://example.com/something.js)?

Here is my code

    #!/usr/bin/python
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://rediff.com/")
    data = r.text
    soup = BeautifulSoup(data, "html.parser")

    # Prints every <script> tag in full, not just its src
    for n in soup.find_all('script'):
        print(n)

Output : some JS

Required output : http://example.com/something.js

+8
python beautifulsoup
3 answers

This retrieves the src value of every <script> tag that has one and skips any <script> tag without a src attribute.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = "http://rediff.com/"
    page = urlopen(url)
    soup = BeautifulSoup(page.read(), "html.parser")

    # {"src": True} matches only <script> tags that have a src attribute
    sources = soup.find_all('script', {"src": True})
    for source in sources:
        print(source['src'])

I get the following two src values as a result:

    http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
    http://im.rediff.com/uim/common/realmedia_banner_1_5.js

I think this is what you want. Hope this is helpful.
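For anyone who wants to try the filter without hitting the network, here is a minimal sketch that runs it on an inline HTML string. The `HTML` sample and the `script_sources` helper are names made up for this illustration, and it assumes Python 3 with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one external script and one inline script.
HTML = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>some JS</script>
</head></html>
"""


def script_sources(html):
    """Return the src value of every <script> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True matches only tags on which the src attribute is present,
    # so the inline <script> is skipped.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


print(script_sources(HTML))
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example to download each script afterwards.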

+22

Get 'src' from script node.

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://rediff.com/")
    data = r.text
    soup = BeautifulSoup(data, "html.parser")
    for n in soup.find_all('script'):
        # n.get('src') returns the src value, or None if the tag has no src
        print("src:", n.get('src'))
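One thing to keep in mind with `n.get('src')`: for an inline script it gives back None rather than raising an error, so you will probably want to drop those entries. A small sketch of that, run on the two example tags from the question (the variable names `all_values` and `sources` are just for illustration, and Python 3 is assumed):

```python
from bs4 import BeautifulSoup

# The two cases from the question: an external script and an inline one.
html = (
    '<script type="text/javascript" src="http://example.com/something.js"></script>'
    "<script>some JS</script>"
)
soup = BeautifulSoup(html, "html.parser")

# Tag.get() works like dict.get(): it returns None when the attribute is absent.
all_values = [tag.get("src") for tag in soup.find_all("script")]

# Keep only the values that came from tags carrying a src attribute.
sources = [src for src in all_values if src is not None]

print(all_values)
print(sources)
```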
+5

This should work. Find all script tags, then check whether each one has a 'src' attribute. If it does, the JavaScript URL is in the src attribute; otherwise we assume the JavaScript is inside the tag.

    #!/usr/bin/python
    from bs4 import BeautifulSoup

    # Test HTML which has both cases
    html = '<script type="text/javascript" src="http://example.com/something.js">'
    html += '</script> <script>some JS</script>'

    soup = BeautifulSoup(html, "html.parser")

    # Find all script tags
    for n in soup.find_all('script'):
        # Check if the src attribute exists, and if it does grab the source URL
        if 'src' in n.attrs:
            javascript = n['src']
        # Otherwise assume that the javascript is contained within the tags
        else:
            javascript = n.text
        print(javascript)

The output of this is:

    http://example.com/something.js
    some JS
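If you need to tell the two kinds of script apart afterwards, the same check can fill two separate lists. This is only a sketch: `split_scripts` is a hypothetical helper name, and it reads the inline body through `.string` instead of `.text`, which is a choice made for this example and not something the code above requires.

```python
from bs4 import BeautifulSoup


def split_scripts(html):
    """Return (external_urls, inline_bodies) for the <script> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    external, inline = [], []
    for tag in soup.find_all("script"):
        if tag.has_attr("src"):
            # External script: the URL lives in the src attribute.
            external.append(tag["src"])
        elif tag.string:
            # Inline script: the code is the tag's single text child.
            inline.append(tag.string.strip())
    return external, inline


html = (
    '<script type="text/javascript" src="http://example.com/something.js"></script>'
    " <script>some JS</script>"
)
urls, bodies = split_scripts(html)
print(urls)
print(bodies)
```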
+1