How to pull CSS attributes from inline styles using BeautifulSoup

I have something like this:

<img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/>

I use BeautifulSoup to parse the HTML. Is there a way to pull the "url" out of the CSS "background" property?

1 answer

You have a couple of options: quick and dirty, or the right way. The quick and dirty way (which will break easily if the markup changes) looks like this:

    >>> import re
    >>> from bs4 import BeautifulSoup
    >>> html = '<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>'
    >>> soup = BeautifulSoup(html, 'html.parser')
    >>> style = soup.find('img')['style']
    >>> urls = re.findall(r'url\((.*?)\)', style)
    >>> urls
    ['/theRealImage.jpg']

Obviously, you will have to play around with this to get it working with multiple img tags.
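For several img tags, something along these lines might work. It is only a sketch: the function name `extract_background_urls`, the pattern name `URL_RE`, and the sample file names are made up for illustration, and the regex additionally tolerates optional quotes inside `url(...)`, which the one-liner above does not.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical helper pattern: captures the argument of url(...),
# with or without surrounding quotes.
URL_RE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def extract_background_urls(html):
    """Collect every url(...) value found in the inline styles of <img> tags."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    # style=True keeps only the <img> tags that actually have a style attribute.
    for img in soup.find_all("img", style=True):
        found.extend(URL_RE.findall(img["style"]))
    return found


# Made-up sample markup for illustration.
html = """
<img style="background:url(/first.jpg) no-repeat 0 0;" src="a.jpg"/>
<img src="no-style.jpg"/>
<img style="background:url('/second.jpg');" src="b.jpg"/>
"""

print(extract_background_urls(html))  # ['/first.jpg', '/second.jpg']
```

This is still the regex approach, so it picks up a `url(...)` from any property in the style string, not just `background`.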

The right way (I would feel terrible suggesting that someone run a regular expression over a CSS string :)) is to use a CSS parser. cssutils, a library I just found on Google and which is available on PyPI, looks like it can do the job.
