Replace the SRC of all IMG elements using a parser

I am looking for a way to replace the SRC attribute in all IMG tags without using regular expressions. (I would like to use whatever stock HTML parser is included with the default Python installation.) Whatever the source currently is, I need to reduce it to:

<img src="cid:imagename"> 

I am trying to make every src attribute point to the cid of the corresponding attachment in an HTML email, so I also need to change the source value itself: it should be just the file name, with no path or extension.
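For reference, here is a minimal sketch of what the question asks for using only the standard library, written for Python 3's `html.parser` module (the answers below target Python 2). It assumes each attachment's Content-ID is the bare file name and that `src` values use `/` as the path separator; the names `src_to_cid`, `_ImgLocator` and `rewrite_img_srcs` are hypothetical. Instead of re-serializing the whole document, it records where each `<img>` start tag begins (via `getpos()` and `get_starttag_text()`) and splices in a rebuilt tag, so all other markup passes through unchanged.

```python
from html import escape
from html.parser import HTMLParser
import posixpath


def src_to_cid(src):
    """Map 'images/logo.png' to 'cid:logo': file name only, no path or extension."""
    # Assumption: each attachment's Content-ID is the bare file name.
    stem, _ext = posixpath.splitext(posixpath.basename(src))
    return "cid:" + stem


class _ImgLocator(HTMLParser):
    """Records the position, raw text and attributes of every <img> start tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # (lineno, offset, raw_tag_text, attrs)

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() calls this too, so <img ... /> is covered.
        if tag == "img":
            lineno, offset = self.getpos()  # position where the tag starts
            self.found.append((lineno, offset, self.get_starttag_text(), attrs))


def _render_img(attrs, self_closing):
    """Rebuild an <img> tag from the parsed attributes, rewriting src."""
    parts = ["<img"]
    for name, value in attrs:
        if value is None:  # valueless attribute, e.g. <img ismap>
            parts.append(" " + name)
            continue
        if name == "src":
            value = src_to_cid(value)
        # The parser unescapes attribute values, so escape them again on output.
        parts.append(' %s="%s"' % (name, escape(value, quote=True)))
    return "".join(parts) + (" />" if self_closing else ">")


def rewrite_img_srcs(html):
    """Return html with every <img src> rewritten; all other text is kept verbatim."""
    locator = _ImgLocator()
    locator.feed(html)
    locator.close()

    # getpos() reports (1-based line, 0-based column), counting lines by "\n".
    line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]

    out, cursor = [], 0
    for lineno, offset, raw, attrs in locator.found:
        start = line_starts[lineno - 1] + offset
        if not html.startswith(raw, start):
            raise RuntimeError("could not locate <img> tag at line %d" % lineno)
        out.append(html[cursor:start])  # untouched text before this tag
        out.append(_render_img(attrs, raw.endswith("/>")))
        cursor = start + len(raw)
    out.append(html[cursor:])  # untouched text after the last tag
    return "".join(out)
```

For example, `rewrite_img_srcs('<img src="images/logo.png">')` should return `'<img src="cid:logo">'`. Attribute names inside rewritten `<img>` tags come back lowercased and double-quoted; `<img>` text inside comments or `<script>` blocks is left alone because the parser does not report it as a tag.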

2 answers

The Python standard library has an HTML parser, but it is not very useful and has been deprecated since Python 2.6. Doing this kind of thing with BeautifulSoup is really simple:

    from BeautifulSoup import BeautifulSoup
    from os.path import basename, splitext

    soup = BeautifulSoup(my_html_string)
    for img in soup.findAll('img'):
        # keep only the file name, without path or extension
        img['src'] = 'cid:' + splitext(basename(img['src']))[0]
    my_html_string = str(soup)
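The import above is BeautifulSoup 3, which only runs on Python 2. A sketch of the same loop for Python 3 with the maintained `beautifulsoup4` package (a third-party install, not part of the standard library) might look like this; the function name `rewrite_img_srcs` is hypothetical, and `"html.parser"` selects the stdlib parser backend so no extra parser is needed.

```python
from os.path import basename, splitext

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def rewrite_img_srcs(html):
    """Point every <img src> at 'cid:<file name without path or extension>'."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips <img> tags without a src attribute (img["src"] would raise KeyError).
    for img in soup.find_all("img", src=True):
        img["src"] = "cid:" + splitext(basename(img["src"]))[0]
    return str(soup)
```

Note that `str(soup)` re-serializes the whole document, so void elements come back in BeautifulSoup's own form (e.g. `<img .../>`) rather than exactly as written.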

Here is a pyparsing approach to your problem. You will need to write your own code to transform the http src attribute.

    from pyparsing import *
    import urllib2

    imgtag = makeHTMLTags("img")[0]

    page = urllib2.urlopen("http://www.yahoo.com")
    html = page.read()
    page.close()

    # print html

    def modifySrcRef(tokens):
        ret = "<img"
        for k,i in tokens.items():
            if k in ("startImg","empty"): continue
            if k.lower() == "src":
                # or do whatever with this
                i = i.upper()
            ret += ' %s="%s"' % (k,i)
        return ret + " />"

    imgtag.setParseAction(modifySrcRef)

    print imgtag.transformString(html)

Tags are converted to:

    <img src="HTTP://L.YIMG.COM/A/I/WW/BETA/Y3.GIF" title="Yahoo" height="44" width="232" alt="Yahoo!" />
    <a href="r/xy"><img src="HTTP://L.YIMG.COM/A/I/WW/TBL/ALLYS.GIF" height="20" width="138" alt="All Yahoo! Services" border="0" /></a>