I currently have some Ruby code that I use to scrape a few websites. I wrote it in Ruby because I was using Ruby on Rails for the site at the time, so it made sense.
Now I'm trying to port it to Google App Engine and I keep getting stuck.
I ported Python Mechanize to work on Google App Engine, but it doesn't support navigating the DOM with XPath.
I tried the built-in ElementTree, but it choked on the first block of HTML I gave it as soon as it hit `&mdash;`.
Should I keep trying to hack ElementTree into working, or should I be using something else?
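For reference, here is a minimal sketch of why `&mdash;` breaks ElementTree and one possible hack around it, assuming Python 3's standard library (`html.entities`; on the old Python 2 runtime the same table lived in `htmlentitydefs`). XML predefines only `amp`, `lt`, `gt`, `quot` and `apos`, so the parser rejects `&mdash;` as an undefined entity, whereas a numeric reference such as `&#8212;` is legal in any XML document. The helper names `entities_to_numeric` and `paragraph_texts` are hypothetical, made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Matches named references such as "&mdash;" or "&nbsp;" (not "&#8212;").
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_numeric(markup):
    """Hypothetical helper: rewrite HTML named entities as numeric references."""
    def repl(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # unknown name: leave it untouched
        return "&#%d;" % codepoint
    return _NAMED_ENTITY.sub(repl, markup)


def paragraph_texts(markup):
    """Hypothetical helper: parse the cleaned markup and collect <p> text."""
    root = ET.fromstring(entities_to_numeric(markup))
    # ElementTree only supports a limited XPath subset via find/findall.
    return [p.text for p in root.findall(".//p")]


snippet = "<div><p>Before &mdash; after</p><p>Fish &amp; chips</p></div>"

try:
    ET.fromstring(snippet)  # raw markup: expat rejects the undefined entity
except ET.ParseError as exc:
    print("raw parse failed:", exc)

print(paragraph_texts(snippet))
```

This only addresses entities. Real-world HTML that is not well-formed XML (unclosed `<br>` or `<li>` tags, unquoted attributes) will still make ElementTree fail, which is usually where an HTML-aware parser becomes the better choice.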
Thanks,
Mark