Well, if your HTML document really has such a stable structure (which makes me scratch my head, because it's pretty rare), you can use regular expressions:
>>> import re
>>> r = re.compile('<tr><td>(.*)</td><td>(.*)</td><td>(.*) s</td></tr>')
The regular expression above captures the three values you want (component name, status, and time) as groups. Then you use the sub() method of the compiled pattern object. If the text is in a variable (e.g. content), just do it like this:
r.sub(r'\1_STATUS = "\2"\n\1_TIME = \3', content)
Result:
>>> print r.sub(r'\1_STATUS = "\2"\n\1_TIME = \3', content)
<table border=1>
<tr>
<td><b>Component</b></td>
<td><b>Status</b></td>
<td><b>Time / Error</b></td>
</tr>
SAVE_DOCUMENT_STATUS = "OK"
SAVE_DOCUMENT_TIME = 0.408
GET_DOCUMENT_STATUS = "OK"
GET_DOCUMENT_TIME = 0.361
DVK_SEND_STATUS = "OK"
DVK_SEND_TIME = 0.002
DVK_RECEIVE_STATUS = "OK"
DVK_RECEIVE_TIME = 0.002
GET_USER_INFO_STATUS = "OK"
GET_USER_INFO_TIME = 0.135
NOTIFICATIONS_STATUS = "OK"
NOTIFICATIONS_TIME = 0.002
ERROR_LOG_STATUS = "OK"
ERROR_LOG_TIME = 0.001
SUMMARY_STATUS_STATUS = "OK"
SUMMARY_STATUS_TIME = 0.913
</table>
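Of course, there is still some garbage in the output (the table tags and the header row, which the pattern does not match and sub() leaves untouched), but this gives you the idea :)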
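If you want only the matched rows and none of the leftover markup, one option is to collect the matches with findall() instead of substituting in place. This is just a sketch: the sample content is an assumption reconstructed from the output shown above (your real page may differ), and extract() is a hypothetical helper name.

```python
import re

# Assumed sample input, reconstructed from the output shown above.
content = """<table border=1>
<tr>
<td><b>Component</b></td>
<td><b>Status</b></td>
<td><b>Time / Error</b></td>
</tr>
<tr><td>SAVE_DOCUMENT</td><td>OK</td><td>0.408 s</td></tr>
<tr><td>GET_DOCUMENT</td><td>OK</td><td>0.361 s</td></tr>
</table>"""

# Same pattern as in the answer: one data row per line, three cells.
row = re.compile(r'<tr><td>(.*)</td><td>(.*)</td><td>(.*) s</td></tr>')

def extract(text):
    """Return only the NAME_STATUS / NAME_TIME lines (hypothetical helper)."""
    lines = []
    # findall() yields one (name, status, seconds) tuple per matching row;
    # anything the pattern does not match (header, table tags) is skipped.
    for name, status, seconds in row.findall(text):
        lines.append('%s_STATUS = "%s"' % (name, status))
        lines.append('%s_TIME = %s' % (name, seconds))
    return "\n".join(lines)

print(extract(content))
```

This still relies on every data row sitting on a single line in exactly that shape, so it is no more robust than the sub() version; it only drops the unmatched lines.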
If your HTML documents are not that stable, you should really consider an XML parser or, better yet, BeautifulSoup, because processing HTML with an unpredictable structure by hand is a thankless job.
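BeautifulSoup is a third-party package, so as a dependency-free illustration of the same parser-based idea, here is a rough sketch using the standard library's html.parser module instead (Python 3; in Python 2 the module is called HTMLParser). It is not BeautifulSoup's API. The RowCollector class and the messy sample are made up for the example; the point is only that extra whitespace, attributes, or line breaks inside the rows no longer matter.

```python
from html.parser import HTMLParser

class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

# Assumed sample: same data as before, but with less regular formatting.
messy = """<table border=1>
<tr><td><b>Component</b></td><td><b>Status</b></td><td><b>Time / Error</b></td></tr>
<tr class="row">
  <td>SAVE_DOCUMENT</td>
  <td>OK</td>
  <td>0.408 s</td>
</tr>
</table>"""

parser = RowCollector()
parser.feed(messy)
parser.close()

for name, status, elapsed in parser.rows[1:]:  # rows[0] is the header row
    # Drop the trailing " s" unit if it is present.
    seconds = elapsed[:-2] if elapsed.endswith(" s") else elapsed
    print('%s_STATUS = "%s"' % (name, status))
    print('%s_TIME = %s' % (name, seconds))
```

The loop assumes every data row has exactly three cells; a row with a different cell count would need its own handling.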