Screen scraping the ugliest HTML you've ever seen in your life

I am using PHP and libtidy to try to screen-scrape a site that may contain the most horrible, mangled use of HTML tables in history. The site leaves many of its row, cell, font, and bold tags unclosed, and it nests many layers of tables inside other tables.

Fragment example:

<center>
<table border="1" bordercolor="#000000" cellspacing="0" cellpadding="0">
<tr>
<td width="50%">
<center>
Home Team - <b>Wildcats<td>
<center>
Away Team - <b>Polar Bears<tr>
<td colspan="2">
<center>
<b><font size="+1">Rosters<tr>
<td valign="top">
<center>
<table border="0" cellspacing="0">
<tr>
<td>
<font size="2">1&nbsp;<td>
<font size="2">Baird, T<tr>
<td>
<font size="2">2&nbsp;<td>
<font size="2">Knight, P<tr>
<td>
<font size="2">8&nbsp;<td>
<font size="2">Miller, B<tr>
<td>
<font size="2">9&nbsp;<td>
<font size="2">Huebsch, B<tr>
<td>
<font size="2">11&nbsp;<td>
<font size="2">Buschmann, C<tr>
<td>
<font size="2">12&nbsp;<td>
<font size="2">Reding, J<tr>
<td>
<font size="2">14&nbsp;<td>
<font size="2">Simpson, S<tr>
<td>
<font size="2">27&nbsp;<td>
<font size="2">Kupferschmidt, M<tr>
<td>
<font size="2">28&nbsp;<td>
<font size="2">Anderson, D<tr>
<td>
<font size="2">31&nbsp;<td>
<font size="2">Gehrts, J<tr>
<td>
<font size="2">39&nbsp;<td>
<font size="2">McGinnis, G<tr>
<td>
<font size="2">42&nbsp;<td>
<font size="2">Temple, B<tr>
<td>
<font size="2">44&nbsp;<td>
<font size="2">Kemplin, A<tr>
<td>
<font size="2">77&nbsp;<td>
<font size="2">Weiner, B<tr>
<td>
<font size="2">95&nbsp;<td>
<font size="2">
Zytkoskie, D</table>
<td valign="top">
<center>
<table border="0" cellspacing="0">
<tr>
<td>
<font size="2">5&nbsp;<td>
<font size="2">Mack, A<tr>
<td>
<font size="2">8&nbsp;<td>
<font size="2">Foucault, R<tr>
<td>
<font size="2">11&nbsp;<td>
<font size="2">Oberpriller, D *<tr>
<td>
<font size="2">12&nbsp;<td>
<font size="2">Underwood, J<tr>
<td>
<font size="2">15&nbsp;<td>
<font size="2">Oberpriller, M<tr>
<td>
<font size="2">19&nbsp;<td>
<font size="2">Langfus, B<tr>
<td>
<font size="2">25&nbsp;<td>
<font size="2">Carroll, R<tr>
<td>
<font size="2">30&nbsp;<td>
<font size="2">Hirdler, T<tr>
<td>
<font size="2">33&nbsp;<td>
<font size="2">Gibson, S<tr>
<td>
<font size="2">35&nbsp;<td>
<font size="2">Marthaler, C<tr>
<td>
<font size="2">44&nbsp;<td>
<font size="2">Yurik, J<tr>
<td>
<font size="2">58&nbsp;<td>
<font size="2">
Gronemeyer, S</table>
<tr>
<td colspan="2">
<center>
<b><font size="+1">Goals<tr>
<td valign="top">
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Player<td>
<b><font size="2">Period<td>
<b><font size="2">Time<td>
<b><font size="2">Assist 1<td>
<b><font size="2">Assist 2<td>
<b><font size="2">SH<td>
<b><font size="2">PP<tr>
<td nowrap>
<font size="2">Kupferschmidt,&nbsp;M<td>
<font size="2">1<td>
<font size="2">12:51<td nowrap>
<font size="2">Kemplin,&nbsp;A<td nowrap>
<font size="2">None<td>
<font size="2">
<center>
<td>
<font size="2">
<center>
<tr>
<td nowrap>
<font size="2">McGinnis,&nbsp;G<td>
<font size="2">1<td>
<font size="2">12:33<td nowrap>
<font size="2">Huebsch,&nbsp;B<td nowrap>
<font size="2">None<td>
<font size="2">
<center>
<td>
<font size="2">
<center>
<tr>
<td nowrap>
<font size="2">Kupferschmidt,&nbsp;M<td>
<font size="2">2<td>
<font size="2">16:01<td nowrap>
<font size="2">None<td nowrap>
<font size="2">None<td>
<font size="2">
<center>
<td>
<font size="2">
<center>
<tr>
<td nowrap>
<font size="2">Buschmann,&nbsp;C<td>
<font size="2">3<td>
<font size="2">00:38<td nowrap>
<font size="2">None<td nowrap>
<font size="2">None<td>
<font size="2">
<center>
<td>
<font size="2">
<center>
</table>
<td valign="top">
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Player<td>
<b><font size="2">Period<td>
<b><font size="2">Time<td>
<b><font size="2">Assist 1<td>
<b><font size="2">Assist 2<td>
<b><font size="2">SH<td>
<b><font size="2">PP<tr>
<td nowrap>
<font size="2">Oberpriller,&nbsp;D *<td>
<font size="2">3<td>
<font size="2">12:31<td nowrap>
<font size="2">Gronemeyer,&nbsp;S<td nowrap>
<font size="2">None<td>
<font size="2">
<center>
<td>
<font size="2">
<center>
</table>
<tr>
<td colspan="2">
<center>
<b><font size="+1">Penalties<tr>
<td valign="top">
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Player<td>
<font size="2"><b>Period<td>
<font size="2"><b>Minutes<td>
<font size="2"><b>Offense<td>
<font size="2"><b>Start<td>
<font size="2"><b>Expired<tr>
<td nowrap>
<font size="2">Buschmann,&nbsp;C<td>
<font size="2">
<center>
3<td>
<font size="2">
<center>
2<td>
<font size="2">Interference<td>
<font size="2">11:11<td>
<font size="2">09:11<tr>
<td nowrap>
<font size="2">Buschmann,&nbsp;C<td>
<font size="2">
<center>
3<td>
<font size="2">
<center>
2<td>
<font size="2">Unsportmanlike Conduct<td>
<font size="2">03:26<td>
<font size="2">01:26<tr>
<td nowrap>
<font size="2">Bench<td>
<font size="2">
<center>
3<td>
<font size="2">
<center>
2<td>
<font size="2">Too Many Men<td>
<font size="2">01:46<td>
<font size="2">
00:00</table>
<td valign="top">
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Player<td>
<font size="2"><b>Period<td>
<font size="2"><b>Minutes<td>
<font size="2"><b>Offense<td>
<font size="2"><b>Start<td>
<font size="2"><b>Expired<tr>
<td nowrap>
<font size="2">Marthaler,&nbsp;C<td>
<font size="2">
<center>
1<td>
<font size="2">
<center>
2<td>
<font size="2">Interference<td>
<font size="2">01:19<td>
<font size="2">16:19<tr>
<td nowrap>
<font size="2">Underwood,&nbsp;J<td>
<font size="2">
<center>
2<td>
<font size="2">
<center>
2<td>
<font size="2">Interference<td>
<font size="2">12:32<td>
<font size="2">10:32<tr>
<td nowrap>
<font size="2">Marthaler,&nbsp;C<td>
<font size="2">
<center>
3<td>
<font size="2">
<center>
2<td>
<font size="2">Interference<td>
<font size="2">11:39<td>
<font size="2">
09:39</table>
<tr>
<td colspan="2">
<center>
<font size="+1"><b>Goalies<tr>
<td>
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Name<td>
<font size="2"><b>Shots<td>
<font size="2"><b>Goals<tr>
<td>
<font size="2">Baird,&nbsp;T<td>
<font size="2">20<td>
<font size="2">1<tr>
<td>
<font size="2"><b>Open Net<td>
<td>
<font size="2">
0</table>
<td>
<center>
<table border="1" cellspacing="0" width="100%">
<td>
<b><font size="2">Name<td>
<font size="2"><b>Shots<td>
<font size="2"><b>Goals<tr>
<td>
<font size="2">Hirdler,&nbsp;T<td>
<font size="2">42<td>
<font size="2">

Oddly enough, every browser seems to render it just fine. PHP's tidy extension does a decent job of cleaning it up, but the tables are nested so deeply and almost randomly that it is really hard to navigate the result with DOM and XPath.

Does anyone have any recommendations on other approaches for this?

-: , strip_tags(), , tr td, libtidy. . , , .

+5
5

, , . HTML-, Regex - <tr> <td>, <tr> <td>, . ​​- <td>, . .

- , <div> <p>, ( ) .

+3

If Python is an option, Beautiful Soup can parse this kind of HTML. It tolerates badly formed markup and can print a cleaned-up version:

#!/usr/bin/env python3

from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

html = "long string of html"
soup = BeautifulSoup(html, "html.parser")  # use the standard-library parser backend
print(soup.prettify())
+2

If you only need the text, you can remove the HTML with strip_tags.

$clean = strip_tags($input);

// example: <p>Test paragraph.</p> <a href="#fragment">Other text</a>
// returns: Test paragraph. Other text
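
Note that stripping tags also throws away the row and cell boundaries. If those matter, one possible alternative is to treat every <tr> and <td> start tag as a boundary, so that the missing closing tags stop being a problem. The sketch below is only an illustration, written in Python with the standard library's html.parser; the names CellCollector and extract_rows are made up for this example, and the sample is a shortened piece of the roster table from the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect cell text, using <tr>/<td> start tags as row/cell boundaries."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def _end_cell(self):
        if self._cell is not None:
            # &nbsp; arrives as U+00A0; turn it into a plain space
            text = "".join(self._cell).replace("\xa0", " ").strip()
            self._row.append(text)
            self._cell = None

    def _end_row(self):
        self._end_cell()
        # Skip rows with no text at all (e.g. a cell that only wraps a table)
        if self._row and any(self._row):
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag in ("table", "tr"):
            self._end_row()          # a new table or row closes the open row
        elif tag in ("td", "th"):
            self._end_cell()         # a new cell closes the open cell
            if self._row is None:    # header rows here have no <tr> at all
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return a flat list of rows (lists of cell strings) found in html."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush a row left open at the end of the input
    return parser.rows


sample = """<table border="0" cellspacing="0">
<tr>
<td>
<font size="2">1&nbsp;<td>
<font size="2">Baird, T<tr>
<td>
<font size="2">2&nbsp;<td>
<font size="2">Knight, P</table>"""

rows = extract_rows(sample)
for row in rows:
    print(row)
```

This flattens nested tables into one list of rows, so it does not tell you which table a row came from. Empty cells inside a row are kept as empty strings, which preserves the column positions.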
+2

, , , , XML.

0

I have used XPath with Python's lxml to scrape the IMDB Top 250.

The following code parses a saved copy of the IMDB Top 250 page (top250.html) and stores the extracted information in a SQLite database (top250.db):

import sqlite3
from lxml import html

tree = html.parse('top250.html')

class TopMovie(object):
    base_xpath = "/html/body/div/div[2]/layer/div[3]/table/tr/td[3]/div/table/tr/td/table/tr[%d]"

    def __init__(self, num):
        self.rank = num
        self.xpath = self.base_xpath % (self.rank + 1)

    def rating(self):
        return tree.xpath(self.xpath + '/td[2]/font')[0].text

    def link(self):
        return tree.xpath(self.xpath + '/td[3]/font/a')[0].values()[0]

    def title(self):
        return tree.xpath(self.xpath + '/td[3]/font')[0].text_content()

    def votes(self):
        return tree.xpath(self.xpath + '/td[4]/font')[0].text


def main():
    conn = sqlite3.connect('top250.db')
    conn.execute("""DROP TABLE IF EXISTS movies""")
    conn.execute("""
        CREATE TABLE movies (
            id INTEGER PRIMARY KEY,
            title TEXT,
            link TEXT,
            rating TEXT,
            votes INTEGER
        )""")

    for n in range(1, 251):
        m = TopMovie(n)
        # Bind the values as parameters so quotes in a title cannot break the SQL
        conn.execute(
            'INSERT INTO movies VALUES (?, ?, ?, ?, ?)',
            (n, m.title(), m.link(), m.rating(), m.votes().replace(',', '')))

    conn.commit()
    conn.close()


if __name__ == "__main__":
    main()
0