HTML table scraper in Common Lisp?

Question

I want to extract some information from a webpage that is contained in an HTML <table>. How can I extract all of the table's contents into a nice pipe-delimited (|) file?

Author | Book | Year | Comments
Bill Bryson | Short History of Nearly Everything | 2004
Stephen Hawking | A Brief History of Time | 1998 | Still haven't read.

Ideally, I would like a function that takes a URL and an output filename as parameters and produces the output above.

(defun extract-table (url filename)
  (extract-from-html-table (fetch-web-page url)))

(extract-table "http://www.mypage.com" "output.txt")

Example HTML input for the output above:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title> Lisp </title>
</head>
<body>
<h1> Welcome to Lisp </h1>
<table class = "any" style = "font-size: 14px;">
  <TR class = "header">
    <td> Author </td>
    <TD> Book </TD>
    <td> Year </td>
    <td> Comments </td>
  </TR>
  <tr class = "odd">
    <td> Bill Bryson </td>
    <td> Short History of Nearly Everything </td>
    <td> 2004 </td>
  </tr>
  <tr>
    <td> Stephen Hawking </td>
    <td> A Brief History of Time </td>
    <td> 1998 </td>
    <td> Still haven't read. </td>
  </tr>
</table>
</body>
</html>

common-lisp

anon Feb 28 '10 at 20:34

1 answer

Dirk · Accepted Answer · 2010-02-28T20:41:14+0000

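To fetch the page, you can use Drakma. If the content is well-formed XHTML, you can parse it with an XML parser such as cxml. Otherwise, try closure-html, which parses HTML 4. There are also other HTML parsing libraries hosted on Common-Lisp.net.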
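
A minimal sketch of how Drakma and closure-html might be combined for this, not a definitive implementation. It assumes Quicklisp is available to load both libraries, that the server returns the page with a text content type (so Drakma hands back a string), and that every table row on the page should be written out. The helper names find-elements, node-text and table-rows are made up for this example; extract-table is the name from the question.

```lisp
;; Assumption: Quicklisp is installed and can load both libraries.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:drakma :closure-html)))

;; closure-html's LHTML builder represents an element as
;; (:TAG (attributes...) child...), with text nodes as plain strings.

(defun find-elements (tags node)
  "Return the outermost elements under NODE whose tag is in TAGS, in document order."
  (when (consp node)
    (if (member (first node) tags)
        (list node)
        (mapcan (lambda (child) (find-elements tags child))
                (cddr node)))))

(defun node-text (node)
  "Concatenate all text found beneath NODE."
  (cond ((stringp node) node)
        ((consp node) (format nil "~{~A~}" (mapcar #'node-text (cddr node))))
        (t "")))

(defun table-rows (document)
  "Return a list of rows; each row is a list of trimmed cell strings."
  (mapcar (lambda (row)
            (mapcar (lambda (cell)
                      (string-trim '(#\Space #\Tab #\Newline #\Return)
                                   (node-text cell)))
                    (find-elements '(:td :th) row)))
          (find-elements '(:tr) document)))

(defun extract-table (url filename)
  "Fetch URL and write every table row to FILENAME, cells separated by ' | '."
  (let* ((html (drakma:http-request url))
         (document (chtml:parse html (chtml:make-lhtml-builder))))
    (with-open-file (out filename :direction :output :if-exists :supersede)
      (dolist (row (table-rows document))
        ;; ~^ stops before the separator after the last cell.
        (format out "~{~A~^ | ~}~%" row)))))
```

Calling (extract-table "http://www.mypage.com" "output.txt") as in the question should then write one line per <tr>. Because find-elements searches at any depth, it does not matter whether the parser wraps the rows in a tbody element. To pick out one particular table, you could first call find-elements with '(:table) and pass only the table you want to table-rows.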
