How to parse an HTML file using Clojure?

I am new to Clojure and I need examples. Could you please show me how to parse an HTML file using Clojure?

+6
2 answers

Enlive is a great tool for this. In short:

(ns foo.bar
  (:require [net.cgrand.enlive-html :as html]))

(defn fetch-page [url]
  (html/html-resource (java.net.URL. url)))
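Once a page is parsed, Enlive's selectors can pull out the parts you need. The following is only a minimal sketch, assuming Enlive is on the classpath; the namespace `foo.scrape`, the sample markup, and the helper names `parse-html-string` and `extract-links` are made up for illustration and are not part of Enlive.

```clojure
(ns foo.scrape
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical sample markup, used instead of a live URL
;; so the example does not depend on the network.
(def sample-html
  "<html><body><h1>Title</h1><a href=\"/a\">First</a><a href=\"/b\">Second</a></body></html>")

(defn parse-html-string
  "Parses an HTML string into a seq of Enlive nodes.
  html-resource accepts a java.io.Reader as well as a URL."
  [s]
  (html/html-resource (java.io.StringReader. s)))

(defn extract-links
  "Returns a vector of {:href ... :text ...} maps, one per <a> element."
  [nodes]
  (mapv (fn [a]
          {:href (get-in a [:attrs :href])  ; attributes live under :attrs
           :text (html/text a)})            ; concatenated text content
        (html/select nodes [:a])))

(extract-links (parse-html-string sample-html))
```

The same `extract-links` function should work on the result of `fetch-page` above, since both go through `html/html-resource`.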

Here is a good tutorial on using it as a scraper / parser, and as a template engine:

Here is a short example of page scraping.

Another option is clj-tagsoup. Enlive also uses TagSoup by default, but its parser is pluggable, so you can add support for other parsers.

+17

Clojure's built-in XML parsing library, clojure.xml, is also available. From the documentation of its parse function:

Parses and loads the source s, which can be a File, InputStream or String naming a URI. Returns a tree of the xml/element struct-map, which has the keys :tag, :attrs, and :content, and accessor fns tag, attrs, and content. Other parsers can be supplied by passing startparse, a fn taking a source and a ContentHandler and returning a parser.

Alternatively, use Enlive, which is written in Clojure, or the Java library HtmlCleaner.
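As a rough illustration of the tree described above, here is a small sketch using `clojure.xml/parse`. Keep in mind that it is an XML parser, so it is only expected to work on well-formed markup such as XHTML; real-world HTML with unclosed tags will usually fail. The namespace `foo.xml-demo`, the sample markup, and the helpers `parse-xml-string` and `find-tags` are assumptions made for this example.

```clojure
(ns foo.xml-demo
  (:require [clojure.xml :as xml]))

;; Hypothetical, well-formed sample markup.
(def sample-xhtml
  "<html><body><p class=\"intro\">Hello</p></body></html>")

(defn parse-xml-string
  "Parses a string of well-formed markup by wrapping it in an InputStream,
  since a plain String argument to xml/parse is treated as a URI."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn find-tags
  "Returns all elements in the tree whose :tag equals the given keyword.
  xml-seq walks every node of the parsed tree depth-first."
  [tree tag]
  (filter #(= tag (:tag %)) (xml-seq tree)))

(def tree (parse-xml-string sample-xhtml))

;; Each element is a map with the keys :tag, :attrs and :content.
(let [p (first (find-tags tree :p))]
  (println (:tag p) (:attrs p) (:content p)))
```

To parse a file on disk instead, you can pass a `java.io.File` directly to `xml/parse`.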

+4