
In Haskell, how do you extract strings from an XML document?

If I have an XML document, for example:

    <root>
      <elem name="Greeting"> Hello </elem>
      <elem name="Name"> Name </elem>
    </root>

and some Haskell type / data definitions:

    type Name = String
    type Value = String
    data LocalizedString = LS Name Value

and I wanted to write a Haskell function with the following signature:

  getLocalizedStrings :: String -> [LocalizedString] 

where the first parameter was the XML text and the return value:

  [LS "Greeting" "Hello", LS "Name" "Name"] 

How can I do this?

If HaXml is the best tool, how would you use HaXml to achieve the above goal?

Thanks!

+7
xml haskell
4 answers

I never bothered to figure out how to extract bits from XML documents using HaXml; HXT has met all my needs.

    {-# LANGUAGE Arrows #-}
    import Data.Maybe
    import Text.XML.HXT.Arrow  -- in hxt 9 and later, import Text.XML.HXT.Core instead

    type Name = String
    type Value = String
    data LocalizedString = LS Name Value

    -- Parse the XML text; Nothing if no <root> element is found.
    getLocalizedStrings :: String -> Maybe [LocalizedString]
    getLocalizedStrings = (.) listToMaybe . runLA $ xread >>> getRoot

    -- Select every element with the given name, at any depth.
    atTag :: ArrowXml a => String -> a XmlTree XmlTree
    atTag tag = deep $ isElem >>> hasName tag

    getRoot :: ArrowXml a => a XmlTree [LocalizedString]
    getRoot = atTag "root" >>> listA getElem

    getElem :: ArrowXml a => a XmlTree LocalizedString
    getElem = atTag "elem" >>> proc x -> do
        name  <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value

You will probably want a little more error checking (i.e. don't just lazily use atTag as I did; actually verify that <root> is the root, that each <elem> is a direct child of it, etc.), but this works just fine on your example.


Now, if you need an introduction to Arrows, unfortunately I don't know of any good one. I learned them the "get thrown into the ocean to learn how to swim" way.

Something that might be useful to keep in mind is that the proc / -< syntax is just sugar for the basic arrow operations (arr, >>>, etc.), just as do / <- is just sugar for the basic monad operations (return, >>=, etc.). The following are equivalent:

    getAttrValue "name" &&& (getChildren >>> getText) >>^ uncurry LS

    proc x -> do
        name  <- getAttrValue "name" -< x
        value <- getChildren >>> getText -< x
        returnA -< LS name value
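To see the equivalence without installing HXT, here is a minimal sketch that uses ordinary functions as the arrow (plain functions are an Arrow instance in Control.Arrow from base). The helpers nameOf and valueOf are hypothetical stand-ins for getAttrValue "name" and getChildren >>> getText; they are not part of HXT.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

data LocalizedString = LS String String deriving (Eq, Show)

-- Hypothetical stand-ins for the HXT arrows: the "document" is just a pair.
nameOf :: (String, String) -> String
nameOf = fst

valueOf :: (String, String) -> String
valueOf = snd

-- Combinator style: run both arrows on the same input, then build an LS.
withCombinators :: (String, String) -> LocalizedString
withCombinators = nameOf &&& valueOf >>^ uncurry LS

-- proc style: the same computation written with arrow sugar.
withProc :: (String, String) -> LocalizedString
withProc = proc x -> do
    name  <- nameOf  -< x
    value <- valueOf -< x
    returnA -< LS name value

main :: IO ()
main = do
    let input = ("Greeting", "Hello")
    print (withCombinators input)                    -- LS "Greeting" "Hello"
    print (withProc input)                           -- LS "Greeting" "Hello"
    print (withCombinators input == withProc input)  -- True
```

Because &&& binds tighter than >>^, the first definition needs no extra parentheses around the pair of arrows.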
+6

Use one of the XML packages.

The most popular, in order, are:

  • HaXml
  • HXT
  • xml-light
  • hexpat
+3

FWIW, HXT seems like overkill where a simple TagSoup will do :)

+2

Here is my second attempt (after getting good input from others) using TagSoup:

    module Xml where

    import Data.Char
    import Text.HTML.TagSoup

    type SName = String
    type SValue = String
    data LocalizedString = LS SName SValue deriving Show

    getLocalizedStrings :: String -> [LocalizedString]
    getLocalizedStrings = create . filterTags . parseTags
      where
        -- Keep only <elem> open tags and text nodes.
        -- (Current tagsoup parameterises Tag by its string type.)
        filterTags :: [Tag String] -> [Tag String]
        filterTags = filter (\x -> isTagOpenName "elem" x || isTagText x)

        -- Pair each <elem name="..."> with the text node that follows it.
        create :: [Tag String] -> [LocalizedString]
        create (TagOpen "elem" [("name", name)] : TagText text : rest) =
            LS name (trimWhiteSpace text) : create rest
        create (_:rest) = create rest
        create [] = []

    -- Strip leading and trailing whitespace.
    trimWhiteSpace :: String -> String
    trimWhiteSpace = dropWhile isSpace . reverse . dropWhile isSpace . reverse

    main = do
        xml <- readFile "xml.xml"  -- xml.xml contains the XML from the original question
        print (getLocalizedStrings xml)

My first attempt used a naive (and incorrect) way of trimming whitespace.
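For reference, here is a small base-only sketch of the trimming step on its own. The first attempt is not shown here, so naiveTrim below is only a hypothetical example of how a naive trim can go wrong (it also deletes interior spaces); trimWhiteSpace' is an alternative spelling using dropWhileEnd from Data.List.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- The trim used above: reverse, drop what was trailing whitespace,
-- reverse back, then drop the leading whitespace.
trimWhiteSpace :: String -> String
trimWhiteSpace = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Same result, written with dropWhileEnd (no explicit reversing).
trimWhiteSpace' :: String -> String
trimWhiteSpace' = dropWhileEnd isSpace . dropWhile isSpace

-- Hypothetical naive version: removes ALL whitespace, including
-- the spaces between words, which is usually not what you want.
naiveTrim :: String -> String
naiveTrim = filter (not . isSpace)

main :: IO ()
main = do
    let raw = "  Hello World \n"
    print (trimWhiteSpace raw)   -- "Hello World"
    print (trimWhiteSpace' raw)  -- "Hello World"
    print (naiveTrim raw)        -- "HelloWorld"
```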

+1