What is Haskell's attitude to Unicode in XML?

Question


I want to know what the standard approach is for processing Unicode XML in Haskell. I noticed that HXT uses a plain String representation (a list of Unicode characters!) for text.

http://hackage.haskell.org/packages/archive/hxt/9.3.1.0/doc/html/Text-XML-HXT-DOM-TypeDefs.html#t:XNode

 Constructors
   XText String    ordinary text (leaf)
   XBlob Blob      text represented more space efficient as bytestring (leaf)

How do you choose between the two representations when parsing? Forcing users to work with character lists is not a particularly attractive feature, especially when the XML documents contain a lot of text.

Also, I found http://hackage.haskell.org/package/hxt-unicode on Google, but I'm not sure how it is intended to be used for parsing. Unicode support also used to be more explicit in older versions: http://hackage.haskell.org/packages/archive/hxt/8.5.2/doc/html/Text-XML-HXT-DOM-Unicode.html but this module was removed in the latest version (9.3.1.0 at the time of writing) for no clear reason. What was the motivation?

Can someone please provide some sample code and explain how HXT is intended to be used? The wiki pages are seriously lacking in this regard. Thanks.

+7

xml unicode haskell hxt

fatuhoku Oct 05 '12 at 16:32


1 answer

Michael Snoyman · Accepted Answer · 2012-10-06T17:36:37+0000

The xml-conduit package uses the Text data type to store textual data. Text has become the standard representation of textual data over the past few years. xml-conduit is a well-maintained package, and I have personally used it in a large amount of both open source and commercial code.
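As a rough illustration of what this looks like in practice, here is a minimal sketch that parses a small document and pulls out element content as Text values. The sample document, the `sampleXml` binding, and the `titles` helper are made up for this example; the sketch assumes the `xml-conduit` and `text` packages are installed and that the source file is saved as UTF-8.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Text       (Text)
import qualified Data.Text.Lazy  as TL
import           Text.XML        (Document, def, parseText_)
import           Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical sample input containing non-ASCII text.
sampleXml :: TL.Text
sampleXml =
  "<books><book><title>Füße</title></book><book><title>日本語</title></book></books>"

-- Collect the text content of every <title> element, at any depth.
-- Each result is a strict Text value, not a String.
titles :: Document -> [Text]
titles doc = fromDocument doc $// element "title" &/ content

main :: IO ()
main =
  -- parseText_ throws on malformed input; parseText returns an Either instead.
  -- print shows non-ASCII characters as escapes, so it does not depend on
  -- the terminal's locale.
  mapM_ print (titles (parseText_ def sampleXml))
```

For input that arrives as bytes rather than as text, `Text.XML` also offers `parseLBS` (lazy ByteString) and `readFile`; the extracted content is still Text in either case.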
