Fragile and verbose code using xml-conduit

Question


I built a GPX parser using xml-conduit and ended up with overly verbose and fragile code for identifying elements and skipping unwanted tags.

Identifying elements (minor annoyance)

I explicitly ignore the namespace by comparing only the nameLocalName of each name. I assume the correct way is to hardcode the right namespace into the program and use a helper to construct my element names for comparison in the tag* functions? This is a bit annoying because I have to support at least two namespaces (GPX 1.1 and 1.0), which are similar enough that they require no code changes for my purposes.

Skipping elements

GPX is quite large, and the set of user extensions is larger still. Since the tool I am writing needs only limited information, I decided to ignore certain tags along with all their subelements. For instance:

    <trkpnt lat="45.19843" lon="-122.428">
        <ele>4</ele>
        <time>...</time>
        <extensions> ... </extensions>
    </trkpnt>

To ignore extensions and similar tags with numerous subelements, I wrote a sink that consumes events up to the closing EventEndElement:

    skipTagAndContents :: (MonadThrow m) => Text -> Sink Event m (Maybe ())
    skipTagAndContents n = tagPredicate ((== n) . nameLocalName)
                                        ignoreAttrs
                                        (const $ many (skipElements n) >> return ())

    skipElements t = do
        x <- await
        case x of
            Just (EventEndElement n) | nameLocalName n == t -> Done x Nothing
            Nothing -> Done x Nothing
            _ -> return (Just ())
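The skipping logic can be modelled without conduit to make the idea easier to check. The sketch below is only an illustration: `Ev` is a hypothetical, simplified stand-in for the real `Event` type, and `skipSubtree` is not a library function. It counts nesting depth rather than matching on the tag name, so a nested element with the same name would not end the skip early, which the name-based check above appears prone to.

```haskell
-- Simplified stand-in for the XML event type (an assumption for illustration,
-- not the Event type from Data.XML.Types).
data Ev = Begin String | End String | Txt String
    deriving (Eq, Show)

-- | Given the events that follow an already-consumed start tag, drop
-- everything up to and including the end tag that closes that element.
-- Depth is tracked so that nested elements are skipped as a unit.
skipSubtree :: [Ev] -> [Ev]
skipSubtree = go (0 :: Int)
  where
    go _ []               = []              -- input ended before the close tag
    go 0 (End _ : rest)   = rest            -- the matching close tag: stop here
    go d (End _ : rest)   = go (d - 1) rest -- close tag of a nested element
    go d (Begin _ : rest) = go (d + 1) rest -- start tag of a nested element
    go d (_ : rest)       = go d rest       -- text and anything else
```

The same depth counter could be carried through a streaming sink; the list version just keeps the sketch free of library dependencies.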

There does not seem to be a tag* variant that does this for me (one that succeeds even when not all children are consumed). The fact that there isn't one suggests that either I am missing a simple combinator or I should send in a patch. Which is it?


xml haskell conduit

Thomas M. DuBuisson Jun 04 '12 at 0:45


1 answer

Michael Snoyman · Accepted Answer · 2012-06-04T03:44:12+0000

If you don't care about namespaces at all, the easiest way is to strip them out completely using something like Data.Conduit.List.map stripNamespace.
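For illustration, here is a minimal sketch of what such a `stripNamespace` might look like. It is not a function shipped with xml-conduit; the names `stripNamespace`, `stripName` and `example` are assumptions made for this sketch, built on the `Event` and `Name` types from `Data.XML.Types` (the xml-types package).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.XML.Types (Event (..), Name (..))

-- | Drop the namespace and prefix from a name, keeping only the local part.
stripName :: Name -> Name
stripName n = n { nameNamespace = Nothing, namePrefix = Nothing }

-- | Hypothetical helper (not part of xml-conduit): strip namespaces from
-- element and attribute names; every other event passes through unchanged.
stripNamespace :: Event -> Event
stripNamespace (EventBeginElement n attrs) =
    EventBeginElement (stripName n) [ (stripName k, v) | (k, v) <- attrs ]
stripNamespace (EventEndElement n) = EventEndElement (stripName n)
stripNamespace e = e

-- Example input: a GPX 1.1 <ele> start tag, written in Clark notation
-- via the IsString instance of Name.
example :: Event
example = EventBeginElement "{http://www.topografix.com/GPX/1/1}ele" []
```

Mapping this function over the event stream before parsing should let the same tag predicates handle both GPX 1.0 and GPX 1.1 input, at the cost of no longer being able to tell the two apart.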

Honestly, I don't use the streaming interface very often; almost all my work is done with the DOM (Text.XML) or cursor interfaces. So it is quite possible that some combinators are missing. In this case, though, I believe you can greatly simplify the implementation, because tagPredicate does not allow the inner Sink to read past the end of the element. You can therefore rewrite the body of skipTagAndContents as:

 tagPredicate ((== n) . nameLocalName) ignoreAttrs (const Data.Conduit.List.sinkNull)

You should test that before just dropping it in; I may be misremembering some details of the streaming interface.
