Haskell: decode HTML entities in a string

I want to take a string containing HTML character entities such as `&nbsp;` and replace them with the literal characters they stand for. I get my data through the Twitter API, and the text contains these entities. Does anyone know of an existing library that does this?

Thank you for your help!

+4
3 answers

The web-encodings package on Hackage looks promising; its Web.Encodings module provides a decodeHtml function:

http://hackage.haskell.org/packages/archive/web-encodings/0.3.0.2/doc/html/Web-Encodings.html
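For comparison, the decoding itself is small enough to sketch with nothing but base. This is not web-encodings' decodeHtml; it is a minimal illustration in which `decodeEntities`, `decodeRef`, `readBase`, `toChar` and the `namedEntities` table are hypothetical names. It handles numeric references (`&#NNN;` and `&#xHHH;`) in full, but the named-entity table is an assumed handful of common entries, so any other name is passed through unchanged.

```haskell
import Data.Char (chr, digitToInt, isDigit, isHexDigit)

-- Illustrative subset of HTML named entities (an assumption: the HTML5
-- table has over 2000 names; extend this list as needed).
namedEntities :: [(String, Char)]
namedEntities =
  [ ("amp", '&'), ("lt", '<'), ("gt", '>')
  , ("quot", '"'), ("apos", '\''), ("nbsp", '\xA0')
  ]

-- Replace &name;, &#NNN; and &#xHHH; references with the characters they
-- denote; anything that is not a recognised reference is copied unchanged.
decodeEntities :: String -> String
decodeEntities [] = []
decodeEntities ('&' : rest) =
  case break (== ';') rest of
    (ref, ';' : after)
      | Just c <- decodeRef ref -> c : decodeEntities after
    _ -> '&' : decodeEntities rest
decodeEntities (c : rest) = c : decodeEntities rest

-- Decode the text found between '&' and ';'.
decodeRef :: String -> Maybe Char
decodeRef ('#' : x : hex)
  | x `elem` "xX", not (null hex), all isHexDigit hex = toChar (readBase 16 hex)
decodeRef ('#' : dec)
  | not (null dec), all isDigit dec = toChar (readBase 10 dec)
decodeRef name = lookup name namedEntities

-- Integer avoids overflow on absurdly long digit strings.
readBase :: Integer -> String -> Integer
readBase base = foldl (\acc d -> acc * base + toInteger (digitToInt d)) 0

-- Accept only code points in the Unicode range, excluding NUL.
toChar :: Integer -> Maybe Char
toChar n
  | n > 0 && n <= 0x10FFFF = Just (chr (fromInteger n))
  | otherwise = Nothing
```

For example, `putStrLn (decodeEntities "Tom &amp; Jerry &lt;3")` prints `Tom & Jerry <3`. The decoder makes a single pass, so `&amp;amp;` becomes `&amp;` rather than `&`, and a bare ampersand such as `AT&T` is left alone.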

+3

I built the following function from functions in the tagsoup package. It handles all named and numeric character references defined by the HTML5 standard (more than 2000 of them; see the list of named character references in the HTML5 specification).

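```haskell
import qualified Text.HTML.TagSoup as TS
import Text.StringLike (StringLike)

-- Let tagsoup parse the input; for plain text it yields a single TagText
-- node whose contents already have the entities decoded.
decodeHTMLentities :: StringLike str => str -> str
decodeHTMLentities s = case TS.parseTags s of
  [TS.TagText t] -> t  -- plain text: return the decoded text
  _              -> s  -- empty input or markup: return it unchanged
```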

StringLike has instances for String, strict and lazy ByteString, and Text.

Unknown entities are left unchanged in the output. If you want a warning about unknown entities, use:

```
> parseTagsOptions parseOptions{optTagWarning=True} "&asdasd;"
[TagText "&asdasd;",TagWarning "Unknown entity: asdasd"]
```
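One way to put the warning option to use is to split the parse result into decoded text and warning messages. `decodeWithWarnings` is a hypothetical helper, not part of tagsoup; it assumes only text content matters, so any tags in the input are dropped. With `optTagWarning` enabled, tagsoup may also report other parse problems, so the warning list is not limited to unknown entities.

```haskell
import Text.HTML.TagSoup

-- Hypothetical helper: decode entities and collect tagsoup's warnings.
-- The first component concatenates every TagText node (tags are dropped);
-- the second holds the message of every TagWarning node.
decodeWithWarnings :: String -> (String, [String])
decodeWithWarnings s = (concat texts, warnings)
  where
    tags     = parseTagsOptions parseOptions{optTagWarning = True} s
    texts    = [t | TagText t <- tags]
    warnings = [w | TagWarning w <- tags]
```

For the input in the session above this should give `("&asdasd;", ["Unknown entity: asdasd"])`, while `"Tom &amp; Jerry"` decodes to `"Tom & Jerry"` with no warnings.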
+2

Hello, try the code below; it will work:

```javascript
labelTR = labelTR.replace(/(?:&nbsp;|&quot;)/g, '');
```
-6
