Text or Bytestring

Question

Good afternoon.

The only thing I hate about Haskell is the number of packages for working with strings.

At first I used native Haskell [Char] strings, but when I started using libraries from Hackage, I got completely lost in endless conversions. It seems that every package uses a different string implementation, and some accept their own handmade type.

Then I rewrote my code using Data.Text strings and the OverloadedStrings extension. I chose Text because it has a wider range of functions, but many projects seem to prefer ByteString.
Can someone give a short explanation of why to use one or the other?

PS: By the way, how do I convert from Text to ByteString?

 Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString
        against inferred type Text
 Expected type: IO Data.ByteString.Lazy.Internal.ByteString
 Inferred type: IO Text

I tried encodeUtf8 from Data.Text.Encoding, but no luck:

 Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString
        against inferred type Data.ByteString.Internal.ByteString

UPD:

Thanks for the answers. That *Chunks goodness looks like the way to go, but I'm somewhat shocked by the result. My original function looked like this:

 htmlToItems :: Text -> [Item]
 htmlToItems = getItems . parseTags . convertFuzzy Discard "CP1251" "UTF8"

And now it has become:

 htmlToItems :: Text -> [Item]
 htmlToItems = getItems . parseTags . fromLazyBS . convertFuzzy Discard "CP1251" "UTF8" . toLazyBS
   where
     toLazyBS t = fromChunks [encodeUtf8 t]
     fromLazyBS t = decodeUtf8 $ intercalate "" $ toChunks t

And yes, this function does not work, because it is wrong: if we supply it with Text, then we are already confident that the text is properly decoded and ready to use, so converting it again is a stupid thing to do. Still, a verbose conversion like this has to take place somewhere outside htmlToItems.

+68

string text haskell

Dfr Sep 09 '11 at 6:13

3 answers

You definitely want to use Data.Text for text data.

encodeUtf8 is the way to go. This error:

 Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString
        against inferred type Data.ByteString.Internal.ByteString

means that you are supplying a strict bytestring to code that expects a lazy bytestring. The conversion is easy with the fromChunks function:

 Data.ByteString.Lazy.fromChunks :: [Data.ByteString.Internal.ByteString] -> ByteString

so all you need to do is add fromChunks [myStrictByteString] wherever the lazy bytestring is expected.

The conversion in the other direction can be done with the dual function toChunks, which takes a lazy bytestring and gives a list of strict chunks.
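
As a rough sketch of both directions (the helper names below are made up for illustration, apart from toLazyBS, which is borrowed from the question; only fromChunks, toChunks, concat, encodeUtf8 and decodeUtf8 come from the bytestring and text packages):

```haskell
import qualified Data.ByteString      as BS
import qualified Data.ByteString.Lazy as BL
import           Data.Text            (Text)
import qualified Data.Text            as T
import qualified Data.Text.Encoding   as TE

-- Strict -> lazy: wrap the strict bytestring as a single chunk.
toLazyBS :: BS.ByteString -> BL.ByteString
toLazyBS s = BL.fromChunks [s]

-- Lazy -> strict: concatenate all chunks (this forces the whole
-- lazy bytestring into memory).
toStrictBS :: BL.ByteString -> BS.ByteString
toStrictBS = BS.concat . BL.toChunks

-- Text -> lazy ByteString, encoding as UTF-8 along the way.
textToLazyUtf8 :: Text -> BL.ByteString
textToLazyUtf8 = toLazyBS . TE.encodeUtf8

-- Lazy ByteString -> Text. decodeUtf8 throws an exception if the
-- bytes are not valid UTF-8.
lazyUtf8ToText :: BL.ByteString -> Text
lazyUtf8ToText = TE.decodeUtf8 . toStrictBS

-- A hypothetical sample value, used only to try the helpers out.
sample :: Text
sample = T.pack "Привет, мир"

-- Encoding and then decoding should give back the original text.
roundTrips :: Text -> Bool
roundTrips t = lazyUtf8ToText (textToLazyUtf8 t) == t
```

In GHCi, roundTrips sample is a quick way to check that the two directions agree.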

You might want to ask the maintainers of some packages if they can provide a text interface instead of or in addition to the bytestring interface.

+21

John L Sep 09 '11 at 7:52

Use the single function cs from Data.String.Conversions (the string-conversions package).

This will allow you to convert between String , ByteString and Text (as well as ByteString.Lazy and Text.Lazy ) depending on the input and expected types.

You still have to call it, but you no longer have to worry about the corresponding types.

See this answer for an example use.
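
A minimal sketch of how that might look, assuming the string-conversions package is installed. The two wrapper names are hypothetical; cs itself picks the conversion from the type signatures, and as far as I know it goes through UTF-8 when bytes are involved:

```haskell
import qualified Data.ByteString.Lazy    as BL
import           Data.String.Conversions (cs)
import           Data.Text               (Text)

-- The type signature alone selects the conversion: strict Text to
-- lazy ByteString.
textToLazyBS :: Text -> BL.ByteString
textToLazyBS = cs

-- And back again, from lazy ByteString to strict Text.
lazyBSToText :: BL.ByteString -> Text
lazyBSToText = cs
```

The trade-off is that a bare cs in the middle of an expression can leave the types ambiguous, so an occasional annotation may still be needed.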

+5

Titou Dec 12 '14 at 2:56

shang · Accepted Answer · 2011-09-09 07:35

ByteStrings are mainly useful for binary data, but they are also an efficient way to process text if all you need is the ASCII character set. If you need to handle Unicode strings, you need to use Text. However, I must emphasize that neither is a replacement for the other, and they are generally used for different things: while Text represents pure Unicode, you still need to encode to and decode from a binary ByteString representation whenever you, for example, transport text through a socket or a file.
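
To make the distinction concrete, here is a small sketch (all names are invented for the example) showing that the same string has a different length as Text and as UTF-8 bytes, and one way to decode incoming bytes without risking an exception:

```haskell
import qualified Data.ByteString    as BS
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- "h\233llo" is "héllo": five code points, but é takes two bytes
-- in UTF-8.
greeting :: T.Text
greeting = T.pack "h\233llo"

-- Counts Unicode code points.
charCount :: Int
charCount = T.length greeting

-- Counts bytes of the UTF-8 encoding.
byteCount :: Int
byteCount = BS.length (TE.encodeUtf8 greeting)

-- Bytes from a socket or file may not be valid UTF-8, so decode
-- defensively: decodeUtf8' returns Left instead of throwing.
safeDecode :: BS.ByteString -> Maybe T.Text
safeDecode = either (const Nothing) Just . TE.decodeUtf8'
```

Here charCount is 5 while byteCount is 6, which is exactly why a ByteString cannot stand in for Text once the input leaves ASCII.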

Here's a good article on Unicode basics that does a decent job of explaining the relationship between Unicode code points (Text) and encoded binary bytes (ByteString): The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

You can use the Data.Text.Encoding module to convert between the two data types, or Data.Text.Lazy.Encoding if you are using the lazy variants (as you seem to be doing, judging by your error messages).
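
For the lazy variants, a sketch along these lines should work (the function names are hypothetical; encodeUtf8, decodeUtf8 and fromStrict are the library functions):

```haskell
import qualified Data.ByteString.Lazy    as BL
import qualified Data.Text               as T
import qualified Data.Text.Lazy          as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Lazy Text -> lazy ByteString (UTF-8).
encodeLazy :: TL.Text -> BL.ByteString
encodeLazy = TLE.encodeUtf8

-- Lazy ByteString -> lazy Text; throws on invalid UTF-8 once the
-- offending part of the input is forced.
decodeLazy :: BL.ByteString -> TL.Text
decodeLazy = TLE.decodeUtf8

-- Strict Text -> lazy ByteString, going through lazy Text rather
-- than building the chunk list by hand.
strictTextToLazyBS :: T.Text -> BL.ByteString
strictTextToLazyBS = TLE.encodeUtf8 . TL.fromStrict
```

This gives the lazy ByteString that the error message in the question asks for, without calling fromChunks directly.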
