How to convert an Elixir binary to a string?

I am trying to convert a binary to a string. This is the code:

    t = [{<<71,0,69,0,84,0>>}]
    String.from_char_list(t)

But I get this when I try this conversion:

    ** (ArgumentError) argument error
        (stdlib) :unicode.characters_to_binary([{<<70, 0, 73, 0, 78, 0>>}])
        (elixir) lib/string.ex:1161: String.from_char_list/1

I assume that <<70, 0, ...>> is probably a list of graphemes (it is returned from an API call, and the API is not documented), but do I need to specify the encoding somehow?

I am probably missing something obvious (maybe this is not the right function to use?), but I can't figure out what to do here.


EDIT:

For what it's worth, the binary above is the return value of an Erlang ODBC call. After digging a little more, I found that the binary in question is actually a "Unicode binary encoded as UTF16 little endian" (see http://www.erlang.org/doc/apps/odbc/odbc.pdf, p. 9, re: SQL_WVARCHAR). This does not actually change the problem, but it adds some context.

+11
5 answers

There are a couple of things here:

1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and get your string. Passing the current data structure to to_string will not work.

2.) The binary that you used in your example contains 0, a non-printable character. In the shell, this will not print as a string, because Elixir cannot tell an arbitrary binary apart from a binary representing a string when the latter contains non-printable characters.

3.) You can use pattern matching to convert a binary to a specific type. For instance:

    iex> raw = <<71,32,69,32,84,32>>
    "G E T "
    iex> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
    "G E T "
    iex> <<c::utf8, _::binary>> = raw
    "G E T "
    iex> <<c::utf8>>
    "G"
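Points 1 and 2 can be sketched with the value from the question. This is only an illustration: the variable names `t` and `bin` are chosen here, and the shape of `t` is assumed to be exactly the one the question shows.

```elixir
# Shape of the value from the question: a list holding a one-element tuple.
t = [{<<71, 0, 69, 0, 84, 0>>}]

# Pattern matching pulls the binary out of the list and the tuple.
[{bin}] = t

# Every byte is below 128, so the binary is valid UTF-8, but the NUL
# bytes are not printable. That is why the shell shows raw bytes
# instead of a quoted string.
IO.inspect(String.valid?(bin))      # true
IO.inspect(String.printable?(bin))  # false
IO.inspect(byte_size(bin))          # 6
```

The match `[{bin}] = t` raises a MatchError if the list ever holds more than one tuple, so a real caller would probably match on the head of the list instead.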

Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data will be an iolist, not a charlist. The difference is that iolists can contain binaries and nested lists as well as plain integers, whereas charlists are always just a list of integers. Calling to_string on an iolist can fail.
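A small sketch of the iolist/charlist difference described above. The sample data is made up for illustration and does not come from a real network connection.

```elixir
# An iolist may mix binaries, byte integers (0..255) and nested lists.
iolist = [<<71, 69>>, [84, " /"], ?\s, "index"]
flattened = :erlang.iolist_to_binary(iolist)
IO.inspect(flattened)   # "GET / index"

# The same integer means different things in the two structures:
# in an iolist 233 is the raw byte 233, in a charlist it is the
# code point U+00E9, which takes two bytes in UTF-8.
as_byte = :erlang.iolist_to_binary([233])
as_codepoint = List.to_string([233])
IO.inspect(as_byte)                  # <<233>>
IO.inspect(byte_size(as_codepoint))  # 2
```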

+19

I'm not sure whether the OP has solved his problem since then, but regarding his remark that his binary is utf16-le: the quickest way I found for this specific encoding (those more experienced with Elixir probably know a better one) is to use Enum.reduce:

    # coercing it into utf8 gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
    <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>
    |> String.codepoints()
    |> Enum.reduce("", fn(codepoint, result) ->
      << parsed :: 8>> = codepoint
      if parsed == 0, do: result, else: result <> <<parsed>>
    end)
    # "Devastator"
    |> IO.puts()

Assumptions:

  • utf16-le

  • code points are backward compatible with utf8 i.e. they use only 1 byte
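If the second assumption does not hold, a possible alternative is Erlang's :unicode.characters_to_binary/2, which decodes the whole binary instead of dropping zero bytes. A minimal sketch, with sample bytes chosen here for illustration:

```elixir
# "Año" encoded as UTF-16 little endian: each code point takes two
# bytes, and 241 ("ñ") has no one-byte form in UTF-8.
utf16le = <<65, 0, 241, 0, 111, 0>>

# The second argument names the input encoding; the output defaults to UTF-8.
utf8 = :unicode.characters_to_binary(utf16le, {:utf16, :little})

IO.inspect(utf8)             # "Año"
IO.inspect(byte_size(utf8))  # 4
```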

Since I am still learning Elixir, it took me a while to find this solution. I also looked at libraries written by other people, and even at using something like iconv at the bash level.

+4

I wrote a function to convert a binary to a string:

    def raw_binary_to_string(raw) do
      codepoints = String.codepoints(raw)
      # Start from "" so that the first element is converted as well
      Enum.reduce(codepoints, "", fn(w, result) ->
        cond do
          String.valid?(w) ->
            result <> w
          true ->
            # Not valid UTF-8: re-encode the single byte as a code point
            << parsed :: 8>> = w
            result <> << parsed :: utf8 >>
        end
      end)
    end

Run in the iex console:

    iex(6)> raw = <<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
    iex(7)> raw_binary_to_string(raw)
    "Año de Facturacion Actual"

+3

The last point in the question's edit definitely does change the problem, and explains it. Elixir uses binaries as strings, but assumes and requires that they are encoded as UTF-8, not UTF-16.
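To see the encoding mismatch concretely, the UTF-8 string "GET" can be re-encoded as UTF-16 little endian with :unicode.characters_to_binary/3. The sketch below only illustrates the point and is not part of the ODBC API:

```elixir
# Encode the UTF-8 string "GET" as UTF-16 little endian.
utf16le = :unicode.characters_to_binary("GET", :utf8, {:utf16, :little})

# These are exactly the bytes shown in the question.
IO.inspect(utf16le)           # <<71, 0, 69, 0, 84, 0>>

# As far as Elixir is concerned, this is a different string from "GET".
IO.inspect(utf16le == "GET")  # false
```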

+2

Reference: http://erlang.org/pipermail/erlang-questions/2010-December/054885.html

You can use :unicode.characters_to_list(binary_string, {:utf16, :little}) to check the result, and to store it as well.

IEx evaluation:

    iex(1)> y = <<115, 0, 121, 0, 115, 0>>
    <<115, 0, 121, 0, 115, 0>>
    iex(2)> :unicode.characters_to_list(y, {:utf16, :little})
    'sys'

Note: the value <<115, 0, 121, 0, 115, 0>> is printed as 'sys'.
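If an Elixir string (a UTF-8 binary) is wanted rather than a charlist, :unicode.characters_to_binary/2 takes the same arguments. It can also return an error or incomplete tuple, so a small wrapper may help. The module name `Utf16Example`, the function `decode/1` and the error atoms below are hypothetical names picked for this sketch:

```elixir
defmodule Utf16Example do
  # Hypothetical helper: decodes a UTF-16LE binary into an Elixir string.
  def decode(bin) when is_binary(bin) do
    case :unicode.characters_to_binary(bin, {:utf16, :little}) do
      utf8 when is_binary(utf8) -> {:ok, utf8}
      # Input contained a sequence that is not valid UTF-16.
      {:error, _converted, _rest} -> {:error, :invalid_utf16}
      # Input ended in the middle of a code unit.
      {:incomplete, _converted, _rest} -> {:error, :truncated_utf16}
    end
  end
end

IO.inspect(Utf16Example.decode(<<115, 0, 121, 0, 115, 0>>))  # {:ok, "sys"}

# An odd number of bytes cannot be complete UTF-16.
IO.inspect(Utf16Example.decode(<<115, 0, 121>>))
```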

+1
