Unescaping characters in a string with Ruby

Given a string in the following format (Posterous API returns messages in this format):

s="\\u003Cp\\u003E"

How can I convert it to the actual ASCII characters, so that s="<p>"?

On OS X this worked with Iconv.iconv('ascii', 'java', s), but after deploying to Heroku I got an Iconv::IllegalSequence exception. I assume the system Heroku deploys to does not support the java encoding.


I use HTTParty to make a request to the Posterous API. If I make the same request with curl, I do not get the doubled backslashes.

From the HTTParty GitHub page:

"Automatic parsing of JSON and XML into Ruby hashes based on the response Content-Type"

The Posterous API returns JSON (without doubled backslashes), and HTTParty's JSON parsing inserts the extra backslash.


Here is how I use HTTParty:

class Posterous
  include HTTParty
  base_uri "http://www.posterous.com/api/2"
  basic_auth "username", "password"
  format :json
  def get_posts
    response = Posterous.get("/users/me/sites/9876/posts?api_token=1234")
    # snip, see below...
  end
end

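(The username, password, site_id, and api_token shown here are placeholders.)

At the snip, response.body is a Ruby string containing JSON, and response.parsed_response is a Ruby hash that HTTParty built by parsing the JSON from the Posterous API.

In both cases, Unicode escapes such as \u003C have become \\u003C.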


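There is a bug in the JSON parser that HTTParty uses (the Crack gem) in how it handles Unicode escapes: Posterous emits the hex digits as uppercase A-F rather than lowercase a-f, and Crack does not unescape them.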
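
The A-F versus a-f difference can be illustrated with a small sketch. The method name and both patterns here are hypothetical and are not taken from Crack's source; they only show how a pattern limited to lowercase hex digits would skip the escapes that Posterous sends.

```ruby
# Illustration only: hypothetical patterns, not Crack's actual code.
def decode_escapes(text, pattern)
  text.gsub(pattern) { [$1.hex].pack("U") }
end

escaped = '\u003Cp\u003E'  # single quotes keep the backslashes literal

# Lowercase hex digits only: "003C" and "003E" do not match.
strict = decode_escapes(escaped, /\\u([0-9a-f]{4})/)

# Case-insensitive: uppercase A-F is accepted as well.
lenient = decode_escapes(escaped, /\\u([0-9a-f]{4})/i)

puts strict   # \u003Cp\u003E
puts lenient  # <p>
```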

HTTParty lets you specify a different parser, so you can use ::JSON.parse in place of Crack:

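require 'httparty'
require 'json'

# Custom parser that uses the standard JSON library instead of Crack.
class JsonParser < HTTParty::Parser
  def json
    ::JSON.parse(body)
  end
end

class Posterous
  include HTTParty
  parser ::JsonParser

  #....
end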

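Following elskwid's suggestion, you can parse the string as JSON. The escaped text has to be a quoted JSON string inside a document, so wrap it in an array first:

s = ::JSON.parse(%Q(["\\u003Cp\\u003E"])).first

Now s = "<p>".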
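
To apply the same idea to arbitrary fragments, a small wrapper can help. This is only a sketch: unescape_via_json is a hypothetical name, and it assumes the fragment contains no unescaped double quotes or control characters, since those would not be valid inside a JSON string. One advantage of going through the JSON parser is that it also combines surrogate pairs into a single character.

```ruby
require 'json'

# Hypothetical helper: wrap the fragment in a one-element JSON array,
# parse it, and take the decoded string back out.
# Assumes the fragment has no unescaped double quotes or control characters.
def unescape_via_json(fragment)
  JSON.parse(%Q(["#{fragment}"])).first
end

s = unescape_via_json('\u003Cp\u003E')
puts s  # <p>

# A surrogate pair decodes to one character (U+1F600).
face = unescape_via_json('\ud83d\ude00')
puts face.length  # 1
```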


Using pack:

"a\\u00e4\\u3042".gsub(/\\u(....)/){[$1.hex].pack("U")} # "aäあ"

And the reverse:

"aäあ".gsub(/[^ -~\n]/){"\\u%04x"%$&.ord} # "a\\u00e4\\u3042"
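
The two one-liners can be wrapped into a round-trip pair. This is a sketch with hypothetical method names; the unescape pattern is tightened from (....) to four hex digits so that stray text after \u is left alone. Both directions only cover characters that fit in four hex digits.

```ruby
# Hypothetical helper names; the logic follows the gsub/pack one-liners.
def unescape_unicode(text)
  # Replace each \uXXXX (four hex digits, either case) with its character.
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

def escape_non_ascii(text)
  # Replace anything outside printable ASCII and newline with \uXXXX.
  text.gsub(/[^ -~\n]/) { |ch| format("\\u%04x", ch.ord) }
end

original = "a\u00e4\u3042"          # "aäあ"
escaped  = escape_non_ascii(original)
restored = unescape_unicode(escaped)

puts escaped               # a\u00e4\u3042
puts restored == original  # true
```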


The string "\u003Cp\u003E" really is the string "<p>"; \u003C is just the Unicode escape for < and \u003E is the escape for >.

>> "\u003Cp\u003E"  #=> "<p>"

If you really are getting a string with doubled backslashes, you can try stripping one of each pair.

As a test, see how long the string is:

>> "\\u003Cp\\u003E".size #=> 13
>> "\u003Cp\u003E".size #=> 3
>> "<p>".size #=> 3

All of the above was done using Ruby 1.9.2, which is Unicode-aware. Ruby 1.8.7 was not. Here is 1.8.7 IRB for comparison:

>> "\u003Cp\u003E" #=> "u003Cpu003E"
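
To tell whether the backslashes are really doubled in the data or only in how the string is displayed, a check along the lines of the size test can help. This is a minimal sketch assuming a modern Ruby; the variable names are illustrative. Note that inspect (which p and IRB use) shows every literal backslash as two.

```ruby
s = '\u003Cp\u003E'   # single quotes: one literal backslash before each u

puts s                # \u003Cp\u003E      (the actual characters)
p s                   # "\\u003Cp\\u003E"  (inspect doubles each backslash)

# Count the backslash characters that are really in the string.
backslashes = s.chars.count("\\")
puts backslashes      # 2
puts s.size           # 13
```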