Trying to understand Ruby's .chr and .ord methods

I recently worked with Ruby chr and ord methods, and there are a few things that I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a single-character string such as "A" and call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of characters somewhere, and that it can use this collection to give me either the position of a specific character or the character at a specific position. I may be wrong about this; please correct me if so.

I also understand that Ruby's default string encoding is UTF-8, so it can work with thousands of possible characters. So if I ask it something like this:

 '好'.ord 

I get the position of this character, which is 22909. However, if I call chr on this value:

 22909.chr 

I get "RangeError: 22909 out of char range". I can only get chr to work with values up to 255, which is the extended ASCII range. So my questions are:

  • Why does Ruby seem to take the values for chr from the extended ASCII character set, but the values for ord from UTF-8?
  • Is there any way to tell Ruby to use a different encoding when calling these methods? For example, to tell it to use ASCII-8BIT instead of whatever it defaults to?
  • If the default encoding can be changed, is there a way to get the total number of characters available in the set being used?
2 answers

According to the Integer#chr documentation, you can pass an encoding as an argument to force the result to be UTF-8:

 22909.chr(Encoding::UTF_8) #=> "好" 
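
As a rough sketch of why the bare call fails: with no argument, chr returns a US-ASCII string for 0..127 and a binary (ASCII-8BIT) string for 128..255. Above that, it only works if Encoding.default_internal is set. The example below assumes default_internal is nil, which is the usual default. The variable names are only for illustration, and "\u597D" is the escape for 好.

```ruby
# Round trip with an explicit encoding.
code_point = "\u597D".ord                      # code point of 好 in a UTF-8 string
round_trip = code_point.chr(Encoding::UTF_8)   # back to the same character

# With no argument, the result encoding depends on the value.
ascii_enc  = 65.chr.encoding                   # US-ASCII for 0..127
binary_enc = 230.chr.encoding                  # ASCII-8BIT (binary) for 128..255

# Above 255 with no argument, chr raises RangeError
# (assuming Encoding.default_internal is nil).
bare_chr_error =
  begin
    22909.chr
    nil
  rescue RangeError => e
    e.class
  end
```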

To list all available encoding names:

 Encoding.name_list #=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...] 
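
Any name from that list can also be passed to chr as a string. On the ord side, the result follows the encoding of the string itself, which is why a UTF-8 string gives a Unicode code point. A small sketch (variable names are illustrative only):

```ruby
utf8 = "\u597D"                        # 好 as a UTF-8 string

# chr accepts an encoding name as well as an Encoding constant.
from_name = 22909.chr("UTF-8")

# ord uses the string's own encoding.
as_utf8   = utf8.ord                   # the Unicode code point
as_binary = utf8.b.ord                 # String#b gives a binary copy, so this is the first byte

# The same name appears in the list of known encodings.
listed = Encoding.name_list.include?("UTF-8")
```

Here as_binary is 229 (0xE5), the first of the three bytes that encode 好 in UTF-8, while as_utf8 is 22909.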

A hacky way to count the valid characters in an encoding:

 2000000.times.reduce(0) do |x, i|
   begin
     i.chr(Encoding::UTF_8)
     x += 1
   rescue
   end
   x
 end
 #=> 1112064
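
That count can be cross-checked without looping: Unicode code points run from 0 to 0x10FFFF, and the surrogate range 0xD800..0xDFFF is not valid in UTF-8. A minimal sketch, where valid_utf8_code_point? is a hypothetical helper written for this check:

```ruby
# Hypothetical helper: true if chr accepts the value for UTF-8.
def valid_utf8_code_point?(i)
  i.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

highest    = 0x10FFFF                  # last Unicode code point
surrogates = (0xD800..0xDFFF)          # reserved for UTF-16, rejected by chr

# All code points minus the surrogate range.
expected_count = (highest + 1) - surrogates.size

# Spot checks at the boundaries.
top_ok          = valid_utf8_code_point?(highest)
past_top_ok     = valid_utf8_code_point?(highest + 1)
surrogate_ok    = valid_utf8_code_point?(0xD800)
```

expected_count works out to 1114112 - 2048 = 1112064, matching the result of the loop.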

After working with this for some time, I realized that I can get the highest valid value for each encoding by running a binary search for the largest value that does not raise a RangeError.

 def get_highest_value(set)
   max = 10000000000
   min = 0
   guess = 5000000000

   while true
     begin
       guess.chr(set)
       # guess is valid, so the answer is at or above it
       if min > max
         return max
       else
         min = guess + 1
         guess = (max + min) / 2
       end
     rescue
       # guess raised, so the answer is below it
       if min > max
         return max
       else
         max = guess - 1
         guess = (max + min) / 2
       end
     end
   end
 end

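The argument passed to the method is the encoding to check, for example Encoding::UTF_8. Note that this gives the highest valid value, not the number of characters, since some encodings have gaps below that value.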
