Difference in string length between Ruby 1.8 and 1.9

I have a website that runs on Ruby 1.8.7. Incoming messages are validated to allow a maximum of 12,000 characters. Spaces count as characters, and tabs and carriage returns are removed before the message is validated.

Here is the message being tested: http://pastie.org/5047582

In Ruby 1.9 the string length is reported as 11909, which is correct. But when I check the length in Ruby 1.8.7, I get 12044.

I ran this Ruby code on codepad.org, which gives http://codepad.org/OxgSuKGZ (a length of 12044, which is wrong), but when I run the same code in the console on codeacademy.org, the length is 11909.

Can someone explain to me why this is happening?

Thanks

1 answer

This is a Unicode problem. The string you use contains characters outside the ASCII range, and the commonly used UTF-8 encoding encodes them as 2 (or more) bytes.
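To make the byte/character distinction concrete, here is a small sketch that unpacks one non-ASCII character both ways. It assumes the source file and the string are UTF-8; the variable names are only illustrative.

```ruby
# encoding: utf-8
# "ą" is U+0105; in UTF-8 it is stored as the two bytes 0xC4 0x85.
s = "ą"

bytes      = s.unpack("C*")  # each byte as an unsigned integer
codepoints = s.unpack("U*")  # each UTF-8 character as a code point

puts bytes.inspect       # [196, 133]
puts codepoints.inspect  # [261]
```

One character, two bytes: a message with 135 such characters would account for the gap between 11909 and 12044, assuming every non-ASCII character in it takes exactly two bytes.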

Ruby 1.8 has no real Unicode support, and length simply returns the number of bytes in the string, which leads to surprising results like:

 "ą".length # => 2

Ruby 1.9 improved Unicode handling. Among other things, length now returns the actual number of characters in the string, provided Ruby knows the encoding:

 "À".length # => 1
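The "if Ruby knows the encoding" condition can be seen directly in Ruby 1.9 or later. The following is only a sketch, assuming a UTF-8 source file; it relabels a copy of the string as binary to imitate what happens when the encoding information is missing.

```ruby
# encoding: utf-8
s = "ą"                 # U+0105, two bytes in UTF-8

chars = s.length        # counts characters, because the string is tagged UTF-8
bytes = s.bytesize      # counts bytes, regardless of encoding

# Relabel a copy of the same bytes as binary: Ruby no longer treats them as UTF-8.
raw        = s.dup.force_encoding("BINARY")
raw_length = raw.length # now counts bytes, matching what Ruby 1.8 reports

puts chars       # 1
puts bytes       # 2
puts raw_length  # 2
```

If a 1.9 application still reports byte counts, checking the string's encoding is a reasonable first step.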

One possible workaround in Ruby 1.8 is to use a regular expression with the u (UTF-8) flag, which matches whole characters instead of bytes:

 "ą".scan(/./mu).size # => 1
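Applied to the validation described in the question, the same idea might look like the sketch below. The names MAX_CHARS, char_count and message_ok? are hypothetical and not taken from the asker's code, and the input is assumed to be valid UTF-8.

```ruby
# encoding: utf-8
MAX_CHARS = 12_000  # limit taken from the question

# Hypothetical helper: counts UTF-8 characters rather than bytes.
# The m flag lets "." match newlines; the u flag makes the regexp UTF-8 aware.
def char_count(text)
  text.scan(/./mu).size
end

# Hypothetical helper mirroring the described check:
# tabs and carriage returns are removed before the length is validated.
def message_ok?(text)
  cleaned = text.delete("\t\r")
  char_count(cleaned) <= MAX_CHARS
end

puts message_ok?("ą" * 12_000)  # true
puts message_ok?("ą" * 12_001)  # false
```

Counting through scan builds an intermediate array, which should be acceptable at this message size; on Ruby 1.9 or later, plain length does the same job.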

Source: https://habr.com/ru/post/927604/

