Nice find!
The short answer is that it's completely arbitrary: it depends on how Ruby internally constructs the returned string.
There are a number of internal C functions that create US-ASCII strings, either empty or from C string literals: rb_usascii_str_new and the like. They are often used to build up a string by appending small snippets. Almost every to_s method does this:
    [].to_s.encoding          # => #<Encoding:US-ASCII>
    {}.to_s.encoding          # => #<Encoding:US-ASCII>
    $/.to_s.encoding          # => #<Encoding:US-ASCII>
    1.to_s.encoding           # => #<Encoding:US-ASCII>
    true.to_s.encoding        # => #<Encoding:US-ASCII>
    Object.to_s.encoding      # => #<Encoding:US-ASCII>
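As a rough Ruby-level analogue of that pattern (an illustration only, not Ruby's actual C source), you can start from an empty US-ASCII buffer via String.new(encoding:), available since Ruby 2.3, and append snippets to it; build_like_c_to_s is a hypothetical name. Because every appended snippet is ASCII-only, the buffer keeps its US-ASCII encoding even though the snippet literals themselves are UTF-8:

```ruby
# Hypothetical Ruby-level analogue of a C-implemented to_s: start from an
# empty US-ASCII buffer (roughly what rb_usascii_str_new gives you in C)
# and append small snippets to it.
def build_like_c_to_s(parts)
  buf = String.new(encoding: Encoding::US_ASCII) # empty, US-ASCII encoded
  parts.each { |part| buf << part }              # append each snippet in turn
  buf
end

s = build_like_c_to_s(["[", "1", ", ", "2", "]"])
puts s           # prints [1, 2]
puts s.encoding  # prints US-ASCII
```

The receiver's encoding wins here because both sides are 7-bit; appending a non-ASCII UTF-8 snippet would switch the buffer to UTF-8 instead.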
So why not Object.new.to_s? The key is that Object#to_s is the fallback to_s for every class, so to keep it generic but still informative, it was written to show the object's internal pointer. The easiest way to do that is with sprintf and the %p specifier. BUT whoever wrote Ruby's sprintf wrapper, rb_sprintf, got lazy and just set the encoding to NULL, which falls back to ASCII-8BIT. So, in general, anything that returns a formatted string will have this encoding:
    Object.new.to_s.encoding              # => #<Encoding:ASCII-8BIT>
    (nil.sort rescue $!.to_s.encoding)    # => #<Encoding:ASCII-8BIT>
    [].each.to_s.encoding                 # => #<Encoding:ASCII-8BIT>
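Since all of this comes from implementation details, the exact encodings may differ between Ruby versions. A small sketch for checking them on your own interpreter: EXPRESSIONS and encoding_report are hypothetical names, and each lambda simply re-runs one of the expressions above and records the encoding name of the string it returns.

```ruby
# Hypothetical helper: evaluate each expression and report the encoding name
# of the string it returns. Results depend on the Ruby version in use.
EXPRESSIONS = {
  "[].to_s"         => -> { [].to_s },
  "{}.to_s"         => -> { {}.to_s },
  "$/.to_s"         => -> { $/.to_s },
  "1.to_s"          => -> { 1.to_s },
  "true.to_s"       => -> { true.to_s },
  "Object.to_s"     => -> { Object.to_s },
  "Object.new.to_s" => -> { Object.new.to_s },
  "nil.sort error"  => -> { begin; nil.sort; rescue NoMethodError => e; e.to_s; end },
  "[].each.to_s"    => -> { [].each.to_s },
}.freeze

def encoding_report(exprs)
  # Call each lambda and keep only the encoding name of its result.
  exprs.transform_values { |thunk| thunk.call.encoding.name }
end

encoding_report(EXPRESSIONS).each do |label, enc|
  puts "#{label.ljust(16)} #{enc}"
end
```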
As for string literals defined in the script itself, they get the script's source encoding, UTF-8 in this case, as you would expect.
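To see the source encoding at work, here is a short sketch assuming Ruby 2.0 or later, where UTF-8 is the default source encoding (the magic comment only makes that explicit). It compares a literal with a string Ruby builds internally; since both are ASCII-only, concatenating them raises no Encoding::CompatibilityError and the result simply takes the receiver's encoding.

```ruby
# encoding: utf-8
# The magic comment above pins the source encoding; since Ruby 2.0 UTF-8 is
# already the default, so it only makes the assumption explicit.
literal = "hello"
number  = 1.to_s               # built internally by Ruby, not a literal

puts __ENCODING__              # source encoding of this file
puts literal.encoding          # literals take the source encoding
puts number.encoding

# Both strings are ASCII-only, so they are compatible; the result takes the
# receiver's encoding.
combined = literal + number
puts combined.encoding
```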