This is a problem with Unicode equivalence.
Your string a consists of the character ư (U+01B0, LATIN SMALL LETTER U WITH HORN) followed by U+0303, COMBINING TILDE. The second character is, as its name implies, a combining character: when the string is rendered, it is combined with the preceding character to produce the final glyph.
Your string b uses the character ữ (U+1EEF, LATIN SMALL LETTER U WITH HORN AND TILDE), which is a single character. It is canonically equivalent to the previous combination, but uses a different byte sequence to represent it.
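You can see the difference by building the strings from their codepoints and inspecting them. A minimal sketch (the escape sequences correspond to the codepoints described above):

a = "\u01B0\u0303"  # u-with-horn followed by a combining tilde
b = "\u1EEF"        # a single precomposed character

a.codepoints.map { |c| "U+%04X" % c }  # => ["U+01B0", "U+0303"]
b.codepoints.map { |c| "U+%04X" % c }  # => ["U+1EEF"]
a.bytes.length                         # => 4 (two 2-byte UTF-8 sequences)
b.bytes.length                         # => 3 (one 3-byte UTF-8 sequence)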
To compare these strings, you need to normalize them so that they both use the same byte sequences for these kinds of characters. Current versions of Ruby have this built in as String#unicode_normalize (in earlier versions you had to use a third-party library).
So, now you have
a == b
which is false, but if you do
a.unicode_normalize == b.unicode_normalize
you should get true.
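A minimal sketch of what the normalization does, using the strings from above (unicode_normalize defaults to the nfc form, but you can also pass the form explicitly):

a = "\u01B0\u0303"  # combining sequence
b = "\u1EEF"        # precomposed character

a.unicode_normalize(:nfc).codepoints.map { |c| "U+%04X" % c }  # => ["U+1EEF"]
a.unicode_normalize(:nfc) == b                                 # => true
a.unicode_normalize == b.unicode_normalize                     # => true (:nfc is the default)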
If you are using an older version of Ruby, there are several options. Rails has a normalize method as part of its multi-byte support, so if you use Rails you can do:
a.mb_chars.normalize == b.mb_chars.normalize
or maybe something like:
ActiveSupport::Multibyte::Unicode.normalize(a) == ActiveSupport::Multibyte::Unicode.normalize(b)
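A minimal sketch of the Rails approach, assuming an older ActiveSupport where these methods still exist (they were deprecated and later removed in favor of String#unicode_normalize):

require 'active_support'
require 'active_support/core_ext/string/multibyte'

a = "\u01B0\u0303"  # combining sequence
b = "\u1EEF"        # precomposed character

a.mb_chars.normalize == b.mb_chars.normalize  # => true; defaults to NFKC (:kc)
ActiveSupport::Multibyte::Unicode.normalize(a, :kc) ==
  ActiveSupport::Multibyte::Unicode.normalize(b, :kc)  # => true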
If you are not using Rails, you can look at the unicode_utils gem and do something like this:
UnicodeUtils.nfkc(a) == UnicodeUtils.nfkc(b)
(nfkc refers to the normalization form; it is the same form the Rails methods above default to, whereas String#unicode_normalize defaults to nfc.)
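A minimal sketch, assuming the unicode_utils gem is installed (gem install unicode_utils); the gem also provides the other normalization forms:

require 'unicode_utils'

a = "\u01B0\u0303"  # combining sequence
b = "\u1EEF"        # precomposed character

UnicodeUtils.nfkc(a) == UnicodeUtils.nfkc(b)  # => true (compatibility composition)
UnicodeUtils.nfc(a)  == UnicodeUtils.nfc(b)   # => true (canonical composition)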
There are different ways to normalize Unicode strings (i.e. whether you use decomposed or composed forms), and these examples just use each method's default. I'll leave investigating the differences to you.
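As a rough sketch of that difference, here is the same string normalized to the composed (nfc) and decomposed (nfd) forms with String#unicode_normalize:

a = "\u01B0\u0303"  # u-with-horn + combining tilde

# Composed form: recombines into precomposed characters where possible.
a.unicode_normalize(:nfc).codepoints.map { |c| "U+%04X" % c }
# => ["U+1EEF"]

# Decomposed form: splits precomposed characters into a base letter plus combining marks.
a.unicode_normalize(:nfd).codepoints.map { |c| "U+%04X" % c }
# => ["U+0075", "U+031B", "U+0303"]  (u, combining horn, combining tilde)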