My understanding of .chars is that it returns the number of characters in a string, counted in graphemes. My understanding of .ords is that it returns a "list of code numbers, one for the base character of each grapheme in a string." That is, .chars returns the number of graphemes, and .ords returns one code point (the base) per grapheme. However, the behavior that I observe in Rakudo 2016.07.1 on MoarVM 2016.07 does not seem to match this:
    > "\x[2764]\x[fe0e]".chars
    1
    > "\x[2764]\x[fe0e]".ords.fmt("U+%04x")
    U+2764 U+fe0e
    > "e\x[301]".ords.fmt("U+%04x")
    U+00e9
    > "0\x[301]".ords.fmt("U+%04x")
    U+0030
The .chars method returns 1 for HEAVY BLACK HEART followed by VARIATION SELECTOR-15 (the text presentation, not the emoji ❤️, which is U+2764 U+fe0f), but .ords then returns both code points rather than just the base (I expected only U+2764). Even more confusingly, calling .ords on LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT returns U+00e9 (LATIN SMALL LETTER E WITH ACUTE). I expected U+0065, since LATIN SMALL LETTER E is the base code point. I do get the expected result when the string has no precomposed NFC form (for example, U+0030 for 0 followed by the combining accent).
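To narrow down where normalization comes into play, here is a minimal sketch that prints the grapheme count, the code point count, and the NFC and NFD code points for the same three strings. It assumes that Str.codes, Str.NFC, Str.NFD and Uni.list behave as documented on this Rakudo version, which I have not confirmed, so I have left the output out rather than guess at it.

```perl6
# Compare grapheme count, code point count, and both normalization forms
# for the three strings used in the session above.
for "\x[2764]\x[fe0e]", "e\x[301]", "0\x[301]" -> $s {
    say "graphemes:   ", $s.chars;                      # what .chars reports
    say "code points: ", $s.codes;                      # code points in the string
    say "NFC:         ", $s.NFC.list.fmt("U+%04x");     # composed form
    say "NFD:         ", $s.NFD.list.fmt("U+%04x");     # decomposed form
    say "";
}
```

If the NFD line for "e\x[301]" starts with U+0065 while .ords gives U+00e9, that would suggest .ords is working from the composed form rather than picking a base character per grapheme.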
Is my understanding of .chars and .ords simply wrong, or is this a bug?
unicode perl6
Chas. Owens