Update for Swift 4 (Xcode 9)
As of Swift 4 (tested with Xcode 9 beta), an emoji ZWJ sequence is treated as a single Character, as specified by the Unicode 9 standard:
    let str = "👨‍👨‍👧‍👧🌍"
    print(str.count) // 2
    print(Array(str)) // ["👨‍👨‍👧‍👧", "🌍"]
String is also a collection of its characters (again), so we can call str.count to get the length and Array(str) to get all the characters as an array.
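To make the difference between a Character and the underlying code units visible, the following minimal sketch (assuming Swift 4 or later) counts the same string through its different views. The string is built from Unicode escapes rather than literal emoji only so that it survives copy and paste; that choice is an assumption of this example, not part of the original answer.

```swift
// The family emoji (man, ZWJ, man, ZWJ, girl, ZWJ, girl) followed by a globe,
// written with Unicode escapes so the joiners are visible in the source.
let str = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}\u{200D}\u{1F467}\u{1F30D}"

print(str.count)                // 2  (grapheme clusters, i.e. Characters)
print(str.unicodeScalars.count) // 8  (five emoji scalars + three joiners)
print(str.utf16.count)          // 13 (five surrogate pairs + three joiners)
print(str.utf8.count)           // 29 (five 4-byte scalars + three 3-byte joiners)
```

Only the first number matches what a user perceives as "length"; the others depend on the encoding.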
(Old answer for Swift 3 and earlier)
This is only a partial answer that may help in this particular case.
"👨‍👨‍👧‍👧" really is a combination of four separate characters:
    let str = "👨‍👨‍👧‍👧🌍"
    print(Array(str.characters))
    // Output: ["👨‍", "👨‍", "👧‍", "👧", "🌍"]
which are glued together with U+200D (ZERO WIDTH JOINER):
    for c in str.unicodeScalars {
        print(String(c.value, radix: 16))
    }
    // Output (one value per line): 1f468 200d 1f468 200d 1f467 200d 1f467 1f30d
Enumerating the string with the .ByComposedCharacterSequences option combines these characters correctly:
    var chars : [String] = []
    str.enumerateSubstringsInRange(str.characters.indices, options: .ByComposedCharacterSequences) { (substring, _, _, _) -> () in
        chars.append(substring!)
    }
    print(chars)
    // Output: ["👨‍👨‍👧‍👧", "🌍"]
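The call above uses Swift 2 syntax. In Swift 3 and later the same Foundation method is spelled enumerateSubstrings(in:options:_:), so a rough equivalent might look like the sketch below. The escaped string literal and the optional binding in place of the forced unwrap are choices made for this example; the exact segmentation depends on the Foundation version in use, so no output is claimed here.

```swift
import Foundation

// Same string as above, written with Unicode escapes (family emoji + globe).
let str = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}\u{200D}\u{1F467}\u{1F30D}"

var chars: [String] = []
// Enumerate composed character sequences over the whole string.
str.enumerateSubstrings(in: str.startIndex..<str.endIndex,
                        options: .byComposedCharacterSequences) { substring, _, _, _ in
    // substring is optional; skip nil instead of force-unwrapping.
    if let substring = substring {
        chars.append(substring)
    }
}
print(chars)
```

Whatever the segmentation, the pieces should concatenate back to the original string, which is a quick sanity check.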
But there are other cases where this does not work, for example "flags", which are sequences of "Regional Indicator" symbols (cf. "Swift countElements() returns an invalid value for the count emoji flags"). With
    let str = "🇩🇪"
the result of the above loop is
    ["🇩", "🇪"]
which is not the desired result.
The full rules are defined in "3 Grapheme Cluster Boundaries" in "Unicode Standard Annex #29: Unicode Text Segmentation" of the Unicode Standard.
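For comparison, here is a small sketch of how the flag case behaves with the native String API in Swift 4 or later, where Regional Indicator symbols are paired up according to those rules. The variable names and the second flag are made up for this illustration.

```swift
// German flag: REGIONAL INDICATOR SYMBOL LETTER D + LETTER E.
let flag = "\u{1F1E9}\u{1F1EA}"
print(flag.count)                // 1 (one grapheme cluster)
print(flag.unicodeScalars.count) // 2 (two Regional Indicator scalars)

// Appending a second pair (F + R) yields two Characters, not one or four.
let twoFlags = flag + "\u{1F1EB}\u{1F1F7}"
print(twoFlags.count)            // 2
```

Regional Indicators are grouped in pairs from the start of the run, which is why four consecutive indicators come out as two flags.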