I am trying to parse an input string to determine if it contains any non-emojis.
I read this wonderful article from Mathias and am using both native punycode for encoding / decoding and regenerate for generating regular expressions. I also use EmojiData to build my emoji dictionary.
With all that said, some emoji continue to be annoying little buggers and refuse to comply. For some emoji, I keep getting two code points instead of one.
    // Example of a single code point:
    console.log(punycode.ucs2.decode('💩'));
    >> [ 128169 ]

    // Example of a paired code point:
    console.log(punycode.ucs2.decode('⌛️'));
    >> [ 8987, 65039 ]
Mathias addresses this in his article (and gives an example of punycode working on this), but even using his example, I still get a count of 2 for the second emoji:
    function countSymbols(string) {
      return punycode.ucs2.decode(string).length;
    }

    console.log(countSymbols('💩'));
    >> 1

    console.log(countSymbols('⌛️'));
    >> 2
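For reference, the second value in the pair, 65039, is U+FE0F (VARIATION SELECTOR-16), a code point that requests emoji-style presentation of the character before it and has no glyph of its own. A minimal sketch of a count that skips it might look like the following. It assumes `Array.from` as a stand-in for `punycode.ucs2.decode` (both split a string into code points rather than UTF-16 code units), and the helper name `decodeWithoutVariationSelectors` is hypothetical.

```javascript
// U+FE0F (decimal 65039) is VARIATION SELECTOR-16.
const VARIATION_SELECTOR_16 = 0xfe0f;

// Hypothetical helper: decode a string into code points, dropping U+FE0F.
// Array.from iterates a string by code point, so surrogate pairs stay together.
function decodeWithoutVariationSelectors(string) {
  return Array.from(string, (ch) => ch.codePointAt(0))
    .filter((cp) => cp !== VARIATION_SELECTOR_16);
}

function countSymbols(string) {
  return decodeWithoutVariationSelectors(string).length;
}

console.log(countSymbols('\u{1F4A9}'));    // 1
console.log(countSymbols('\u231B\uFE0F')); // 1
```

This only handles the variation selector; sequences joined with other modifiers (skin tones, zero-width joiners, flags) would still count as several code points.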
What is the best way to determine whether a string consists entirely of emoji? This is for a proof of concept, so the solution can be as brute force as necessary.
--- UPDATE ---
A bit more context on my annoying emoji above.
They are visually identical, but are actually different Unicode values (the second is from the example above):
⌛ ⌛️
The first works fine, the second does not. Unfortunately, the second version is what iOS seems to be using (if you copy and paste from iMessage, you get the second, and the same when you get text from Twilio).
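One brute-force way to treat both versions the same is to drop U+FE0F before looking each code point up in the emoji dictionary. The sketch below is only an illustration under assumptions: `KNOWN_EMOJI` is a two-entry stand-in for a full dictionary (such as one built from EmojiData), and `isAllEmoji` is a hypothetical helper name.

```javascript
// Stand-in for a full emoji dictionary; only two entries for illustration.
const KNOWN_EMOJI = new Set([0x1f4a9, 0x231b]);

const VS16 = 0xfe0f; // VARIATION SELECTOR-16

// Hypothetical helper: true when the string is non-empty and every code point,
// ignoring U+FE0F, is present in KNOWN_EMOJI.
function isAllEmoji(string) {
  const codePoints = Array.from(string, (ch) => ch.codePointAt(0))
    .filter((cp) => cp !== VS16);
  return codePoints.length > 0 && codePoints.every((cp) => KNOWN_EMOJI.has(cp));
}

console.log(isAllEmoji('\u231B'));         // true  (without the selector)
console.log(isAllEmoji('\u231B\uFE0F'));   // true  (with the selector, as pasted from iOS)
console.log(isAllEmoji('\u231B\uFE0F!'));  // false (contains a non-emoji)
```

A real dictionary would also need entries or rules for multi-code-point emoji such as flags and skin-tone variants, which this sketch does not cover.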
thekevinscott