I am trying to replace emoji in Arabic tweets using Java.
I used this code:
String line = "اييه تقولي اجل الارسنال تعادل امس بعد ما كان فايز 😂😂";
Pattern unicodeOutliers = Pattern.compile("([\u1F601-\u1F64F])", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
Matcher unicodeOutlierMatcher = unicodeOutliers.matcher(line);
line = unicodeOutlierMatcher.replaceAll(" $1 ");
But it does not replace them. Even if I match only the character "\u1F602" itself, it is not replaced. Maybe it is because there are 5 hex digits after \u? I'm not sure, this is just an assumption.
Note that:
1- the emoji at the end of the tweet (😂) is U+1F602, "face with tears of joy"
2- this question is not a duplicate of this question.
Any ideas?
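One way to check the 5-digit assumption is the minimal sketch below, which is not a definitive fix. A Java `\u` escape takes exactly four hex digits, so `"\u1F602"` is read as the character U+1F60 followed by a literal `2`. The sketch assumes Java 7 or later, where the regex syntax `\x{...}` accepts a full code point. The class name `EmojiPadding` and the helper `padEmoji` are hypothetical names chosen for illustration, and the flags from the pattern above are left out on the assumption that they are not needed for this match.

```java
import java.util.regex.Pattern;

final class EmojiPadding {
    // \x{...} names a whole code point, so this range covers U+1F601..U+1F64F.
    // (Assumption: Java 7+; earlier versions do not support this syntax.)
    private static final Pattern EMOTICONS =
            Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

    // Hypothetical helper: surrounds every matched emoji with spaces.
    static String padEmoji(String text) {
        return EMOTICONS.matcher(text).replaceAll(" $1 ");
    }

    public static void main(String[] args) {
        // The four-digit escape consumes only "1F60"; the trailing 2 is a plain character.
        String misparsed = "\u1F602";
        System.out.println(misparsed.length());      // 2
        System.out.println(misparsed.charAt(1));     // 2

        // Build U+1F602 from its code point; it is stored as a surrogate pair.
        String joy = new String(Character.toChars(0x1F602));
        System.out.println(joy.length());                        // 2
        System.out.println(joy.codePointCount(0, joy.length())); // 1

        System.out.println(padEmoji("فايز " + joy + joy));
    }
}
```

In a string literal the same emoji can also be written as the surrogate pair `"\uD83D\uDE02"`, which is how a character above U+FFFF is spelled with four-digit escapes.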