Replacing Emoji Unicode Range from Arabic Tweets Using Java

I am trying to replace emoji in Arabic tweets using Java.

I used this code:

String line = "اييه تقولي اجل الارسنال تعادل امس بعد ما كان فايز 😂😂";
Pattern unicodeOutliers = Pattern.compile("([\u1F601-\u1F64F])", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
Matcher unicodeOutlierMatcher = unicodeOutliers.matcher(line);
line = unicodeOutlierMatcher.replaceAll(" $1 ");

But it does not replace them. Even if I match only the character "\u1F602" itself, it is not replaced. Maybe that is because there are 5 hex digits after the \u? I'm not sure, it's just an assumption.

Note that:

1- the emoticon at the end of the tweet (😂) is U+1F602, "face with tears of joy"

2- this question is not a duplicate of this question.

Any ideas?

+4
2 answers

Java 5 and 6

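If you are stuck running your program on a Java 5 or 6 JVM and want to match characters in the range U+1F601 to U+1F64F, use surrogate pairs in the character class: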

Pattern emoticons = Pattern.compile("[\uD83D\uDE01-\uD83D\uDE4F]");

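This approach also works in Java 7 and later: in the Sun/Oracle implementation, Pattern.compile() converts the String containing the pattern into an array of code points before compiling it.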
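
As a rough sketch of the surrogate-pair approach, the class below locates the first emoticon in a string. The class name SurrogatePairClassDemo and the helper firstEmoticonIndex are made up for illustration, and the sample text is an ASCII stand-in for a tweet, with the emoji written as escapes so the file compiles under any source encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurrogatePairClassDemo {
    // The compiler turns each escape into one UTF-16 unit, so the regex
    // engine receives the two supplementary characters themselves.
    static final Pattern EMOTICONS =
            Pattern.compile("[\uD83D\uDE01-\uD83D\uDE4F]");

    // Hypothetical helper: char index of the first emoticon, or -1 if none.
    static int firstEmoticonIndex(String s) {
        Matcher m = EMOTICONS.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // "ab" followed by U+1F602 (face with tears of joy).
        System.out.println(firstEmoticonIndex("ab\uD83D\uDE02"));
        System.out.println(firstEmoticonIndex("no emoji here"));
    }
}
```

The index is counted in UTF-16 chars, not code points, so an emoticon occupies two positions in the string.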

Java 7

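  • You can use the \x{...} construct (see the other answer) to specify the code point directly; it is available from Java 7.

  • Alternatively, you can specify the whole Unicode Emoticons block, which spans from U+1F600 (instead of U+1F601) to U+1F64F.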

    Pattern emoticons = Pattern.compile("\\p{InEmoticons}");

    Since support for the Emoticons block was added in Java 7, this approach also works only from Java 7.

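  • Although the other approaches are preferable, you can also write the surrogate pair as escapes inside the regex itself. There is little reason to do this in source code, but since Java 7 it behaves correctly when the regex is supplied at run time, for example in a search field, where pasting the character directly is not possible.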

    Pattern emoticons = Pattern.compile("[\\uD83D\\uDE01-\\uD83D\\uDE4F]");
    

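    /!\ Warning

    Never mix the two syntaxes when specifying a supplementary code point, as in:

    • "[\\uD83D\uDE01-\\uD83D\\uDE4F]"

    • "[\uD83D\\uDE01-\\uD83D\\uDE4F]"

    In Oracle's implementation, these match the code point U+D83D and the range from U+DE01 to U+1F64F.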

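Note: in Oracle's implementation of Java 5 and 6, Pattern.u() does not collapse a valid regex-escaped surrogate pair such as "\\uD83D\\uDE01". As a result, the pattern is interpreted as 2 lone surrogates, which fail to match anything.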
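
The two Java 7 variants above can be compared with a small sketch like the following. It assumes a Java 7 or later runtime; the class name EmoticonBlockDemo and the helper countMatches are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonBlockDemo {
    // Whole Emoticons block, U+1F600 to U+1F64F (needs Java 7 or later).
    static final Pattern BLOCK = Pattern.compile("\\p{InEmoticons}");

    // Regex-level escapes for both surrogate pairs; per the answer above,
    // Java 7 and later read each escaped pair as a single code point.
    static final Pattern ESCAPED =
            Pattern.compile("[\\uD83D\\uDE01-\\uD83D\\uDE4F]");

    // Hypothetical helper: number of matches of p in s.
    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String twoJoyFaces = "\uD83D\uDE02\uD83D\uDE02"; // U+1F602 twice
        String grinningFace = "\uD83D\uDE00";            // U+1F600
        System.out.println(countMatches(BLOCK, twoJoyFaces));
        System.out.println(countMatches(ESCAPED, twoJoyFaces));
        // U+1F600 is inside the block but below the explicit range.
        System.out.println(countMatches(BLOCK, grinningFace));
        System.out.println(countMatches(ESCAPED, grinningFace));
    }
}
```

The only difference between the two patterns on a Java 7+ runtime is the first code point of the block, U+1F600, which the explicit range leaves out.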

+5

From the Javadoc for Pattern:

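A Unicode character can also be represented in a regular expression by using its hex notation (hexadecimal code point value) directly, as described in the construct \x{...}; for example, the supplementary character U+2011F can be specified as \x{2011F} instead of the two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.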
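
To make the surrogate-pair notation in this quote concrete, here is a small sketch that converts a code point to its UTF-16 escapes with Character.toChars. It also illustrates the asker's suspicion about five digits: a Java Unicode escape consumes exactly four hex digits. The class name SurrogatePairDemo and the helper toUtf16Escapes are invented for this example.

```java
class SurrogatePairDemo {
    // Hypothetical helper: renders a code point as its UTF-16 escape text,
    // one escape per char returned by Character.toChars.
    static String toUtf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUtf16Escapes(0x2011F));
        System.out.println(toUtf16Escapes(0x1F602));

        // Only four hex digits belong to the escape, so this literal is
        // U+1F60 followed by the ordinary digit 2, not the emoji.
        String fiveDigits = "\u1F602";
        System.out.println(fiveDigits.length());
    }
}
```

The first two lines printed are the escape sequences for the surrogate pairs D840/DD1F and D83D/DE02, and the last line is 2, because the five-digit literal is two separate BMP characters.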

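This means that the regular expression you are looking for is ([\x{1F601}-\x{1F64F}]). Of course, when you write it as a Java String literal, you must escape the backslashes.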

Pattern unicodeOutliers = Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

Note that the \x{...} construct is only available from Java 7.
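
Putting this together with the replacement from the question, a minimal sketch might look like the following. It assumes Java 7 or later; the class name EmojiSpacer and the method padEmoticons are illustrative, and the flags from the question are dropped here on the assumption that they are not needed for this range.

```java
import java.util.regex.Pattern;

class EmojiSpacer {
    // Same range as in the question, written with the hex code point
    // construct so no surrogate pairs have to be spelled out.
    private static final Pattern EMOTICONS =
            Pattern.compile("([\\x{1F601}-\\x{1F64F}])");

    // Hypothetical helper: surrounds every matched emoticon with spaces.
    static String padEmoticons(String line) {
        return EMOTICONS.matcher(line).replaceAll(" $1 ");
    }

    public static void main(String[] args) {
        // ASCII stand-in for the tweet, ending in U+1F602 twice.
        String line = "tweet text\uD83D\uDE02\uD83D\uDE02";
        String padded = padEmoticons(line);
        // Each emoticon gains one space on either side.
        System.out.println(padded.length() - line.length());
    }
}
```

Adjacent emoticons end up separated by two spaces, since each replacement adds its own leading and trailing space.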

+4
