How to normalize Unicode digits in Java

Is there any Java API to normalize Unicode digits to ASCII digits?

Both the JDK and ICU4J provide a normalization API, but neither seems able to handle this kind of conversion (presumably because it is not part of standard Unicode normalization).
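A minimal check of what the JDK's java.text.Normalizer actually does with digits. NFKC folds only characters that carry a compatibility decomposition. The fullwidth digits (U+FF10 to U+FF19) have one, while script digits such as Arabic-Indic (U+0660 to U+0669) do not. The class and method names below are illustrative, not from the original post:

```java
import java.text.Normalizer;

public class NfkcDigitsDemo {
    // Applies NFKC (compatibility decomposition followed by canonical composition).
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Arabic-Indic digit one (U+0661) followed by fullwidth digit one (U+FF11).
        String result = nfkc("\u0661\uFF11");
        // Print each code point of the normalized result.
        result.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

This prints U+0661 and U+0031. Only the fullwidth digit becomes ASCII, which is consistent with the observation that standard normalization does not cover this conversion.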

I need to convert all Unicode digit forms (as listed in this post) to [0-9]. One possible, if messy, solution is ten replaceAll calls, one for each digit from 0 to 9.
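For comparison, here is a single-pass sketch that avoids hand-written character lists. It is not from the original post. It assumes that "Unicode digit" means any code point Java's Character.isDigit accepts (general category Nd) and uses Character.digit to get its value. The class and method names are hypothetical:

```java
public class UnicodeDigits {
    /**
     * Replaces every code point for which Character.isDigit is true
     * (general category Nd) with the ASCII digit of the same value.
     * All other code points are copied unchanged.
     */
    public static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Iterating code points (not chars) also handles supplementary-plane digits.
        s.codePoints().forEach(cp -> {
            if (Character.isDigit(cp)) {
                // For Nd code points, Character.digit(cp, 10) yields a value in 0..9.
                sb.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "23" + Extended Arabic-Indic nine + Arabic-Indic four + Extended Arabic-Indic four
        System.out.println(toAsciiDigits("23\u06F9\u0664\u06F4")); // prints 23944
    }
}
```

Because it relies on the JDK's Unicode tables, this version also covers scripts that are missing from a fixed list, such as Myanmar or fullwidth digits. The trade-off is that the exact set depends on the Unicode version of the running JDK.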


OK, no answers so far. Here is a clumsy but working solution:

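import java.util.regex.Pattern;

// Note: the character classes below contain non-ASCII literals,
// so the source file must be saved and compiled as UTF-8.
public class DigitNormalizer {

    // One character class per digit value: Arabic-Indic, Extended Arabic-Indic, NKo,
    // Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada,
    // Malayalam, Thai and Lao forms, plus the ASCII digit itself.
    static final Pattern DIGIT_0 = Pattern.compile("[٠۰߀०০੦૦୦௦౦೦൦๐໐0]");
    static final Pattern DIGIT_1 = Pattern.compile("[١۱߁१১੧૧୧௧౧೧൧๑໑1]");
    static final Pattern DIGIT_2 = Pattern.compile("[٢۲߂२২੨૨୨௨౨೨൨๒໒2]");
    static final Pattern DIGIT_3 = Pattern.compile("[٣۳߃३৩੩૩୩௩౩೩൩๓໓3]");
    static final Pattern DIGIT_4 = Pattern.compile("[٤۴߄४৪੪૪୪௪౪೪൪๔໔4]");
    static final Pattern DIGIT_5 = Pattern.compile("[٥۵߅५৫੫૫୫௫౫೫൫๕໕5]");
    static final Pattern DIGIT_6 = Pattern.compile("[٦۶߆६৬੬૬୬௬౬೬൬๖໖6]");
    static final Pattern DIGIT_7 = Pattern.compile("[٧۷߇७৭੭૭୭௭౭೭൭๗໗7]");
    static final Pattern DIGIT_8 = Pattern.compile("[٨۸߈८৮੮૮୮௮౮೮൮๘໘8]");
    // Stray spaces removed from this class; they would have turned every space into '9'.
    static final Pattern DIGIT_9 = Pattern.compile("[٩۹߉९৯੯૯୯௯౯೯൯๙໙9]");

    // Index i holds the pattern for digit value i.
    public static final Pattern[] DIGIT_PATTERN_LIST = { DIGIT_0, DIGIT_1, DIGIT_2, DIGIT_3, DIGIT_4,
            DIGIT_5, DIGIT_6, DIGIT_7, DIGIT_8, DIGIT_9 };

    /**
     * Converts the Unicode digits listed above into their ASCII equivalents.
     * For example, given "23۹٤۴" it returns "23944".
     *
     * @param str the input string
     * @return a copy of str with every listed digit form replaced by ASCII 0-9
     */
    public static String normalizeUnicodeDigits(String str) {
        for (int i = 0; i < DIGIT_PATTERN_LIST.length; i++) {
            // Replace every form of digit i with the ASCII character for i.
            str = DIGIT_PATTERN_LIST[i].matcher(str).replaceAll(String.valueOf(i));
        }
        return str;
    }
}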
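If the digits only need to be parsed into a number rather than rewritten in the string, normalization may not be needed at all. The Integer.parseInt documentation defines valid digits as any char for which Character.digit(char, int) returns a non-negative value, so non-ASCII decimal digits in the Basic Multilingual Plane are accepted directly. A small sketch, with a hypothetical wrapper name:

```java
public class ParseUnicodeDigits {
    // Integer.parseInt checks each char with Character.digit, so decimal digits
    // from other scripts (within the Basic Multilingual Plane) parse directly.
    static int parse(String s) {
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        System.out.println(parse("\u0664\u0662")); // Arabic-Indic "42", prints 42
        System.out.println(parse("2\u06F3"));      // ASCII 2 + Extended Arabic-Indic 3, prints 23
    }
}
```

Because parseInt works on individual chars, digits outside the Basic Multilingual Plane (surrogate pairs) are rejected with a NumberFormatException. Those still need a code-point-based conversion first.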