How to convert special characters in a string to their Unicode equivalents?

I could not find an answer to this problem; I have tried several answers here, but to no avail. The application I'm working on uses a username to create a PDF with that name in it. However, when someone's name contains a special character, such as "Yağmur", the PDF creator freaks out and skips that character. When it receives the escaped Unicode equivalent ("Ya&#287;mur"), however, it prints "Yağmur" to the PDF as it should.

How can I check the name / string for any special character (regex = "[^a-z0-9 ]") and, when one is found, replace that character with its Unicode equivalent and return the new string?

2 answers

I will try to give a general solution, since the framework you are using is not mentioned in your problem statement.

I ran into a similar problem a long time ago. This should be handled by the PDF library itself if you set the text / character encoding to UTF-8. Check how to set the encoding in the framework you use to generate the PDF, and try that. Hope this helps!


One hacky way to do this would be:

/*
 * TODO: poorly named 
 */ 
public static String convertUnicodePoints(String input) {
    // initializing output
    StringBuilder sb = new StringBuilder();
    // iterating input by code point, so that a surrogate pair
    // (a character outside the BMP) is treated as one character
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        // checking character code point to infer whether "conversion" is required
        // here, picking an arbitrary code point 125 as boundary
        if (codePoint < 125) {
            sb.appendCodePoint(codePoint);
        }
        // need to "convert", code point >= boundary
        else {
            // for hex representation, e.g. &#x11F;
            // sb.append(String.format("&#x%X;", codePoint));

            // for decimal representation, which is what you want here
            sb.append(String.format("&#%d;", codePoint));
        }
        // advancing by one or two chars, depending on the code point
        i += Character.charCount(codePoint);
    }
    return sb.toString();
}

If you do: System.out.println(convertUnicodePoints("Yağmur"));...

... you get: Ya&#287;mur.

Of course, you can play with the "conversion" logic and decide which ranges are converted.
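As one possible variation, the decision about what gets converted can be driven by a regex like the one in the question instead of a code point boundary. The following is only a sketch under assumptions: the class name EntityEncoder and the method encodeSpecialChars are made up for illustration, and the question's pattern "[^a-z0-9 ]" is widened to "[^A-Za-z0-9 ]" so that uppercase letters such as the "Y" in "Yağmur" are left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityEncoder {
    // Assumption: the question's "[^a-z0-9 ]" widened to also keep uppercase letters.
    private static final Pattern SPECIAL = Pattern.compile("[^A-Za-z0-9 ]");

    // Hypothetical helper: replaces every character matched by SPECIAL
    // with its decimal numeric character reference, e.g. &#287;
    public static String encodeSpecialChars(String input) {
        Matcher m = SPECIAL.matcher(input);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // copy the untouched text between the previous match and this one
            sb.append(input, last, m.start());
            // a match is one code point (one or two chars), so read it as a code point
            int codePoint = input.codePointAt(m.start());
            sb.append("&#").append(codePoint).append(';');
            last = m.end();
        }
        // copy whatever remains after the last match
        sb.append(input, last, input.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u011F" is the letter ğ
        System.out.println(encodeSpecialChars("Ya\u011Fmur")); // prints Ya&#287;mur
    }
}
```

Note that with this pattern, punctuation such as "-" or "." is converted as well, so the character class may need adjusting to whatever the PDF creator actually handles.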
