How can I get the Unicode character code?

Let's say I have this:

char registered = '®'; 

or an umlaut, or any other Unicode character. How can I get its code?

+50
java unicode character
Jan 05 '10 at 14:18
5 answers

Just cast it to int:

    char registered = '®';
    int code = (int) registered;

There is actually an implicit conversion from char to int, so the explicit cast above is not required, but I would keep it in this case to make it obvious what you are trying to do.
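As a minimal sketch of that implicit conversion (the class name `CharToIntDemo` and the helper `codeOf` are made up for illustration, and the character is written as an escape only to sidestep source-encoding issues):

```java
public class CharToIntDemo {
    // Hypothetical helper: returns the numeric value of a char.
    // No cast is needed because char widens to int implicitly.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char registered = '\u00AE'; // the registered sign

        int implicit = registered;       // implicit widening conversion
        int explicit = (int) registered; // explicit cast, same result

        System.out.println(implicit == explicit); // prints true
        System.out.println(codeOf(registered));   // prints 174
    }
}
```

Both forms produce the same value; the cast only documents the intent.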

This will give you the UTF-16 code unit, which matches the Unicode code point for any character defined in the Basic Multilingual Plane (BMP). (And only BMP characters can be represented as single char values in Java.) As Andrzej Doyle says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().

Once you have the UTF-16 code units or Unicode code points, both of which are integers, it is up to you what you do with them. If you need a string representation, you need to decide exactly which representation you want. (For example, if you know that the value will always be in the BMP, you may want a fixed four-digit hexadecimal representation with the prefix U+, for example "U+0020" for a space.) However, this is beyond the scope of this question, since we do not know what the requirements are.
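If the "U+" notation mentioned above is what you are after, one possible sketch looks like this. The class `CodePointFormat` and method `toUPlus` are hypothetical names, and padding to at least four hex digits is an assumption based on the usual Unicode convention, not a requirement stated in the question:

```java
public class CodePointFormat {
    // Hypothetical helper: formats a code point as "U+" followed by
    // at least four uppercase hex digits (more if the value needs them).
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(' '));      // prints U+0020
        System.out.println(toUPlus('\u00AE')); // prints U+00AE
        System.out.println(toUPlus(0x1F600));  // prints U+1F600
    }
}
```

The `%04X` width is a minimum, so code points above the BMP are printed with five or six digits rather than being truncated.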

+84
Jan 05 '10 at 14:20

A more complete, albeit more verbose, way to do this is to use the Character.codePointAt method. It handles characters outside the BMP, which are stored as surrogate pairs because their code points do not fit in the range that a single char can represent.

The example below is not strictly required: if the (Unicode) character fits inside a single (Java) char (such as the registered local variable), it must be within \u0000 to \uffff, and you don't need to worry about surrogate pairs. But if you are looking at potentially higher code points from a String or char array, calling this method is a wise way to cover the edge cases.

For example, instead of

    String input = ...;
    char fifthChar = input.charAt(4);
    int codePoint = (int) fifthChar;

use

    String input = ...;
    int codePoint = Character.codePointAt(input, 4);

In this case the code is even slightly shorter, and it handles surrogate pair detection for you.
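To make the difference concrete, here is a small sketch with a string that contains one character outside the BMP (U+1F600, written as its surrogate pair). The class `SurrogateDemo` and helper `codePointsOf` are illustrative names, and `String.codePoints()` assumes Java 8 or later:

```java
import java.util.Arrays;

public class SurrogateDemo {
    // Hypothetical helper: returns all code points of a string,
    // combining surrogate pairs into single values.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 stored as two chars, then 'b'
        String input = "a\uD83D\uDE00b";

        System.out.println(input.length());                  // prints 4
        System.out.println((int) input.charAt(1));           // prints 55357 (high surrogate only)
        System.out.println(Character.codePointAt(input, 1)); // prints 128512 (full code point)
        System.out.println(Arrays.toString(codePointsOf(input))); // prints [97, 128512, 98]
    }
}
```

Note that the index passed to Character.codePointAt is still a char index, not a code point index, so walking a string by code point needs something like `codePoints()` or `Character.charCount`.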

+29
Jan 05 '10 at 14:25

In Java, char is technically a “16-bit integer,” so you can just cast it to an int and you will get the code. From Oracle:

The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).

So you can just cast it to int.

    char registered = '®';
    System.out.println(String.format("This is an int-code: %d", (int) registered));
    System.out.println(String.format("And this is an hexa code: %x", (int) registered));

+4
Apr 15 '13 at 19:16

Dear friend, Jon Skeet showed how to get the decimal value of the character, but that is not the hex code that Unicode references use, so you should represent character codes in hexadecimal rather than decimal.

There is an open source tool at http://unicode.codeplex.com that provides complete information about a character or a sentence.

So it is better to write a helper that takes a char as a parameter and returns its hex code as a string:

    public static String GetHexCode(char character) {
        // Format the char's numeric value as four uppercase hex digits.
        return String.format("%04X", (int) character);
    }

Hope this helps.

0
Jan 06 '10 at

For me, only "Integer.toHexString(registered)" worked the way I wanted:

    char registered = '®';
    System.out.println("Answer:" + Integer.toHexString(registered));

This only gives you the string representation that is usually shown in character tables. Jon Skeet's answer explains more.

0
Jul 21 '15 at 12:00


