Why is the parameter of the String.indexOf method an int in Java?

I am wondering why the indexOf method's parameter is an int when the description says it is a character.

public int indexOf(int ch)

Returns the index within this string of the first occurrence of the specified **character** 

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#indexOf%28int%29

Also, both of these calls compile fine:

char c = 'p';
str.indexOf(2147483647);
str.indexOf(c);

a] Basically, I am confused because an int in Java is 32 bits while Unicode characters are 16 bits.

b] Why not use the character itself rather than an int? Is this some kind of performance optimization? Are characters harder to represent than ints? How?

I guess there is a simple explanation for this, and knowing it would help me understand the topic better!

Thanks!

+7
4 answers

The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 character. Unicode code points are actually up to 21 bits in length.

(The UTF-16 representation of a longer code point is actually two 16-bit char values. These values are known as the leading and trailing surrogates; they lie in the ranges D800 through DBFF and DC00 through DFFF (hexadecimal), respectively. See the Unicode FAQ on UTF-8, UTF-16, UTF-32 and the specification for details.)

If you give indexOf(int) a code point greater than 65535, it will search for the pair of UTF-16 char values that encodes that code point.

The javadoc says as much (though not very clearly), and an examination of the code indicates that this is indeed how it is implemented.
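
To make this concrete, here is a minimal sketch (the emoji U+1F600 is my own example, not something from the javadoc) showing that indexOf(int) finds a supplementary code point even though the string stores it as a surrogate pair:

```java
public class IndexOfCodePoint {
    public static void main(String[] args) {
        // Build a string containing U+1F600 (GRINNING FACE), which
        // UTF-16 encodes as the surrogate pair D83D DE00.
        String s = "abc" + new String(Character.toChars(0x1F600)) + "def";

        System.out.println(s.length());          // 8: the emoji occupies two char values
        System.out.println(s.indexOf(0x1F600));  // 3: indexOf matches the surrogate pair
    }
}
```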


Why not just use 16-bit characters?

This is pretty obvious. If they had, there would be no easy way to search a String for code points greater than 65535. That would be a serious inconvenience for people who develop internationalized applications, where the text may contain such code points. (Many supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often this does not matter, but sometimes it does. A short sketch of that wrong assumption follows below.)

But that should not matter to you. The method will still work if your strings consist only of 16-bit code points... or, for that matter, of only ASCII code points.
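
Here is the sketch mentioned above (the string content is my own example) showing why char is not the same as code point, and how to iterate code-point-aware:

```java
public class CharVsCodePoint {
    public static void main(String[] args) {
        // "\uD835\uDD41" is the surrogate pair for U+1D541,
        // a single code point that occupies two char values.
        String s = "A\uD835\uDD41B";

        System.out.println(s.length());                       // 4 chars
        System.out.println(s.codePointCount(0, s.length()));  // 3 code points

        // Iterating by char would split the surrogate pair;
        // stepping by Character.charCount handles it correctly.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("U+%04X%n", cp);  // U+0041, U+1D541, U+0042
            i += Character.charCount(cp);
        }
    }
}
```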

+12

Characters in Java are stored in their integer representation as Unicode values. The documentation for the Character class provides more details on this format.

From the docs on that page:

The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
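
A minimal sketch of those int-based methods, using the code point from the quote (the extra method calls are my own additions):

```java
public class SupplementaryCheck {
    public static void main(String[] args) {
        int cp = 0x2F81A;  // a CJK ideograph above the 16-bit range

        System.out.println(Character.isLetter(cp));                  // true
        System.out.println(Character.isSupplementaryCodePoint(cp));  // true
        System.out.println(Character.charCount(cp));                 // 2: needs a surrogate pair
    }
}
```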

+3

The str.indexOf(int) method accepts an int. If you pass a char to it, Java widens the char to an int, since a char is just a 16-bit number.
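
A quick sketch of that widening in action (the variable names are my own):

```java
public class CharWidening {
    public static void main(String[] args) {
        String str = "puppy";
        char c = 'p';

        // The char is implicitly widened to int; these calls are equivalent.
        System.out.println(str.indexOf(c));        // 0
        System.out.println(str.indexOf((int) c));  // 0, since 'p' == 112
    }
}
```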

0

Java has a number of implicit conversion rules that run under the hood. For primitives, these are spelled out in the "Conversions and Promotions" chapter of the Java Language Specification. For your specific question, passing a char where an int is expected triggers a "widening primitive conversion"; see Section 5.1.2 of that document. (The reverse direction, int to char, is a "narrowing primitive conversion", Section 5.1.3.)

That said, it is common programming practice to interchange small positive integers and the characters they encode. This usage is indistinguishable from its use in C, going back to when ASCII was all that existed.
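
A minimal illustration of both conversion directions (my own example, following JLS §5.1.2 and §5.1.3):

```java
public class Conversions {
    public static void main(String[] args) {
        char c = 'p';
        int i = c;            // widening char -> int: implicit, never loses information
        char d = (char) 112;  // narrowing int -> char: normally requires an explicit cast

        System.out.println(i);  // 112, the code point of 'p'
        System.out.println(d);  // p
    }
}
```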

0
