The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 code unit. Unicode code points are actually up to 21 bits in length.
(The UTF-16 representation of a code point beyond 65535 is actually a pair of 16-bit code units. These values are known as leading and trailing surrogates, D800₁₆ through DBFF₁₆ and DC00₁₆ through DFFF₁₆ respectively; see the Unicode FAQ on UTF-8, UTF-16, UTF-32 & BOM and the specification for details.)
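You can see the surrogate mechanics directly with Character.toChars and Character.toCodePoint. This is a minimal sketch; U+1F600 is just an arbitrary supplementary code point chosen for illustration:

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // Decompose a supplementary code point into its surrogate pair.
        int cp = 0x1F600;                       // U+1F600 GRINNING FACE
        char[] pair = Character.toChars(cp);
        System.out.printf("U+%X -> \\u%04X \\u%04X%n",
                cp, (int) pair[0], (int) pair[1]);
        // prints: U+1F600 -> \uD83D \uDE00

        // ...and reassemble it from the two code units.
        int back = Character.toCodePoint(pair[0], pair[1]);
        System.out.println(back == cp);         // true
    }
}
```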
If you give indexOf(int) a code point greater than 65535, it will search for the pair of UTF-16 code units that encodes that code point.
This is stated in the javadoc (though not very clearly), and an examination of the code shows that this is indeed how the method is implemented.
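A short demonstration of that behavior, again using U+1F600 as an arbitrary example (the printed index counts 16-bit code units, not code points):

```java
public class IndexOfCodePoint {
    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so UTF-16 stores it as the
        // surrogate pair \uD83D \uDE00.
        String s = "abc\uD83D\uDE00def";

        System.out.println(s.length());                      // 8 code units
        System.out.println(s.codePointCount(0, s.length())); // 7 code points

        // indexOf(int) accepts the full 21-bit code point and returns
        // the index (in code units) of the leading surrogate.
        System.out.println(s.indexOf(0x1F600));              // 3
    }
}
```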
Why not just use 16-bit characters?
This is pretty obvious. If they did, there would be no easy way to locate code points greater than 65535 in Strings. That would be a major problem for people who develop internationalized applications, where the text may include such code points. (A lot of supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often that doesn't matter, but sometimes it does.)
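Here is a sketch of that pitfall: iterating by char splits a surrogate pair into two unusable halves, whereas iterating by code point handles it correctly (the test string and class name are made up for illustration):

```java
public class CodePointIteration {
    public static void main(String[] args) {
        String s = "A\uD83D\uDE00B";   // 'A', U+1F600, 'B'

        // Broken assumption: one char == one code point. Indexes 1 and 2
        // each yield an isolated surrogate, not a usable character.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("char %d: U+%04X%n", i, (int) s.charAt(i));
        }

        // Correct: advance by code points, stepping 1 or 2 code units.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.printf("code point: U+%X%n", cp);
            i += Character.charCount(cp);
        }
    }
}
```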
But none of this should really matter to you. The method will still work if your Strings consist only of 16-bit code points ... or, for that matter, only ASCII code points.
Stephen C