Can it be argued that Java string objects are essentially a class defined as an immutable array of characters?
No. A Java String object is currently (and this is an implementation detail which I gather may vary) a class containing several fields:
- A char[] containing the actual characters
- A start index into the array
- A length
- A cached hash code, calculated lazily
The reason for the index and length is that multiple strings may hold references to the same char[]. Some operations, such as substring, take advantage of this (in many implementations, anyway).
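As a rough illustration of that layout, here is a minimal sketch of a string-like class built from those four fields. The class name SharedString, its field names, and the sharesStorageWith helper are all hypothetical; this is not the real java.lang.String source, and some implementations copy the characters in substring rather than sharing the array.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual java.lang.String source.
final class SharedString {
    private final char[] value; // backing characters, possibly shared between instances
    private final int offset;   // start index into value
    private final int count;    // number of characters belonging to this string
    private int hash;           // cached hash code; 0 means "not computed yet"

    SharedString(char[] chars) {
        // Defensive copy so outside code cannot mutate the backing array.
        this(Arrays.copyOf(chars, chars.length), 0, chars.length);
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value[offset + index];
    }

    // Shares the backing array: only offset and count differ in the result.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin: " + begin + ", end: " + end);
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    // Hypothetical helper, present only to make the sharing observable.
    boolean sharesStorageWith(SharedString other) {
        return value == other.value;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && count > 0) {
            // Computed on first use, then cached.
            for (int i = 0; i < count; i++) {
                h = 31 * h + value[offset + i];
            }
            hash = h;
        }
        return h;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

Because the class is immutable from the outside, sharing one array between the original and its substrings is safe; the trade-off is that a small substring can keep a large array alive.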
The important thing, however, is the API for String, which is very different from the API for an array. It is the API you would think of when you consider the JLS definition: a String represents a sequence of Unicode code points. So you can take a subsequence (substring), find a specific subsequence (indexOf), convert it to an uppercase sequence, and so on.
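A short sketch of that sequence-oriented API in use. The class StringApiDemo and its firstWord helper are made up for illustration; the String methods themselves (substring, indexOf, toUpperCase) are the standard ones.

```java
import java.util.Locale;

class StringApiDemo {
    // Hypothetical helper: the text before the first space, or the whole string.
    static String firstWord(String text) {
        int space = text.indexOf(' ');
        return space < 0 ? text : text.substring(0, space);
    }

    public static void main(String[] args) {
        String text = "immutable sequence";

        // Take a subsequence.
        System.out.println(text.substring(0, 9));

        // Find the position of a specific subsequence.
        System.out.println(text.indexOf("sequence"));

        // Produce an uppercase copy; the original string is left unchanged.
        System.out.println(text.toUpperCase(Locale.ROOT));
        System.out.println(text);

        System.out.println(firstWord(text));
    }
}
```

None of these operations has a direct counterpart on a plain char[], which is the practical difference between the two types.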
In fact, it would be somewhat more accurate for the JLS to call it a sequence of UTF-16 code units; it is entirely possible to construct a string which is not a valid sequence of Unicode code points, for example by including one half of a "surrogate pair" of UTF-16 code units but not the other. There are parts of the API which deal with String in terms of code points, but frankly most developers spend most of their time treating strings as if non-BMP characters didn't exist.
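To make the code unit versus code point distinction concrete, here is a small sketch. The class SurrogateDemo and the isWellFormedUtf16 helper are hypothetical names; the Character and String methods it calls are standard.

```java
class SurrogateDemo {
    // Hypothetical helper: true if every surrogate in s is part of a proper
    // high/low pair, i.e. the string is a valid sequence of code points.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the pair
                } else {
                    return false; // high surrogate with no partner
                }
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it needs two UTF-16 code units.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // code units
        System.out.println(emoji.codePointCount(0, emoji.length())); // code points

        // Keeping only the first half of the pair still yields a legal String.
        String broken = "a" + emoji.charAt(0);
        System.out.println(isWellFormedUtf16(emoji));
        System.out.println(isWellFormedUtf16(broken));
    }
}
```

The String class accepts the broken value without complaint, which is why "sequence of UTF-16 code units" describes the type more precisely than "sequence of code points".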
Jon Skeet