Are Java String objects an array of characters?

I am new to Java and trying to understand the basics of the language.

Is it accurate to say that Java String objects are essentially a class defined as an immutable array of characters?

I ask because I am a bit confused by what the specification says about char arrays compared with the String class:

JLS 10.9

10.9. An Array of Characters Is Not a String
In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character). A String object is immutable, that is, its contents never change, while an array of char has mutable elements. The method toCharArray in class String returns an array of characters containing the same character sequence as a String. The class StringBuffer implements useful methods on mutable arrays of characters.

JLS 4.3.3

4.3.3. The Class String
Instances of class String represent sequences of Unicode code points.

1 answer

Is it accurate to say that Java String objects are essentially a class defined as an immutable array of characters?

No. A Java String object is (currently; this is an implementation detail that I gather may vary) a class containing a few fields:

  • A char[] containing the actual characters
  • A start index into the array
  • A length
  • A cached hash code, computed lazily

The reason for the index and length is that several strings can contain references to the same char[]. Some operations, such as substring, take advantage of this (in many implementations, anyway).
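To make the field layout above concrete, here is a minimal sketch of a class built that way. SketchString and its field names (value, offset, count, hash) are hypothetical names chosen for illustration; this is not the real java.lang.String source, and actual JDK implementations may be laid out differently.

```java
// Hypothetical sketch of the layout described above. This is NOT the real
// java.lang.String source; the class and field names are illustrative only.
final class SketchString {
    private final char[] value; // backing characters, possibly shared with other instances
    private final int offset;   // index in value of this string's first character
    private final int count;    // number of characters that belong to this string
    private int hash;           // cached hash code; 0 means "not computed yet"

    SketchString(char[] chars) {
        // Copy defensively so later writes to the caller's array cannot change us.
        this(chars.clone(), 0, chars.length);
    }

    // Private: wraps an array without copying, which is what lets substrings share it.
    private SketchString(char[] shared, int offset, int count) {
        this.value = shared;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value[offset + index];
    }

    // Creates a new view onto the same array instead of copying the characters.
    SketchString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException("begin: " + begin + ", end: " + end);
        }
        return new SketchString(value, offset + begin, end - begin);
    }

    boolean sharesArrayWith(SketchString other) {
        return value == other.value;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // not cached yet (or the hash really is 0): compute it
            for (int i = 0; i < count; i++) {
                h = 31 * h + value[offset + i];
            }
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchString && toString().equals(o.toString());
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SketchString s = new SketchString("hello world".toCharArray());
        SketchString sub = s.substring(6, 11);
        System.out.println(sub);                    // world
        System.out.println(sub.sharesArrayWith(s)); // true
    }
}
```

The point of the sketch is that the array is only one field among several, and that the class never hands it out or lets callers write to it, which is where the immutability comes from.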

The important thing, however, is the API of String, which is very different from the API of an array. It is this API you should have in mind when you read the JLS definition: a String represents a sequence of Unicode code points. So you can take a subsequence (substring), find a specific subsequence (indexOf), convert it to an upper-case sequence, and so on.
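As a small illustration of that difference in API, the following sketch uses only standard java.lang.String methods. The class name StringVsCharArray and the helper toCharArrayReturnsCopy are made up for this example.

```java
import java.util.Locale;

// Illustrative only: the class and helper names are invented for this example.
final class StringVsCharArray {

    // Returns true if writing to the array from toCharArray() leaves the string unchanged,
    // i.e. the string handed out a copy rather than its own storage.
    static boolean toCharArrayReturnsCopy(String s) {
        char[] copy = s.toCharArray();
        if (copy.length == 0) {
            return true; // nothing to modify
        }
        copy[0] = (char) (copy[0] + 1); // mutate the array; arrays are mutable
        return s.charAt(0) != copy[0];  // the string still holds the old character
    }

    public static void main(String[] args) {
        String s = "immutable";

        // Sequence-style operations that a plain char[] does not offer:
        System.out.println(s.substring(2, 5));          // mut
        System.out.println(s.indexOf("table"));         // 4
        System.out.println(s.toUpperCase(Locale.ROOT)); // IMMUTABLE

        // None of the calls above changed s; each returned a new value.
        System.out.println(s);                          // immutable
        System.out.println(toCharArrayReturnsCopy(s));  // true
    }
}
```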

In fact, it would be somewhat more accurate for the JLS to call it a sequence of UTF-16 code units: it is entirely possible to construct a string that is not a valid sequence of Unicode code points, e.g. by including one half of a "surrogate pair" of UTF-16 code units but not the other. There are parts of the String API that do deal in terms of code points, but frankly most developers spend most of their time treating strings as if non-BMP characters did not exist.
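A short sketch of the code unit versus code point distinction, using U+1D11E (MUSICAL SYMBOL G CLEF) as an example of a non-BMP character. The class CodeUnitsDemo and its helper methods are hypothetical; the String and Character methods they call are standard.

```java
// Illustrative only: the class and helper names are invented for this example.
final class CodeUnitsDemo {

    // Number of Unicode code points, as opposed to length(), which counts UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if the string contains a surrogate that is not part of a high+low pair,
    // which means it is not a valid sequence of Unicode code points.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip its second half
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // one code point (U+1D11E), two UTF-16 code units

        System.out.println(clef.length());                           // 2
        System.out.println(codePoints(clef));                        // 1
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e

        // substring works on code units, so it can cut a surrogate pair in half.
        String broken = clef.substring(0, 1);
        System.out.println(hasUnpairedSurrogate(clef));   // false
        System.out.println(hasUnpairedSurrogate(broken)); // true
    }
}
```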
