I am testing a migration from Delphi 5 to Delphi XE. Since I am not familiar with UnicodeString, I would like to lay out some background before asking my question.
The Delphi XE string functions Copy, Delete and Insert take an Index parameter that indicates where the operation starts. Index can be any integer value from 1 up to the length of the string the function is applied to. Since a string can contain multi-element characters, an operation can start at an element (for example a surrogate) that belongs to a multi-element sequence encoding a single Unicode code point. So, starting from a reasonable string and applying one of these functions, we can get an unreasonable result.
The phenomenon can be illustrated by the following examples, which apply the Copy function to strings representing the same array of named code points (i.e. significant characters):
($61, $13000, $63)
This is the concatenation of 'a', EGYPTIAN_HIEROGLYPH_A001 and 'c'.

Case 1. Copy of AnsiString (element = byte)
Let's start with the aforementioned UnicodeString #$61#$13000#$63 and convert it to an AnsiString s0 encoded as UTF-8.
Then we check the function
copy (s0, index, 1)
for all possible index values; there are 6 of them, since s0 has a length of 6 bytes.
    procedure Copy_Utf8Test;
    type
      TAnsiStringUtf8 = type AnsiString(CP_UTF8);
    var
      ss: string;
      s0, s1: TAnsiStringUtf8;
      ii: Integer;
    begin
      ss := #$61#$13000#$63; // mem dump of ss: $61 $00 $0C $D8 $00 $DC $63 $00
      s0 := ss;              // mem dump of s0: $61 $F0 $93 $80 $80 $63
      ii := Length(s0);      // sets ii=6 (bytes)
      s1 := Copy(s0, 1, 1);  // 'a'
      s1 := Copy(s0, 2, 1);  // #$F0 F means "start of 4-byte series"; no corresponding named code-point
      s1 := Copy(s0, 3, 1);  // #$93 "trailing in multi-byte series"; no corresponding named code-point
      s1 := Copy(s0, 4, 1);  // #$80 "trailing in multi-byte series"; no corresponding named code-point
      s1 := Copy(s0, 5, 1);  // #$80 "trailing in multi-byte series"; no corresponding named code-point
      s1 := Copy(s0, 6, 1);  // 'c'
    end;
The first and last results are reasonable in the UTF-8 codepage, while the other 4 are not.
Case 2. Copy of UnicodeString (element = word)
Let's start with the same UnicodeString s0 := #$61#$13000#$63 .
Then we check the function
copy (s0, index, 1)
for all possible index values; there are 4 of them, since s0 has a length of 4 words.
    procedure Copy_Utf16Test;
    var
      s0, s1: string;
      ii: Integer;
    begin
      s0 := #$61#$13000#$63; // mem dump of s0: $61 $00 $0C $D8 $00 $DC $63 $00
      ii := Length(s0);      // sets ii=4 (words)
      s1 := Copy(s0, 1, 1);  // 'a'
      s1 := Copy(s0, 2, 1);  // #$D80C surrogate pair member; no corresponding named code-point
      s1 := Copy(s0, 3, 1);  // #$DC00 surrogate pair member; no corresponding named code-point
      s1 := Copy(s0, 4, 1);  // 'c'
    end;
The first and last results are reasonable within the code page CP_UNICODE (1200), while the other 2 are not.
Conclusion
The string-oriented functions Copy, Delete and Insert work fine on a string regarded as a plain array of bytes or words. But they are not helpful when the string is regarded as what it essentially is, i.e. a representation of an array of named code points.
Both cases above deal with strings that represent the same array of 3 named code points. They are representations (encodings) of the same text, which consists of 3 significant characters (I use this wording to avoid abusing the term "characters").
One may want to extract (copy) any of the significant characters, regardless of whether a particular text representation (encoding) uses one element or several for it. I have spent quite some time searching for a satisfactory equivalent of the Copy I used in Delphi 5.
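To make the request concrete, here is a rough sketch of the kind of function I have in mind for the UTF-16 case. The names ElementsAt and CopyCodePoints are my own (hypothetical), not RTL routines. The sketch assumes 1-based string indexing as in the examples above, treats an unpaired surrogate as a single unit, and makes no attempt to handle combining sequences.

```pascal
program CodePointCopyDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper (not an RTL routine): number of UTF-16 elements
// occupied by the code point that starts at S[I].
function ElementsAt(const S: string; I: Integer): Integer;
begin
  // A high surrogate ($D800..$DBFF) immediately followed by a low
  // surrogate ($DC00..$DFFF) forms one code point of two elements.
  if (I < Length(S)) and
     (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
     (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    Result := 2
  else
    Result := 1; // BMP character or unpaired surrogate
end;

// Hypothetical code-point-aware Copy: Index and Count are measured in
// code points, not in UTF-16 elements.
function CopyCodePoints(const S: string; Index, Count: Integer): string;
var
  I, CP, StartPos: Integer;
begin
  Result := '';
  if (Index < 1) or (Count < 1) then
    Exit;
  I := 1;
  CP := 1;
  // Skip the code points that precede Index.
  while (I <= Length(S)) and (CP < Index) do
  begin
    Inc(I, ElementsAt(S, I));
    Inc(CP);
  end;
  StartPos := I;
  // Advance over up to Count code points.
  while (I <= Length(S)) and (Count > 0) do
  begin
    Inc(I, ElementsAt(S, I));
    Dec(Count);
  end;
  Result := Copy(S, StartPos, I - StartPos);
end;

var
  S: string;
begin
  S := #$61#$13000#$63; // 4 UTF-16 elements, 3 code points
  Writeln(Length(CopyCodePoints(S, 1, 1))); // 1 element: 'a'
  Writeln(Length(CopyCodePoints(S, 2, 1))); // 2 elements: the whole surrogate pair
  Writeln(Length(CopyCodePoints(S, 3, 1))); // 1 element: 'c'
  Writeln(CopyCodePoints(S, 3, 1) = 'c');   // TRUE
end.
```

With this, index 2 yields both surrogates $D80C $DC00 together instead of half a pair. An analogous walk over UTF-8 lead and trail bytes would be needed for Case 1, or the AnsiString could simply be converted to UnicodeString first.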
Question. Do such equivalents exist, or should I write them myself?