Delphi 2009 + Unicode + Char-size

Question

I just got Delphi 2009 and had previously read some articles about the changes that might be required by the switch to Unicode strings. Mostly, they mention that sizeof(Char) is no longer guaranteed to be 1. But why would this matter for string manipulation?

For example, if I assign 'Test' to an AnsiString and do the same with a String (which is now Unicode), Length() returns 4 in both cases, which is correct. Without testing it, I am sure that all other string manipulation functions behave the same way and work out internally whether the argument is a Unicode string or something else.

Why should the actual char size matter to me when I manipulate strings? (Of course, if I use strings as strings, and not to store other data)

Thanks for any help! Holger

unicode delphi delphi-2009

Holgerwa Sep 24 '08 at 8:33

7 answers

In old Delphi code, people often convert implicitly between characters and bytes without even thinking about it. Writing to a stream is one example: when you write a string to a stream, you must specify the number of bytes to write, but people often pass the number of characters instead. See this post from Chris Bensen for another example.
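
As an illustration, here is a minimal Delphi 2009 sketch that computes the stream byte count from SizeOf(Char). WriteStringToStream and StreamSizeAfterWriting are hypothetical helpers for this example. TStream.WriteBuffer and TMemoryStream are the standard Classes unit types. The {$IFDEF FPC} line is only there so the same file also compiles with Free Pascal in Delphi-Unicode mode.

```pascal
program StreamBytes;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Writes the characters of S to Stream. WriteBuffer's Count is in BYTES,
// so the character count is multiplied by SizeOf(Char) (2 since Delphi 2009).
procedure WriteStringToStream(Stream: TStream; const S: string);
var
  ByteCount: Integer;
begin
  ByteCount := Length(S) * SizeOf(Char);
  if ByteCount > 0 then
    Stream.WriteBuffer(S[1], ByteCount);
end;

// Hypothetical demo helper: how many bytes end up in a fresh stream?
function StreamSizeAfterWriting(const S: string): Int64;
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    WriteStringToStream(Stream, S);
    Result := Stream.Size;
  finally
    Stream.Free;
  end;
end;

begin
  // 4 characters, 8 bytes. Passing Length(S) as the count would have
  // written only the first 4 bytes, i.e. just 'Te' in UTF-16.
  Writeln(StreamSizeAfterWriting('Test'));
end.
```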

Another way people often make this implicit conversion in older code is by using a "string" to store binary data. In this case they really want bytes, but the data type expects characters. D2009 has a better type for this.
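
Below is a minimal sketch of keeping binary data in a byte-oriented type instead of a string. It uses TBytes (a dynamic array of Byte declared in SysUtils) as one such type. The answer's link is missing, so whether TBytes is the type it meant is an assumption. IntegerToBytes and BytesToInteger are hypothetical helpers.

```pascal
program BinaryInBytes;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Packs the raw bytes of an Integer into a TBytes buffer: one element = one byte.
// The old trick SetLength(S, SizeOf(Value)) on a string now reserves
// SizeOf(Value) Chars, i.e. twice as many bytes, half of them left untouched.
function IntegerToBytes(Value: Integer): TBytes;
begin
  SetLength(Result, SizeOf(Value));
  Move(Value, Result[0], SizeOf(Value));
end;

// Reverses IntegerToBytes; Data must hold exactly SizeOf(Integer) bytes.
function BytesToInteger(const Data: TBytes): Integer;
begin
  Assert(Length(Data) = SizeOf(Result));
  Move(Data[0], Result, SizeOf(Result));
end;

var
  Data: TBytes;
begin
  Data := IntegerToBytes(123456);
  Writeln(Length(Data));          // 4
  Writeln(BytesToInteger(Data));  // 123456
end.
```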

Craig Stuntz Sep 24 '08 at 12:26

I have not tried Delphi 2009, but I use FPC, which is also slowly switching to Unicode. I am 95% sure that everything below also applies to Delphi 2009.

In FPC (with Unicode support), Length-type functions will take the code page into account, so they return the length of the string as a person would see it. Suppose, for example, that the string contains two Chinese characters and each takes two bytes of memory in Unicode. Length returns 2, since there are two characters in the string, but the string occupies 4 bytes of memory (plus memory for the reference count and the trailing #0, but that is beside the point).

What you can no longer do is the following:

    var
      s: string;
      p: PChar;
      i: Integer;
    begin
      p := PChar(s);
      for i := 0 to Length(s) - 1 do
      begin
        Write(p^);
        Inc(p); // advances by SizeOf(p^), one element, not one "real" character
      end;
    end;

With the two Chinese characters from the example, this code writes the wrong two characters, namely the two bytes that are part of the first "real" character.

In short: Length() no longer returns the number of bytes allocated for the string, but the number of characters. (Before the move to Unicode, these two values were equal.)
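
For Delphi 2009 specifically, Length counts UTF-16 code units (Char = WideChar). For text such as the two Chinese characters above, that equals the number of characters. Characters outside the Basic Multilingual Plane take two code units, so Length counts each of them twice. Here is a minimal sketch comparing Length with the byte size, where StringByteSize is a hypothetical helper:

```pascal
program LengthVsBytes;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Bytes taken by the characters of S. The RTL also stores bookkeeping data
// (such as a reference count and the length) and a trailing #0, not counted here.
function StringByteSize(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

var
  A: AnsiString;
  S, Chinese: string;
begin
  A := 'Test';
  S := 'Test';
  Chinese := #$4E2D#$6587; // two Chinese characters, U+4E2D and U+6587

  Writeln(Length(A), ' ', Length(S));                             // 4 4
  Writeln(Length(A) * SizeOf(AnsiChar), ' ', StringByteSize(S));  // 4 8
  Writeln(Length(Chinese), ' ', StringByteSize(Chinese));         // 2 4
end.
```

This matches what the question observed: Length is the same for both string types, but the memory behind it is not.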

Loesje Sep 24 '08 at 8:48

The actual character size should not matter unless you are doing byte-level manipulation.

1800 INFORMATION Sep 24 '08 at 8:43

(Of course, if I use strings as strings, and not to store other data)

That is the key point: you do not use strings for other purposes, but some people do. They use strings just like arrays, so they (and this includes me) will need to check all such uses to make sure nothing is broken...

rik Sep 24 '08 at 8:45

Do not forget that there are times when this conversion is really not necessary. Take storing a GUID in a record, for example. A GUID can contain only hexadecimal characters, as well as brackets... Forcing them to take up double the space can significantly affect existing code. Of course, a simple solution is to change them to AnsiString and deal with the compiler warnings if you perform any string manipulations.
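
Here is a minimal sketch of the size effect. It assumes the GUID text is stored in a fixed-size character array inside a packed record; the answer does not say how the field is declared. TGuidRecWide, TGuidRecAnsi and StoreGuid are hypothetical names. Switching the element type to AnsiChar is the fixed-array counterpart of the AnsiString advice above.

```pascal
program GuidRecordSize;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

const
  // GUID text form: braces, 32 hex digits and 4 hyphens = 38 characters.
  SampleGuid = '{00000000-0000-0000-C000-000000000046}';

type
  // Char-based field: 2 bytes per character since Delphi 2009.
  TGuidRecWide = packed record
    Id: array[0..37] of Char;
  end;

  // AnsiChar-based field: keeps the pre-2009 layout of 1 byte per character.
  TGuidRecAnsi = packed record
    Id: array[0..37] of AnsiChar;
  end;

// Copies the GUID text into the AnsiChar record. GUID text is plain ASCII,
// so the explicit AnsiString conversion loses nothing.
procedure StoreGuid(var Rec: TGuidRecAnsi; const GuidText: string);
var
  A: AnsiString;
begin
  A := AnsiString(GuidText);
  Assert(Length(A) = Length(Rec.Id));
  Move(A[1], Rec.Id[0], Length(Rec.Id));
end;

var
  Rec: TGuidRecAnsi;
begin
  Writeln(SizeOf(TGuidRecWide)); // 76
  Writeln(SizeOf(TGuidRecAnsi)); // 38
  StoreGuid(Rec, SampleGuid);
  Writeln(Rec.Id[0], Rec.Id[37]); // {}
end.
```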

skamradt Sep 24 '08 at 13:11

This can be a problem if you make Windows API calls, or if you have legacy code that does Inc or Dec on str[0] to change a string's length.
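
For the str[0] case, here is a minimal sketch. It assumes the legacy code works on a ShortString, which stores its length in element 0. DropLastLegacy and DropLast are hypothetical helpers. SetLength is the standard way to change the length of the default string type, where S[0] cannot be used.

```pascal
program ShortStringLength;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

// Legacy style: a ShortString stores its length in element 0, so Dec(S[0])
// drops the last character. This still works for an explicit ShortString.
function DropLastLegacy(S: ShortString): ShortString;
begin
  if Length(S) > 0 then
    Dec(S[0]);
  Result := S;
end;

// The default string type is now UnicodeString, where S[0] is not accessible;
// SetLength is the portable way to change the length.
function DropLast(const S: string): string;
begin
  Result := S;
  if Length(Result) > 0 then
    SetLength(Result, Length(Result) - 1);
end;

begin
  Writeln(DropLastLegacy('Hello')); // Hell
  Writeln(DropLast('Hello'));       // Hell
end.
```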

Rohit Gupta Jun 28 '15 at 6:38

Jim McKeeth · Accepted Answer · 2008-09-24T16:20:03+0000

With Unicode, SizeOf(SomeChar) <> Length(SomeChar). Essentially, the Length of a String is less than the sum of the sizes of its Chars. As long as you do not assume that SizeOf(Char) = 1 or SizeOf(SomeString[x]) = 1 (both are now FALSE), and do not try to interchange Bytes with Chars, you should not have any problems. Anywhere you are doing something creative, like stuffing Bytes into Chars or Strings, you will need to use AnsiString.

(SizeOf(SomeString) is still 4 regardless of the string's length, since a string is essentially a pointer with some compiler magic.)
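
The following sketch prints the sizes discussed above under Delphi 2009 rules. It compares SizeOf(S) with SizeOf(Pointer) rather than with 4, because later 64-bit Delphi compilers use 8-byte pointers.

```pascal
program CharSizes;

{$IFDEF FPC}{$MODE DELPHIUNICODE}{$ENDIF} // FPC only: Delphi 2009 string/Char semantics
{$APPTYPE CONSOLE}

var
  S: string;
  A: AnsiString;
begin
  S := 'Test';
  A := 'Test';
  Writeln(SizeOf(Char));      // 2: Char = WideChar since Delphi 2009 (was 1)
  Writeln(SizeOf(AnsiChar));  // 1
  Writeln(SizeOf(S[1]));      // 2: an element of a string is a Char
  Writeln(SizeOf(A[1]));      // 1: an element of an AnsiString is an AnsiChar
  // The variable itself is only a reference, whatever the string's length:
  // 4 bytes on 32-bit compilers such as Delphi 2009, 8 on 64-bit ones.
  Writeln(SizeOf(S) = SizeOf(Pointer)); // TRUE
end.
```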
