What are Pascal strings?

Are they named after a programming language or math?

What are the defining characteristics of Pascal strings? In the Wikipedia article on strings, it seems that the defining feature is storing the length of the string in the first byte. In another article, it seems to me that the string memory layout is also important.

While reading an unrelated SO thread, I saw someone claim that Pascal strings make Excel fast. What are the advantages of Pascal strings over null-terminated strings? Or, more generally, in what situations are Pascal strings superior?

Are Pascal strings implemented in any other languages?

Finally, do I capitalize both words ("Pascal Strings") or only the first ("Pascal strings")? I am a technical writer...

2 answers

Pascal strings became popular through one specific but hugely influential Pascal implementation, UCSD Pascal. So "UCSD strings" would be the better term. This is the same implementation that popularized bytecode interpreters.

In general, this is not one specific type but the basic principle of prefixing the character data with its size. This makes getting the length a constant-time (O(1)) operation, instead of a scan through the character data for a null character.
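To make the difference concrete, here is a minimal sketch in C, assuming the classic layout of one length byte followed by up to 255 characters. The names PStr, pstr_from_c, pstr_length and cstr_length are made up for illustration and do not come from any particular Pascal runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "short string" layout: one length byte, then the characters. */
typedef struct {
    unsigned char len;
    char data[255];
} PStr;

/* Build a PStr from a C string (illustrative helper). */
static PStr pstr_from_c(const char *s)
{
    PStr p = {0};
    size_t n = strlen(s);
    assert(n <= 255);            /* a one-byte prefix cannot describe more */
    p.len = (unsigned char)n;
    memcpy(p.data, s, n);        /* no terminator is stored */
    return p;
}

/* O(1): the length is simply read from the prefix. */
static size_t pstr_length(const PStr *p)
{
    return p->len;
}

/* O(n): a null-terminated string has to be scanned, as strlen does. */
static size_t cstr_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Both length functions return 5 for "hello", but only the second one has to touch every character to find that out.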

Not all Pascals used this concept. IIRC, the original (seventies) convention was to pad the allocation with spaces and scan backward for a non-space character (which makes it impossible for a string to end in spaces). Moreover, since software was mostly used in isolation, all sorts of schemes existed, often based on whatever suited the implementation or architecture.
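The space-padding convention can be sketched roughly as follows. This is only an illustration in C of the idea described above, not code from any historical compiler, and padded_length is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a fixed-width, space-padded field:
   scan backward until the last non-space character. */
static size_t padded_length(const char *field, size_t width)
{
    assert(field != NULL);
    while (width > 0 && field[width - 1] == ' ')
        width--;
    return width;
}
```

With this scheme the fields "ab" and "ab " padded to the same width are indistinguishable, which is why a string cannot meaningfully end in spaces.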

The most popular dialects, Borland's (Turbo Pascal, Delphi) and Free Pascal, are generally based on the UCSD dialect and therefore have Pascal strings. Delphi currently has 5 such string types (short / ansi / wide / unicode / open).

On the other hand, this means that a loop needs an additional index-based check to detect the end of the string.

So instead of copying a string with

 { p: destination, p2: source } while (p2^ <> #0) do begin p^ := p2^; inc(p); inc(p2); end; p^ := #0;

which is equivalent to

 while (*s++ = *t++);

in C when using an optimizing compiler, you need to do, for example,

 while (len > 0) do begin p^ := p2^; inc(p); inc(p2); dec(len); end;

or even

 i := 1; while (i <= len) do begin p[i] := p2[i]; inc(i); end;

This makes a Pascal string loop take slightly more instructions than the equivalent zero-terminated loop, and it adds one more live value (the counter). In addition, UCSD Pascal was a bytecode (p-code) interpreted language, and the latter code, based on the Pascal string, is "safer".

On an architecture with built-in post-increment (++) addressing (for example, the PDP-11 that C was originally designed for), the pointer version was even cheaper, especially without optimization. Nowadays, optimizing compilers can easily detect any of these constructs and convert them into whatever is best.

More importantly, since the early 1990s security has become more important, and relying purely on the null-terminated property of strings is frowned upon, because small errors in validation can cause potentially exploitable buffer overflows. C practice has therefore moved away from the old string usage toward the "-n-" versions of the old string routines (strncpy, etc.), which require a maximum length to be passed. This adds the same extra live value, the length, as the manually guarded Pascal string approach, where the programmer must take care of passing the length (or the maximum buffer size for the C "-n-" functions). Pascal strings still have the advantage of reaching the last occupied char in an O(1) operation, and of having no forbidden characters.
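A rough sketch of both points in C, under the assumption of a one-byte length prefix. LpStr, bounded_copy and last_char are hypothetical names for illustration; only strncpy and strlen are real library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: one length byte, then the characters. */
typedef struct {
    unsigned char len;
    char data[255];
} LpStr;

/* Bounded copy in the "-n-" style: the caller must pass the buffer size. */
static void bounded_copy(char *dst, size_t cap, const char *src)
{
    assert(cap > 0);
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';   /* strncpy does not terminate when it truncates */
}

/* O(1) access to the last occupied character, no scan needed. */
static char last_char(const LpStr *s)
{
    assert(s->len > 0);
    return s->data[s->len - 1];
}
```

Note that an LpStr holding the three bytes 'a', 0, 'b' is perfectly valid and last_char still finds 'b', whereas strlen on the same bytes stops after the first one.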

Length-prefixed strings are also widely used in file formats, because knowing in advance how many bytes to read is obviously useful.
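As an illustration, here is a minimal sketch of reading such a record from a byte buffer, assuming a made-up format of one length byte followed by that many payload bytes. read_record is a hypothetical helper, not part of any real file format library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads one record: 1 length byte, then that many payload bytes.
   'out' must have room for 255 bytes.
   Returns the number of bytes consumed, or 0 if the buffer is too short. */
static size_t read_record(const unsigned char *buf, size_t avail,
                          unsigned char *out, size_t *out_len)
{
    assert(buf != NULL && out != NULL && out_len != NULL);
    if (avail < 1)
        return 0;
    size_t n = buf[0];          /* the prefix says how much follows */
    if (avail - 1 < n)
        return 0;               /* truncated record */
    memcpy(out, buf + 1, n);
    *out_len = n;
    return 1 + n;
}
```

The reader never has to search for a terminator: it knows up front how many bytes belong to the record and where the next record starts.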


This is an old name dating back to the days when "C versus Pascal" was a comparison people actually made. Depending on who you ask, it either means specifically storing the length in the first byte, or refers to any length prefix (two bytes, four bytes). Other memory management details are not part of the definition; they are implementation-dependent and not fundamentally different from C.

Pascal strings are superior at ... everything. NUL-terminated strings save one to three bytes on short strings, which may have been useful in 1970 but is not worth mentioning today in almost all circumstances. Besides making it impossible to store a NUL byte (which is not so bad for text, but rules out any binary data), they do not let you determine the length of a string efficiently. This hurts a good share of string algorithms. For example, the comment you refer to is about string comparison: if you have the lengths, you can instantly return false when comparing strings of different lengths. There are also many other, non-performance disadvantages.
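The comparison shortcut looks roughly like this in C. StrView and sv_equal are hypothetical names for a length-carrying string and its equality test, used here only as a sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: a length plus a pointer to the bytes. */
typedef struct {
    size_t len;
    const char *data;
} StrView;

static bool sv_equal(StrView a, StrView b)
{
    assert(a.data != NULL && b.data != NULL);
    if (a.len != b.len)
        return false;                        /* O(1) rejection, no bytes read */
    return memcmp(a.data, b.data, a.len) == 0;
}
```

With NUL-terminated strings the same test has to walk both strings until they differ or end, even when the lengths alone would have settled it.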

For these reasons, almost every language implementation from around 1980 onward uses length prefixes for strings. This is another reason why the name "Pascal string" is outdated.
