Do strings contain empty substrings everywhere?

Question

This question arises from the discussion following this answer.

In a nutshell: the author of the answer (0x499602D2) argued (correctly, as I now know) that when a stream does not skip whitespace and the next character is whitespace, all extractions except those of characters fail.

I questioned this because I thought that extracting a string should not fail, since the stream contains an empty string delimited by the whitespace character at the beginning.

This turned into a general discussion about whether there is an empty string at every position in a string, for example between a and b in the string "ab" (I say yes, 0x499602D2 says no). 0x499602D2 suggested that I post this as a question, so here I am.

Here are the main arguments for my position, copied from that thread (including part of the chat):

First, consider the literal for an empty string. In C and C++, the content of a string literal is delimited by a quotation mark at the beginning and at the end. So what does an empty string look like? You know it: "". You see, the opening quote (delimiter) is immediately followed by the closing quote (delimiter). The empty string sits between two quotation marks that directly follow each other, because the empty string has no characters. Also look at the C representation of a string: a sequence of characters followed by the delimiter '\0'. So what is the representation of an empty string? Well, the characters of the empty string (that is, none) followed by the delimiter. This means that the first character is the delimiter (exactly as in the stream case). Now consider string concatenation where, for example, the first string is "a", the second string is empty, and the third string is "b". What is the concatenation? Well, "ab". So clearly there is an empty string between a and b in "ab" (we put it right there!). And, of course, the same is true before a and after b. That is, there is an empty string (or two, or a million) between any two characters of a string.
An empty string has no characters, and there are no characters between consecutive characters. Therefore, there is an empty string between any two consecutive characters. Also see the other arguments I gave earlier. Also, consider regular expressions that can match the empty string: they match it at every position. For example, /ab*c/ matches "ac" because b* matches the empty string between a and c.
There is an empty string before the delimiter (the space), i.e. no characters, just as in the C representation of the empty string there are no characters before the \0 delimiter. Also note that readline works the same way with the \n delimiter: if \n follows immediately, it does not fail but gives an empty string.
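
To make the concatenation and regular-expression arguments above concrete, here is a minimal C++ sketch. The function names `concat_with_empty` and `matches_ab_star_c` are made up for this illustration; the sketch only assumes the standard `<string>` and `<regex>` facilities.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Concatenate "a", an empty string, and "b" (hypothetical helper).
std::string concat_with_empty() {
    const std::string a = "a";
    const std::string empty = "";
    const std::string b = "b";
    assert(empty.empty());   // the middle operand really has no characters
    return a + empty + b;    // the empty string is placed between a and b
}

// Check whether the pattern ab*c matches the whole of `text`
// (hypothetical helper). b* may match zero characters.
bool matches_ab_star_c(const std::string& text) {
    static const std::regex pattern("ab*c");
    return std::regex_match(text, pattern);
}
```

Calling `concat_with_empty()` yields a string that compares equal to "ab", and `matches_ab_star_c("ac")` is true because `b*` matches zero characters between a and c.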

I could not identify the main arguments of 0x499602D2 in the discussion, so I will not try to summarize them, to avoid an unintentionally unfair selection. You should be able to see them in the comments (and maybe in the chat; I have no idea whether it is accessible to everyone). @0x499602D2: If you want, you can add your main arguments after this paragraph.

Practical question related to this: Should a well-designed string extraction function fail if there are no characters before the delimiter (as operator>> does for strings), or succeed and return an empty string (as readline does)?
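
The two behaviours being compared can be sketched as follows. This is only an illustration under some assumptions: the helper names `extract_with_operator` and `extract_with_getline` are invented here, and `std::getline` with an explicit delimiter is used as a stand-in for the line-reading function mentioned above.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Extract a string with operator>> while whitespace skipping is disabled.
// Returns true if the extraction succeeded (hypothetical helper).
bool extract_with_operator(const std::string& input, std::string& out) {
    std::istringstream in(input);
    in >> std::noskipws;  // leading whitespace is no longer skipped
    in >> out;            // sets failbit if no characters are extracted
    return !in.fail();
}

// Extract everything up to the first space with std::getline.
// Returns true if the extraction succeeded (hypothetical helper).
bool extract_with_getline(const std::string& input, std::string& out) {
    std::istringstream in(input);
    std::getline(in, out, ' ');  // the delimiter is consumed and discarded
    return !in.fail();
}
```

For the input " abc" (a leading space), `extract_with_operator` reports failure, while `extract_with_getline` succeeds and leaves an empty string in `out`.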

+6

c++ string

celtschk Mar 25 '14 at 10:51

1 answer

kol · Accepted Answer · 2014-03-25T23:33:26+0000

Theorem

There is an empty string ε at every position of a string s.

Proof

1. If |s| = 0 (s has length zero), then s = ε, and the statement holds.

2. If |s| > 0, then s has two extreme positions: one before the first character and one after the last. Because ε is the identity element of the concatenation operation, i.e. εs = sε = s, the statement holds for both the initial and the final position.

3. If |s| > 1, then s can be written as the concatenation of two nonempty strings: s = pq, where |p| > 0 and |q| > 0. Using the identity property of ε, pεq = (pε)q = pq = s, which means that the statement holds for the position in s that splits it into p and q. This split can be made at any interior position of s; therefore, the statement holds for every interior position.

Corollary

The identity property means that ε = εε = εεε = etc. Repeating the above proof with ε replaced by ε^n, where n is a positive integer, we find that every position of any string contains an infinite number of empty strings.

Notes

Here, the word "position" means caret position (the position of the text input cursor). The caret can be placed before the first character (index 0), between consecutive characters, and after the last character (index |s|). The number of caret positions is |s| + 1.
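
As a rough check of the |s| + 1 count, the following C++ sketch asks `std::string::find` for the empty string at every caret position. The function name `count_empty_positions` is made up for this illustration; it is one possible way to express the claim in code, not part of the proof.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count the caret positions of s at which the empty string is found
// (hypothetical helper). Positions run from 0 to s.size() inclusive.
std::size_t count_empty_positions(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t pos = 0; pos <= s.size(); ++pos) {
        // Searching for "" starting at pos reports a match at pos itself.
        if (s.find("", pos) == pos) {
            // The zero-length substring at pos is the empty string.
            assert(s.substr(pos, 0).empty());
            ++count;
        }
    }
    return count;
}
```

For "ab" this counts three positions (before a, between a and b, after b), and for the empty string it counts one.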

The above proof shows that these "zero-width gaps" between characters can be thought of as being filled with an arbitrary number of empty strings. (It is similarly strange that the empty set is a subset of every set, including itself.)
