What is the purpose of using the notation [^ in scanf?

I came across some code and wondered what the original developer was doing. The following is a simplified program that uses the same construct:

    #include <stdio.h>

    int main()
    {
        char title[80] = "mytitle";
        char title2[80] = "mayataiatale";
        char mystring[80];

        /* huh? */
        sscanf(title, "%[^a]", mystring);
        printf("%s\n", mystring);   /* Output is "mytitle" */

        /* huh? */
        sscanf(title2, "%[^a]", mystring);
        printf("%s\n", mystring);   /* Output is "m" */

        return 0;
    }

The man page for scanf contains the relevant information, but I have trouble understanding it. What is the purpose of this kind of notation? What was the developer trying to achieve?

+5
4 answers

The main reason for character classes is that the %s conversion stops at the first whitespace character, even if you specify a field width, and quite often you don't want that. In such cases, the character-class notation can be extremely useful.

Consider this code, which reads a line of up to 10 characters, discarding any excess but keeping spaces:

    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
        char buffer[10+1] = "";
        int rc;
        while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
        {
            int c = getchar();
            printf("rc = %d\n", rc);
            if (rc >= 0)
                printf("buffer = <<%s>>\n", buffer);
            buffer[0] = '\0';
        }
        printf("rc = %d\n", rc);
        return(0);
    }

This was sample code from a discussion on comp.lang.c.moderated (circa June 2004) about getline() variants.


At least some confusion reigns, so here is what happens. The first conversion specification, %10[^\n], reads up to 10 non-newline characters and stores them in buffer, followed by a terminating null byte. The second, %*[^\n], contains the assignment-suppression character (*) and reads any remaining non-newline characters from the input without storing them. When scanf() returns, the input is positioned at the next newline character. The loop body reads that character with getchar() and discards it, so when the loop restarts, the input is positioned at the start of the next line. The process then repeats. If the line is 10 characters or shorter, those characters are copied to buffer, and the second conversion finds no non-newline characters left to consume.

+5

Constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. They are somewhat like %s, but instead of collecting as long a run of non-whitespace characters as possible, they collect only a run of characters described by the character class. There may be cases where writing %[a-zA-Z0-9] makes sense, but I'm not sure I see a compelling use case for complemented classes with scanf().

IMHO, scanf() is simply not the right tool for this job. Every time I have set out to use one of its more powerful features, I have ended up ripping it out and implementing the functionality in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually it was sufficient to do line-at-a-time I/O and split the line coarsely into tokens with strtok() before converting the values.

Edit: I ended up ripping out scanf() typically because, when users insist on providing incorrect input, it just isn't good at helping the program give useful feedback about the problem, and having an assembler print "Error, terminated." as its only helpful error message was not going over well with my user. (Me, in that case.)

+4

It is like a character set in regular expressions; [0-9] matches a run of digits, [^aeiou] matches a run of anything that is not a lowercase vowel, etc.

There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.

+2

You can read about this in the ISO/IEC 9899 standard, which is available online.

Here is the paragraph from that document on the [ conversion specifier (page 286):

Matches a nonempty sequence of characters from a set of expected characters (the scanset).

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right bracket character is in the scanlist and the next following right bracket character is the matching right bracket that ends the specification; otherwise the first following right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.

0
