Why does GCC warn about trigraphs, but not about digraphs?

Code:

 #include <stdio.h>
 int main(void)
 {
     ??< puts("Hello Folks!"); ??>
 }

When compiled with GCC 4.8.1 using -Wall and -std=c11, the program above generates the following warnings:

 source_file.c: In function 'main':
 source_file.c:8:5: warning: trigraph ??< converted to { [-Wtrigraphs]
      ??< puts("Hello Folks!"); ??>
      ^
 source_file.c:8:30: warning: trigraph ??> converted to } [-Wtrigraphs]

But when I change the body of main to:

 <% puts("Hello Folks!"); %> 

no warnings are issued.

So, why does the compiler warn me when using trigraphs, but not when using digraphs?

4 answers

This GCC preprocessor documentation gives a pretty good rationale for the warning (emphasis mine):

Trigraphs are not popular and many compilers implement them incorrectly. Portable code should not rely on trigraphs being either converted or ignored. With -Wtrigraphs GCC will warn you when a trigraph may change the meaning of your program if it were converted.

and the Tokenization section of the GCC documentation explains that digraphs, unlike trigraphs, have no potentially negative side effects (emphasis mine):

There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, but does not cover as much ground.


Because trigraphs have the undesirable effect of silently changing code: the same source file is valid both with and without trigraph replacement, but produces different code. This is especially problematic in string literals such as "What??!", where ??! is silently replaced by |.

Language design and language evolution should strive to avoid silent changes, so having the compiler warn about trigraphs is a good thing.

Compare this with digraphs, which were introduced as new tokens and therefore do not cause silent changes.


Perhaps because digraphs have no negative side effects, unlike trigraphs, as stated in the GCC documentation:

Punctuators are all the usual bits of punctuation which are meaningful to C and C++. All but three of the punctuation characters in ASCII are C punctuators. The exceptions are '@', '$', and '`'. In addition, all the two- and three-character operators are punctuators. There are also six digraphs, which the C++ standard calls alternative tokens, which are merely alternate ways to spell other punctuators. This is a second attempt to work around missing punctuation in obsolete systems. It has no negative side effects, unlike trigraphs, but does not cover as much ground. The digraphs and their corresponding normal punctuators are:

 Digraph:    <%  %>  <:  :>  %:  %:%:
 Punctuator: {   }   [   ]   #   ##

Trigraphs are nasty because they use character sequences that can legitimately appear in valid code. A common case that caused compiler errors in code written for the classic Macintosh:

 unsigned int signature = '????'; /* Should be value 0x3F3F3F3F */ 

Trigraph processing would turn it into:

 unsigned int signature = '??^; /* Should be value 0x3F3F3F3F */ 

which, of course, will not compile. In rarer cases, trigraph processing can produce code that still compiles but means something other than what was intended. For example,

 char *template = "????/1234"; 

which will turn into

 char *template = "??S4"; // ??/ becomes \, and \123 becomes S 

Not the string literal that was intended, but nonetheless completely legal.

In contrast, digraphs are relatively benign: apart from some possibly strange corner cases involving macros, no code containing digraphs would have a legitimate meaning if digraphs were not processed.

