C File With No #?

Suppose you are given a single C source file that contains at most 300 lines of code.

Suppose also that the file, while implementing several functions, does NOT contain the '#' character anywhere (which means there are NO #include directives and no other statements containing a "#" in the file).

My question is: does the above guarantee that the file performs no I/O? Could the file still be able to (say) erase the contents of a hard drive or do other suspicious things?

(I will be receiving 100-200 single C files that, as mentioned, do not contain the # character. I was asked to write a simple program that programmatically checks whether a single C source file without # could be involved in I/O, network access, etc.)

Given that no # directives are allowed, what is the WORST code a programmer could put in such a C file that could damage the system of whoever runs it?

I know that no check will be 100% accurate, but I'm interested in at least doing some basic checks that raise a red flag if certain expressions or keywords are found. Any ideas on what to look for?

+7
9 answers

No, it guarantees nothing of the sort. You can take code with all its includes and macros, run it through the preprocessor to turn it into one huge file, and then compile that. The resulting file contains no preprocessor directives, yet it can do everything that C can normally do on the system.

+12

If the author of the source includes inline assembly, they can do almost anything without importing any libraries.

+5

You can simply copy and paste the definitions of the standard file types and functions (e.g. FILE, fopen(), fprintf(), fclose(), etc.) into the C file. That way no #include is needed, and once the file is compiled and linked against the appropriate libraries, it can perform I/O.

+5

# is not the only token that can start a preprocessor directive. The trigraph ??= and the digraph %: are equivalent spellings defined in the standard. (They are not recognized by all compilers, though.)
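As a small illustration of the point above, here is a sketch that pulls in a header using the digraph spelling. It assumes a compiler with digraph support enabled (GCC and Clang accept this in their default modes, as far as I know); the function name `digraph_demo` is made up for the example.

```c
%:include <stdio.h>

/* Prints a line via a header that was included without the hash character. */
int digraph_demo(void)
{
    return puts("hi") >= 0;   /* puts returns a non-negative value on success */
}
```

A source scanner that only searches for the hash character would pass this file, so the alternative spellings need to be on the watch list too.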

+3

C allows unsafe pointer operations. For example, on a system without ASLR it is trivial to obtain a pointer to arbitrary library functions. This is not very reliable, since any invalid memory access will kill the process, but if you know the target system it is at least possible.

ASLR makes this a little trickier, but I suppose you could just take a pointer to the current position on the stack and then walk up until you reach the stack frame that belongs to your thread's entry point. You are sure to find some interesting pointers there.

+2

The absence of preprocessor directives does not guarantee anything but the absence of preprocessor directives.

You can still manually add data types and function prototypes for any library functions that interest you. If you are familiar with the underlying platform, you can bypass the standard library completely and make system calls directly.
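A minimal sketch of the hand-written prototype idea, one layer below the C standard library: it declares the POSIX `write` function by hand and calls it. The prototype assumes an LP64 POSIX system where `ssize_t` is `long` and `size_t` is `unsigned long`; that assumption and the function name `say_hi` are mine, not from the answer.

```c
/* Hand-written prototype, no header involved. Assumes an LP64 POSIX
   system (ssize_t is long, size_t is unsigned long). */
long write(int fd, const void *buf, unsigned long count);

/* Writes "hi\n" to file descriptor 1 (stdout); returns the byte count. */
long say_hi(void)
{
    return write(1, "hi\n", 3);
}
```

Nothing in this file contains the hash character, yet it performs output as soon as it is linked against the platform's C library, which happens by default.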

I once saw code (possibly for the IOCCC) that used an unsigned char array to store raw opcodes, and then used type punning to treat it as a function, something like

    unsigned char instr[] = {0x00, 0x12, 0x33, ...};
    void (*foo)(void) = (void (*)(void)) instr;
    foo();

Note that this relies on undefined behavior and a lot of non-portable assumptions, and I'm not even sure this approach still works. But if it did, it is not something a simple source scan would easily catch.

EDIT

I found the code I was thinking of: it is an IOCCC entry from 1984. It doesn't work the way I described, though. Hey, I'm getting old, and stuff doesn't stick in my brain like it used to.

    short main[] = {
        277, 04735, -4129, 25, 0, 477, 1019, 0xbef, 0, 12800,
        -113, 21119, 0x52d7, -1006, -7151, 0, 0x4bc, 020004,
        14880, 10541, 2056, 04010, 4548, 3044, -6716, 0x9,
        4407, 6, 5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
        0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
        4, 0, 0, 0, 0x8, 0, 4, 0, ',', 0, 12, 0, 4, 0, '#',
        0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
        'p', 072163, 'r', 29303, 29801, 'e'
    };

Here's the explanation:

  The Grand Prize: 

     Sjoerd Mullender & Robbert van Renesse

 Without question, this C program is the most obfuscated C program that
 has ever been received!  Like all great contest entries, they result
 in a change of rules for the following year.  To prevent a flood of
 similar programs, we requested that programs be non machine specific.

 This program was selected for the 1987 t-shirt collection.

 NOTE: If your machine is not a Vax-11 or pdp-11, this program will
       not execute correctly.  In later years, machine dependent
       code was discouraged.

 The C startup routine (via crt0.o) transfers control to a location
 named main.  In this case, main just happens to be in the data area.
 The array of shorts, which has been further obfuscated by use of
 different data types, just happens to form a meaningful set of PDP-11
 and Vax instructions.  The first word is a PDP-11 branch instruction
 that branches to the rest of the PDP code.  On the Vax main is called with
 the calls instruction which uses the first word of the subroutine as a
 mask of registers to be saved.  So on the Vax the first word can be anything.
 The real Vax code starts with the second word.  This small program
 makes direct calls to the write () Unix system call to produce a
 message on the screen.  Can you guess what is printed?  We knew you
 couldn't!  :-)

 Copyright (c) 1984, Landon Curt Noll.
 All Rights Reserved.  Permission for personal, educational or non-profit use is
 granted provided this this copyright and notice are included in its entirety
 and remains unaltered.  All other uses must receive prior permission in writing
 from both Landon Curt Noll and Larry Bassel.

Again, I don't know if this trick will work on any modern desktop OS, but it would be interesting to know.

+2

Not necessarily. Most compilers warn about implicit declarations, but they still resolve the calls to the real functions at link time. You can build a list of functions that perform I/O and check whether they are called, but that still doesn't rule out inline asm making system calls directly.

You should probably run the programs with low privileges in a sandbox and see which system calls they make, using something like strace.

+1

The following is a valid C program that writes to stdout. It contains no # characters:

    int puts(const char *s);

    int main(void)
    {
        puts("hi");
        return 0;
    }

It doesn't even produce a compiler warning (/Wall /W3 on MSVC, -Wall -Wextra on MinGW), let alone an error.

+1

You can also try compiling the C files into static binaries, disassembling them, and checking for system call instructions (sysenter, int). I/O cannot be performed from user space; the process has to enter the kernel for any kind of I/O.
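A very rough sketch of that check, assuming an x86 target: it scans a byte buffer (for example, the text section extracted from the binary) for the two-byte encodings of `syscall` (0F 05), `sysenter` (0F 34) and `int 0x80` (CD 80). The function name is hypothetical, and a raw byte scan is not a disassembler: the same byte pairs can occur inside immediates or data, so expect false positives.

```c
#include <stddef.h>

/* Counts byte pairs that match x86 system-call instruction encodings. */
size_t count_syscall_opcodes(const unsigned char *code, size_t len)
{
    size_t hits = 0;
    size_t i;
    for (i = 0; i + 1 < len; i++) {
        unsigned char a = code[i];
        unsigned char b = code[i + 1];
        if ((a == 0x0F && b == 0x05) ||   /* syscall  */
            (a == 0x0F && b == 0x34) ||   /* sysenter */
            (a == 0xCD && b == 0x80))     /* int 0x80 */
            hits++;
    }
    return hits;
}
```

In a static binary the C library's own system call stubs will match as well, so the count is only useful when compared against a known-benign baseline.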

However, this still does not protect against instructions being executed from the non-text parts of your binary. In the worst case, instructions may be fabricated at run time and then executed. For that reason, I believe the best approach is to exercise the code with good coverage while tracing the process for system calls. Linux has strace, which can help with this.

+1
