Filtering the source code is not enough. Even if the source code never names the API, it can use tricks to call it anyway. For example, a simple regular expression filter can be defeated with token pasting in the preprocessor. And that is only at the source level; once you consider machine code, there are many more possibilities, from simple inline assembly to return-oriented programming, and it can be done in a way that is hard to spot when reading the source, as the Underhanded C Contest shows.
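To make the source-level trick concrete, here is a minimal sketch, assuming the filter is a regex looking for the text `getpid(`. The macro name `PASTE` and the function name `hidden_getpid` are made up for illustration; `getpid` only stands in for whatever API is being blocked.

```c
#include <unistd.h>

/* The preprocessor joins the two fragments into a single identifier,
   so the banned name never appears as one token in the source text. */
#define PASTE(a, b) a##b

/* Calls the libc function whose name is assembled from "get" and "pid".
   A text filter scanning this function for the full name finds nothing. */
long hidden_getpid(void)
{
    return (long)PASTE(get, pid)();
}
```

Calling `hidden_getpid()` returns the same value as a plain call to the blocked function would, which is why filtering has to happen after preprocessing at the very least, and even that does not cover inline assembly.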
Every API ultimately boils down to kernel system calls; if a library function is blocked, the programmer can simply copy its implementation. AFAIK there are only two safe ways to prevent kernel API calls: either filter them in the kernel, or statically prove that the code cannot call the kernel directly. Other methods, such as LD_PRELOAD, can be circumvented. Bypassing LD_PRELOAD is simple: just make the system call directly.
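As a rough illustration of why LD_PRELOAD is not a barrier, the sketch below asks the kernel for the process ID without touching the libc wrapper that a preloaded library could have replaced. The function name `raw_getpid` is hypothetical, and the inline-assembly path assumes Linux on x86-64; on other platforms it falls back to the libc `syscall()` helper, which is itself a library symbol and so only partly makes the point.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue the getpid system call directly, skipping the libc wrapper. */
long raw_getpid(void)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* rax carries the system call number in and the result out;
       the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback: still goes through libc's generic syscall() helper. */
    return syscall(SYS_getpid);
#endif
}
```

Nothing in the x86-64 path resolves a dynamic symbol, so there is nothing for a preloaded library to intercept.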
To filter the API in the kernel, the most recent mechanism is seccomp filters, which let you restrict system calls and their arguments. With them you can easily prevent a process from, for example, ever calling the shutdown and socket system calls. Other mechanisms (namespaces, cgroups, chroot, etc.) can be used to add other kinds of constraints on top of the filter.
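A minimal sketch of such a filter, assuming a Linux kernel with seccomp filter support and using the raw `prctl` interface rather than a helper library. The function name `deny_socket_syscall` is made up, and the sketch makes `socket` fail with `EPERM` instead of killing the process. It also skips the architecture check on `seccomp_data.arch` that a real filter should perform first, since system call numbers differ between ABIs.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>

/* Install a filter that rejects socket(2) and allows everything else.
   Returns 0 on success, -1 on failure. The filter cannot be removed
   afterwards and is inherited by child processes. */
int deny_socket_syscall(void)
{
    struct sock_filter filter[] = {
        /* Load the system call number from the seccomp data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* socket? continue to the next instruction; otherwise skip one. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_socket, 0, 1),
        /* Matched: fail the call with errno set to EPERM. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* Anything else is allowed (sketch only; no arch check). */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After `deny_socket_syscall()` succeeds, any later `socket(...)` call in the process returns -1 with `errno` set to `EPERM`, no matter whether it goes through libc or a hand-written system call instruction.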
An alternative approach is to statically verify that the code is safe, as Google's Native Client does. It restricts the generated assembly code in ways that allow a simple proof that the thread of execution cannot leave the sandbox except through a few well-defined paths. As an example of such rules: no instruction may cross a 32-byte boundary, all jump targets are aligned to a 32-byte boundary, and indirect jumps are allowed only through a couple of instructions that mask the lower bits of the target address before jumping, so there is no way to jump into the middle of an instruction.
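The arithmetic behind those two rules can be sketched in a few lines. This is not Native Client's validator, only an illustration of the 32-byte bundle idea; the names `BUNDLE_SIZE`, `mask_jump_target`, and `crosses_bundle` are made up for the example.

```c
#include <stdint.h>

#define BUNDLE_SIZE 32u  /* bundle width taken from the rules above */

/* Clear the low five bits so the address always lands on the start
   of a 32-byte bundle, which is what the mask before a jump does. */
uintptr_t mask_jump_target(uintptr_t target)
{
    return target & ~(uintptr_t)(BUNDLE_SIZE - 1);
}

/* Return 1 if an instruction of `len` bytes starting at `addr` would
   straddle a bundle boundary, which a validator would reject. */
int crosses_bundle(uintptr_t addr, unsigned len)
{
    if (len == 0)
        return 0;
    return (addr / BUNDLE_SIZE) != ((addr + len - 1) / BUNDLE_SIZE);
}
```

For instance, `mask_jump_target(0x1007)` yields `0x1000`, and a 4-byte instruction at `0x101e` is rejected because its last byte sits at `0x1021`, in the next bundle. Since every bundle start is also an instruction start, a masked jump can never land inside an instruction.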