Strange clone behavior

Question

This is a fairly simple application that creates a lightweight process (thread) with a call to clone().

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <time.h>

    #define STACK_SIZE 1024*1024

    int func(void* param) {
        printf("I am func, pid %d\n", getpid());
        return 0;
    }

    int main(int argc, char const *argv[]) {
        printf("I am main, pid %d\n", getpid());
        void* ptr = malloc(STACK_SIZE);

        printf("I am calling clone\n");
        int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
        // works fine with sleep() call
        // sleep(1);

        if (res == -1) {
            printf("clone error: %d", errno);
        } else {
            printf("I created child with pid: %d\n", res);
        }

        printf("Main done, pid %d\n", getpid());
        return 0;
    }

Here are the results:

Run 1:

    ➜ LFD401 ./clone
    I am main, pid 10974
    I am calling clone
    I created child with pid: 10975
    Main done, pid 10974
    I am func, pid 10975

Run 2:

    ➜ LFD401 ./clone
    I am main, pid 10995
    I am calling clone
    I created child with pid: 10996
    I created child with pid: 10996
    I am func, pid 10996
    Main done, pid 10995

Run 3:

    ➜ LFD401 ./clone
    I am main, pid 11037
    I am calling clone
    I created child with pid: 11038
    I created child with pid: 11038
    I am func, pid 11038
    I created child with pid: 11038
    I am func, pid 11038
    Main done, pid 11037

Run 4:

    ➜ LFD401 ./clone
    I am main, pid 11062
    I am calling clone
    I created child with pid: 11063
    Main done, pid 11062
    Main done, pid 11062
    I am func, pid 11063

What's going on here? Why is the "I created child" message sometimes printed several times?

I also noticed that adding a delay after the clone call "fixes" the problem.

+5

c multithreading linux clone

lstipakov Jul 20 '16 at 20:37


5 answers

Both of your processes use the same stdout (that is, the same C standard library FILE struct), which includes a buffer that both of them end up sharing. That is undoubtedly causing problems.

+3

rici Jul 20 '16 at 21:29


I can't reproduce the OP's problem, but I don't think printf is actually the problem.

glibc docs :

The POSIX standard requires that by default the stream operations are atomic. I.e., issuing two stream operations for the same stream in two threads at the same time will cause the operations to be executed as if they were issued sequentially. The buffer operations performed while reading or writing are protected from other uses of the same stream. To do this each stream has an internal lock object which has to be (implicitly) acquired before any work can be done.

Edit:

Although the above is true for threads, as rici points out, there is a comment on sourceware:

Basically you can't safely do anything with CLONE_VM unless the child restricts itself to pure computation and direct syscalls (via sys/syscall.h). If you use any of the standard library, you risk the parent and child clobbering each other's internal states. You also have issues like the fact that glibc caches the pid/tid in userspace, and the fact that glibc expects to always have a valid thread pointer which your call to clone is unable to initialize correctly because it does not know (and should not know) the internal implementation of threads.

Apparently, glibc isn't designed to work with clone if CLONE_VM is set but CLONE_THREAD | CLONE_SIGHAND are not.
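A rough sketch of what "pure computation and direct syscalls" could look like for the child follows. The names `raw_child`, `run_raw_child`, and `RAW_STACK_SIZE`, the pipe used to collect the output, and the added SIGCHLD flag (so that a plain waitpid() works) are all assumptions for illustration. The child avoids stdio and malloc entirely and only issues a write through syscall().

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RAW_STACK_SIZE (1024 * 1024)

static const char raw_msg[] = "I am func\n";

/* Child entry point: no stdio, no malloc, only a direct write system call. */
static int raw_child(void *param)
{
    int fd = *(int *)param;
    syscall(SYS_write, fd, raw_msg, sizeof raw_msg - 1);
    return 0;
}

/* Hypothetical helper: runs the child and returns the number of bytes it
 * wrote into the pipe, or -1 on failure. */
long run_raw_child(void)
{
    int fds[2];
    char buf[64];
    char *stack;
    pid_t pid;
    long n;

    if (pipe(fds) != 0)
        return -1;
    stack = malloc(RAW_STACK_SIZE);
    if (stack == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The stack grows downward, so pass the top of the allocation.
     * SIGCHLD lets the parent reap the child with an ordinary waitpid(). */
    pid = clone(raw_child, stack + RAW_STACK_SIZE, CLONE_VM | SIGCHLD, &fds[1]);
    if (pid == -1 || waitpid(pid, NULL, 0) != pid) {
        close(fds[0]);
        close(fds[1]);
        return -1; /* the stack is deliberately not freed on this path */
    }

    n = (long)read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    free(stack); /* safe: the child has exited */
    return n;
}
```

Waiting for the child before touching anything it uses (the stack, the file descriptor) keeps the parent and child from stepping on each other in this sketch.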

+3

evaitl Jul 20 '16 at 22:16


As everybody suggests, it really does seem to be a problem with (how shall I put it in the case of clone()?) process safety. With a rough sketch of a locking version of printf (using write(2)), the output is as expected.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <time.h>

    #define STACK_SIZE 1024*1024

    // VERY rough attempt at a thread-safe printf
    #include <stdarg.h>
    #define SYNC_REALLOC_GROW 64
    int sync_printf(const char *format, ...)
    {
      int n, all = 0;
      int size = 256;
      char *p, *np;
      va_list args;

      if ((p = malloc(size)) == NULL)
        return -1;

      for (;;) {
        va_start(args, format);
        n = vsnprintf(p, size, format, args);
        va_end(args);
        if (n < 0) {
          free(p);              // do not leak the buffer on a formatting error
          return -1;
        }
        all = n;                // length of the complete output, not a running sum
        if (n < size)
          break;
        size = n + SYNC_REALLOC_GROW;
        if ((np = realloc(p, size)) == NULL) {
          free(p);
          return -1;
        } else {
          p = np;
        }
      }
      // write(2) should be thread-safe, so the lock is just in case
      flockfile(stdout);
      n = (int) write(fileno(stdout), p, all);
      fflush(stdout);
      funlockfile(stdout);
      free(p);
      return n;
    }

    int func(void *param)
    {
      sync_printf("I am func, pid %d\n", getpid());
      return 0;
    }

    int main()
    {
      sync_printf("I am main, pid %d\n", getpid());
      void *ptr = malloc(STACK_SIZE);

      sync_printf("I am calling clone\n");
      int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
      // works fine with sleep() call
      // sleep(1);

      if (res == -1) {
        sync_printf("clone error: %d", errno);
      } else {
        sync_printf("I created child with pid: %d\n", res);
      }
      sync_printf("Main done, pid %d\n\n", getpid());
      return 0;
    }

For the third time: this is only a sketch. I had no time for a robust version, but that should not stop you from writing one.

+2

deamentiaemundi Jul 20 '16 at 22:20


As evaitl points out, printf is documented as thread-safe by the glibc documentation. BUT, this typically assumes that you are using the designated glibc function to create threads (that is, pthread_create()). If you do not, you are on your own.

The lock taken by printf() is recursive (see flockfile). This means that if the lock is already taken, the implementation checks the owner of the lock against the locker. If the locker is the same as the owner, the locking attempt succeeds.

To distinguish between different threads, TLS has to be set up properly, which you do not do but pthread_create() does. My guess is that in your case the TLS variable that identifies the thread is the same for both threads, so both of them end up taking the lock.

TL;DR: use pthread_create()
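The owner check described above can be illustrated with a toy model. This is a sketch under stated assumptions, not glibc's implementation: the struct and function names are hypothetical, the integer `self` stands in for whatever thread identity the real lock reads from TLS, and the code is deliberately not thread-safe itself.

```c
/* Toy model of a recursive lock. Illustration only; not thread-safe. */
struct toy_rec_lock {
    int owner; /* identity of the current holder (valid when depth > 0) */
    int depth; /* how many times the holder has taken the lock */
};

/* Returns 1 if `self` now holds the lock, 0 if another owner holds it. */
int toy_trylock(struct toy_rec_lock *lock, int self)
{
    if (lock->depth > 0 && lock->owner != self)
        return 0;      /* held by someone else: the caller must wait */
    lock->owner = self;
    lock->depth++;     /* first acquisition, or re-entry by the owner */
    return 1;
}

/* Releases one level of the lock. */
void toy_unlock(struct toy_rec_lock *lock)
{
    if (lock->depth > 0)
        lock->depth--;
}
```

In this model, a second task that reports the same identity as the holder is treated as a re-entrant call and gets in immediately. That is the situation guessed at above for a clone() child that shares the parent's TLS.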

+2

ynimous Jul 20 '16 at 23:07


Craig Estey · Accepted Answer · 2016-07-20T23:01:44+0000

You have a race condition (i.e. you do not have the implied thread safety of stdio).

The problem is even more severe: you can get duplicate "func" messages as well.

The problem is that using clone does not come with the same guarantees as pthread_create (i.e. you do not get the thread-safe variants of printf).

I do not know for sure, but, IMO, the wording about stdio streams and thread safety, in practice, only applies when using pthreads.

So you will have to handle your own inter-thread locking.

Here is a version of your program recoded to use pthread_create. It seems to work without incident:
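One possible shape for such hand-rolled locking is a small spinlock built on C11 atomics, sketched below. Everything here (the `demo_*` names, the iteration count, the shared counter standing in for a shared resource, the SIGCHLD flag) is an assumption for illustration. The child sticks to pure computation and atomics and does not call into stdio.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (1024 * 1024)
#define DEMO_ITERATIONS 100000

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static volatile long demo_counter; /* shared resource guarded by demo_lock */

/* Spin until the flag was previously clear, i.e. until we own the lock. */
static void demo_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
        ;
}

static void demo_lock_release(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Runs in both parent and child: a non-atomic update under the lock. */
static int demo_worker(void *param)
{
    (void)param;
    for (int i = 0; i < DEMO_ITERATIONS; i++) {
        demo_lock_acquire();
        demo_counter = demo_counter + 1;
        demo_lock_release();
    }
    return 0;
}

/* Hypothetical helper: returns the final counter value, or -1 on failure. */
long run_locked_counter(void)
{
    char *stack = malloc(DEMO_STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;
    demo_counter = 0;

    pid = clone(demo_worker, stack + DEMO_STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    demo_worker(NULL); /* the parent competes for the same lock */

    if (waitpid(pid, NULL, 0) != pid)
        return -1;     /* the stack is deliberately not freed on this path */
    free(stack);
    return demo_counter;
}
```

A busy-waiting lock is the simplest thing that works without library support; it wastes CPU under contention, which is one more reason to prefer pthread_create and its mutexes when they are an option.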

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <time.h>
    #include <pthread.h>

    #define STACK_SIZE 1024*1024

    void *func(void* param) {
        printf("I am func, pid %d\n", getpid());
        return (void *) 0;
    }

    int main(int argc, char const *argv[]) {
        printf("I am main, pid %d\n", getpid());
        void* ptr = malloc(STACK_SIZE);

        printf("I am calling clone\n");
        pthread_t tid;
        pthread_create(&tid,NULL,func,NULL);
        //int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
        int res = 0;
        // works fine with sleep() call
        // sleep(1);

        if (res == -1) {
            printf("clone error: %d", errno);
        } else {
            printf("I created child with pid: %d\n", res);
        }

        pthread_join(tid,NULL);
        printf("Main done, pid %d\n", getpid());
        return 0;
    }

Here is a test script that I used to check for errors (it is a bit rough, but should be okay). Run it against your version and it will abort quickly. The pthread_create version seems to pass just fine.

    #!/usr/bin/perl
    # clonetest -- clone test
    #
    # arguments:
    #   "-p0" -- suppress check for duplicate parent messages
    #   "-c0" -- suppress check for duplicate child messages
    #   1 -- base name for program to test (eg for xyz.c, use xyz)
    #   2 -- [optional] number of test iterations (DEFAULT: 100000)

    master(@ARGV);
    exit(0);

    # master -- master control
    sub master
    {
        my(@argv) = @_;
        my($arg,$sym);

        while (1) {
            $arg = $argv[0];
            last unless (defined($arg));

            last unless ($arg =~ s/^-(.)//);
            $sym = $1;

            shift(@argv);

            $arg = 1
                if ($arg eq "");

            $arg += 0;
            ${"opt_$sym"} = $arg;
        }

        $opt_p //= 1;
        $opt_c //= 1;
        printf("clonetest: p=%dc=%d\n",$opt_p,$opt_c);

        $xfile = shift(@argv);
        $xfile //= "clone1";
        printf("clonetest: xfile='%s'\n",$xfile);

        $itermax = shift(@argv);
        $itermax //= 100000;
        $itermax += 0;
        printf("clonetest: itermax=%d\n",$itermax);

        system("cc -o $xfile -O2 $xfile.c -lpthread");
        $code = $? >> 8;
        die("master: compile error\n")
            if ($code);

        $logf = "/tmp/log";

        for ($iter = 1;  $iter <= $itermax;  ++$iter) {
            printf("iter: %d\n",$iter)
                if ($opt_v);
            dotest($iter);
        }
    }

    # dotest -- perform single test
    sub dotest
    {
        my($iter) = @_;
        my($parcnt,$cldcnt);
        my($xfsrc,$bf);

        system("./$xfile > $logf");

        open($xfsrc,"<$logf") or
            die("dotest: unable to open '$logf' -- $!\n");

        while ($bf = <$xfsrc>) {
            chomp($bf);

            if ($opt_p) {
                while ($bf =~ /created/g) {
                    ++$parcnt;
                }
            }

            if ($opt_c) {
                while ($bf =~ /func/g) {
                    ++$cldcnt;
                }
            }
        }

        close($xfsrc);

        if (($parcnt > 1) or ($cldcnt > 1)) {
            printf("dotest: fail on %d -- parcnt=%d cldcnt=%d\n",
                $iter,$parcnt,$cldcnt);
            system("cat $logf");
            exit(1);
        }
    }

UPDATE:

Were you able to reproduce the OP's problem with clone?

Absolutely. Before I created the pthreads version, in addition to testing the OP's original version, I also created versions that:

(1) added setlinebuf to the start of main

(2) added fflush just before clone and __fpurge as the first statement of func

(3) added fflush to func before return 0

Version (2) eliminated the duplicate parent messages, but the duplicate child messages remained.

If you would like to see this for yourself, download the OP's version from the question, my version, and the test script. Then run the test script on the OP's version.

I have posted enough information and files so that everyone can recreate the problem.

Please note that, due to differences between my system and the OP's, I could not at first reproduce the problem in just 3-4 attempts. That is why I created the script.

The script runs 100,000 test iterations, and the problem usually manifests itself within 5000-15000.
