Is fork () copy-on-write a robust open behavior that can be used to implement read-only shared memory?

The man page on fork () indicates that it does not copy data pages, but maps them to a child process and sets the copy-on-write flag. This behavior:

  • Meets Linux Flavors?
  • considered implementation detail and therefore could change?

I am wondering if I can use fork () as a means to get a shared read-only memory block for cheap. If the memory is physically copied, it will be quite expensive - there is a lot of branching there, and the data area is quite large, but I hope not ...

+4
source share
5 answers

Linux running on machines without an MMU (memory management unit) will copy all the process memory to fork ().

However, these systems are usually very small and embedded, and you probably don't need to worry about them.

Many services, such as the Apache fork model, use the initialize and fork () methods to share initialized data structures.

You should be aware that if you use languages ​​such as Perl and Python that use reference counting variables or C ++ shared_ptr, this model will not work. This will not work, because as the reference count is adjusted up and down, the memory becomes unpartitioned and copied.

This results in a huge amount of memory usage in Perl daemons, such as SpamAssassin, which try to use the initialization model and fork.

+3
source

Yes, you can definitely rely on it on the MMU-Linux kernels; that's almost all.

However, page size is not everywhere.

You can explicitly create a shared memory area for the forked process using mmap () to create an anonymous map that is not supported by the physical file. In the fork, this area will always be divided (provided that the child does not delete it or does not match anything else at one address). You can protect it if you want, read-only.

The memory allocated (for example), malloc can easily complete the exchange of a page with something that is not readonly, which means that it is still copied when a different structure is changed. This includes the internal structures used by the malloc implementation. Thus, you may want to assign a specific area for this purpose and select from it.

+3
source

Can you rely on the fact that all Linux applications do it this way? Not. But you can rely on the fact that those who do not use an even faster method.

Therefore, you should use this function and rely on it and reconsider your decision if you have a performance problem.

+2
source

The success of this approach depends on how well you adhere to your limited read-only limit. Both parents and the child must obey this system, otherwise the memory will be copied.

However, this may not be the disaster you are suggesting. The kernel can copy only one page (usually 4 KB) to implement CoW semantics. A typical Linux server will use something more complex, a kind of slab allocator , so the copied area can be much larger.

The main thing is that it is untied from your concept of a program for using its memory. If you malloc() 1 GB of RAM, open the child and the child will only change the first byte of this memory block, the entire 1 GB block will not be copied. Perhaps just one copy is copied to the size of the slab containing this first byte.

+2
source

Yes

All Linux distributions use the same kernel, although they have slightly different versions and releases.

It is not true that another basic fork (2) implementation will be faster any time soon, so a safe bet is that copying to write will remain the mechanism. It may not be forever, but definitely for many years.

Of course, some large software systems (such as Phusion Passenger ) use fork (2) the same way you want, so you would not be the only one to use CoW.

+1
source

All Articles