Does using heap memory (malloc / new) make a program non-deterministic?

I started developing software for real-time systems a few months ago, in C for space applications and in C++ for microcontrollers. There is an empirical rule in such systems that you should never create heap objects (so no malloc / new), because doing so makes the program non-deterministic. I could not verify the correctness of this statement when people told me about it. So, is the statement correct?

My confusion is that, as far as I know, determinism means that running the program twice leads to the exact same execution path. In my view, this is a problem for multi-threaded systems, since running the same program several times can execute the threads in a different order each time.

+72
c++ c real-time non-deterministic deterministic
Sep 19 '17 at 10:36
11 answers

In the context of real-time systems, there is more to determinism than a repeatable "execution path." Another required property is that the timing of key events is bounded. In hard real-time systems, an event that occurs outside its allowed time interval (either before the start of that interval or after its end) is a system failure.

In this context, using dynamic memory allocation can cause non-determinism, particularly if the program has a varying pattern of allocation, deallocation, and reallocation. The timing of allocations, deallocations, and reallocations can vary over time, and therefore makes the timing of the system as a whole unpredictable.

+71
Sep 19 '17 at 11:03

The claim, as stated, is incorrect.

Using a heap manager with non-deterministic behavior produces a program with non-deterministic behavior. But that is obvious.

Slightly less obvious is the existence of heap managers with deterministic behavior. Perhaps the best-known example is the pool allocator. It has an array of N*M bytes and an available[] mask of N bits. To allocate, it finds the first available entry (bit test, O(N), deterministic upper bound). To free, it sets the corresponding available bit (O(1)). malloc(X) rounds X up to the next-highest multiple of M to select the correct pool.

This may not be very efficient, especially if your choices of N and M are too large. And if you choose them too small, your program can fail. But the bounds on N and M can be lower than for an equivalent program without dynamic memory allocation.
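A minimal sketch of such a pool allocator in C, under stated assumptions: a single pool of one block size (a real implementation would keep one pool per block size and a packed bit mask instead of a byte per block), with invented names and sizes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One fixed pool: POOL_BLOCKS (N) blocks of BLOCK_SIZE (M) bytes, plus
   an availability flag per block. Allocation is a bounded scan (O(N));
   freeing clears one flag (O(1)). Illustrative values only. */
#define POOL_BLOCKS 8   /* N */
#define BLOCK_SIZE  32  /* M */

static uint8_t pool[POOL_BLOCKS * BLOCK_SIZE];
static uint8_t in_use[POOL_BLOCKS];

void *pool_alloc(size_t size) {
    if (size > BLOCK_SIZE)
        return NULL;                       /* request exceeds block size */
    for (size_t i = 0; i < POOL_BLOCKS; i++) {  /* deterministic upper bound */
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool[i * BLOCK_SIZE];
        }
    }
    return NULL;                           /* pool exhausted: explicit failure */
}

void pool_free(void *p) {
    size_t i = ((uint8_t *)p - pool) / BLOCK_SIZE;
    in_use[i] = 0;                         /* O(1) release */
}
```

Both the worst-case time and the worst-case memory of every call are fixed at compile time, which is the property a hard real-time system needs.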

+39
Sep 19 '17 at 12:28

Nothing in the C11 standard (or in n1570) says that malloc is deterministic (or that it is not); nor does any other documentation, such as malloc(3) on Linux. BTW, many malloc implementations are free software.

But malloc can (and does) fail, and its performance is not specified (a typical malloc call on my desktop takes less than a microsecond, but I can imagine strange situations where it might take much longer, perhaps many milliseconds on a heavily loaded computer; read about thrashing). Also, my Linux desktop has ASLR (address space layout randomization), so running the same program twice gives different malloc-ed addresses (in the virtual address space of the process). BTW, here is a deterministic (under specific assumptions that you would need to spell out) but practically useless malloc implementation.

determinism means running the program twice will lead to the exact same execution path

This is practically false in most embedded systems, because the physical environment changes; for example, software controlling a rocket engine cannot expect the thrust, or the drag, or the wind speed, etc., to be exactly the same from one run to the next.

(So I am surprised that you believe, or want, real-time systems to be deterministic; they never are! Perhaps you care about WCET, worst-case execution time, which is increasingly difficult to predict because of caches.)

By the way, some "real-time" or "embedded" systems implement their own malloc (or some variant of it). C++ programs can use custom allocator-s with the standard containers. See also this and that, etc., etc.

And higher-level software layers in embedded systems (such as an autonomous vehicle and its planning software) certainly use heap allocation, and perhaps even garbage-collection techniques (some of them "real-time"), but those layers are generally not considered safety-critical.

+21
Sep 19 '17 at 10:59

tl;dr: It is not that dynamic memory allocation is inherently non-deterministic (as you defined it, in terms of identical execution paths); it is that it generally makes your program unpredictable. Specifically, you cannot predict whether the allocator might fail in the face of an arbitrary sequence of inputs.

You could have a non-deterministic allocator. This is actually common outside the real-time world, where operating systems use things like address layout randomization. Of course, that would make your program non-deterministic.

But that is not an interesting case, so let us assume a perfectly deterministic allocator: the same sequence of allocations and deallocations will always result in the same blocks at the same locations, and those allocations and deallocations will always have bounded running time.

Now your program can be deterministic: the same set of inputs will lead to exactly the same execution path.

The problem is that if you allocate and free memory in response to input, you cannot predict whether an allocation will ever fail (and failure is not an option).

First, your program might leak memory. So if it needs to run indefinitely, eventually an allocation will fail.

But even if you can prove there are no leaks, you would need to know that there is no input sequence that could demand more memory than is available.

But even if you can prove that the program will never need more memory than is available, the allocator may, depending on the sequence of allocations and frees, fragment memory, and thus eventually be unable to find a contiguous block to satisfy an allocation, even though there is enough free memory overall.

It is very difficult to prove that there is no sequence of inputs that leads to pathological fragmentation.

You can design allocators that guarantee no fragmentation (for example, by allocating blocks of only one size), but that places a significant constraint on the caller and may increase the amount of memory required due to waste. And the caller must still prove that there are no leaks and that there is a satisfiable upper bound on total memory required, regardless of the sequence of inputs. This burden is so high that it is actually simpler to design the system so that it does not use dynamic memory allocation at all.

+12
Sep 19 '17 at 23:49

The deal with real-time systems is that the program must strictly meet certain computation and memory constraints regardless of the execution path taken (which can vary greatly depending on the input). So what does using generic dynamic memory allocation (e.g. malloc / new) mean in this context? It means that at some point the developer is unable to determine the exact memory consumption, and it becomes impossible to tell whether the resulting program can meet its requirements for both memory and processing power.

+10
Sep 19 '17 at 10:46

Yes, it is correct. For the kinds of applications you mention, everything that can occur must be specified in detail. The program must handle the worst-case scenario according to the specification, and set aside exactly that much memory, no more, no less. A situation where "we do not know how much input we will get" does not exist. The worst-case scenario is specified with fixed numbers.

Your program must be deterministic in the sense that it can handle everything up to that worst-case scenario.

The very purpose of the heap is to let several unrelated applications share RAM, as on a PC, where the number of running programs/processes/threads is not fixed. That scenario does not exist in a real-time system.

In addition, the heap is non-deterministic by nature, since segments are added and removed over time.

Additional information here: https://electronics.stackexchange.com/a/171581/6102

+7
Sep 19 '17 at 11:03

Even if your heap allocator has repeatable behavior (the same sequence of allocation and free calls yields the same sequence of blocks, and hence (hopefully) the same internal heap state), the state of the heap can vary drastically if the sequence of calls changes, potentially leading to fragmentation that causes memory allocation failures in an unpredictable way.

Heap allocation is frowned upon, if not outright forbidden, in embedded systems, especially in critical systems such as aircraft or spacecraft guidance or life-support systems, where there is no way to test all possible variations in the sequence of malloc / free calls that can occur in response to intrinsically asynchronous events.

The solution is for each handler to have its own memory set aside for its purpose; then it no longer matters (at least as far as memory use is concerned) in what order those handlers are invoked.
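A hedged sketch of this scheme in C (the handler names and buffer sizes are invented for illustration): each handler works out of its own statically reserved, worst-case-sized buffer, so total memory use does not depend on the order in which handlers fire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each handler owns a statically reserved buffer sized for its own
   worst case; no handler ever calls malloc. Names and sizes are
   illustrative only. */
#define TELEMETRY_BUF_SIZE 128
#define COMMAND_BUF_SIZE    64

static uint8_t telemetry_buf[TELEMETRY_BUF_SIZE];
static uint8_t command_buf[COMMAND_BUF_SIZE];

/* Copies at most the buffer's capacity; returns the number of bytes kept. */
size_t telemetry_handler(const uint8_t *data, size_t len) {
    if (len > TELEMETRY_BUF_SIZE)
        len = TELEMETRY_BUF_SIZE;      /* bounded by design */
    for (size_t i = 0; i < len; i++)
        telemetry_buf[i] = data[i];
    return len;
}

size_t command_handler(const uint8_t *data, size_t len) {
    if (len > COMMAND_BUF_SIZE)
        len = COMMAND_BUF_SIZE;
    for (size_t i = 0; i < len; i++)
        command_buf[i] = data[i];
    return len;
}
```

Whatever order the handlers run in, each one touches only its own fixed region, so the memory footprint is the sum of the per-handler worst cases, known at link time.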

+5
Sep 19 '17 at 21:41

The problem with using the heap in hard real-time software is heap allocation failure. What do you do when you run out of heap?

You are talking about space applications. You have pretty strict no-failure requirements. You must not have the possibility of leaking memory, so there must at the very least be enough memory for the safe-mode code to run. You must not crash. You must not throw exceptions that have no catch block. You probably do not have an OS with protected memory, so one crashing application could in theory take everything down.

You probably do not want to use a heap at all. The benefits do not outweigh the whole-program costs.

Non-deterministic usually means something else, but in this case the best reading is that they want the entire program behavior to be completely predictable.

+3
Sep 19 '17 at 21:13

Short answer

There are observable effects on data values, or on their statistical distributions of uncertainty, in devices such as a first- or second-level trigger scintillator, that can derive from the non-reproducible amount of time you may have to wait for malloc / free.

The worst aspect is that these effects are related neither to the physical phenomenon nor to the hardware, but somehow to the state of the memory (and its history).

Your goal, in that case, is to reconstruct the original sequence of events from data affected by those errors. The errors will also affect the corrected/guessed sequence. This iteration will not always converge to a stable solution, and nothing says it will be the correct one; your data are no longer independent... you risk a logical short-circuit...

Longer answer

You said: "I could not verify the correctness of this statement when people told me about it."
I will try to give you a purely hypothetical situation / case study.

Suppose you are dealing with a CCD, or with some first- and second-level scintillator triggers, on a system that has to economize resources (you are in space).
The acquisition rate will be set so that the background is at x% of MAXBINCOUNT.

  • You get a burst: you have a spike in the counts and an overflow in the bin counter.
    You want it all: you switch to the maximum acquisition rate and fill up your buffer.
    You go to free / allocate more memory while you finish the extra buffer.
    What will you do?

    • Will you keep the counter running, risking an overflow (the second level will try to time the data packets properly), but in that case underestimating the counts for that period?
    • Will you stop the counter, introducing a hole in the time series?

    Note that:

    • By waiting for the allocation, you lose the transient (or at least its beginning).
    • Whatever you do, it depends on the state of your memory, and it is not reproducible.
  • Now suppose instead that the signal is variable around MAXBINCOUNT at the maximum acquisition rate your hardware affords, and the event lasts longer than usual.
    You run out of space and ask for more... Meanwhile, you run into the same problem as above.
    Overflow and systematic underestimation of the peak counts, or holes in the time series?

Now let us move to the second level (it could also be at the first-level trigger).

You receive more data from the hardware than you can store or transmit.
You have to bin the data in time or in space (2x2, 4x4, ... 16x16 ... 256x256 ... pixel scaling ...).

The uncertainty from the previous problem can affect the error distribution.
There are CCD settings for which the pixels near the border have counts close to MAXBINCOUNT (it depends on where you want to "see better").
Now you can have a shower on your CCD, or a single big spot, with the same total number of counts but with a different statistical uncertainty (the part introduced by the waiting time)...

So, for example, where you expect a Lorentzian profile, you can obtain its convolution with a Gaussian profile (a Voigt profile) or, if the second really dominates, a dirty Gaussian...

+2
Sep 20 '17 at 12:14

There is the INTEGRITY RTOS from GHS:

https://www.ghs.com/products/rtos/integrity.html

and LynxOS:

http://www.lynx.com/products/real-time-operating-systems/lynxos-178-rtos-for-do-178b-software-certification/

LynxOS and the INTEGRITY RTOS are among the software used in space applications, missiles, aircraft, etc., as many others were not approved or certified by the authorities (e.g. the FAA).

https://www.ghs.com/news/230210r.html

To satisfy the stringent criteria of space applications, the INTEGRITY RTOS actually provides formal verification, i.e. mathematically proven logic, that its software behaves according to its specification.

Among these criteria, quoting from here:

https://en.wikipedia.org/wiki/Integrity_(operating_system)

and here:

Green Hills Integrity Dynamic Memory Allocation

:

[image]

I am not a specialist in formal methods, but perhaps one of the requirements for this verification is to eliminate the uncertainty in the timing required for memory allocation. In an RTOS, all events are precisely scheduled, milliseconds apart from each other. And dynamic memory allocation always has a problem with the timing it requires.

Mathematically, you really need to prove that everything works from the most fundamental assumptions about timing and the amount of memory.

And if you think about alternatives to heap memory: static memory. The addresses are fixed, the allocated sizes are fixed, the positions in memory are fixed. So it is very easy to reason about memory sufficiency, reliability, availability, and so on.
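A minimal illustration in C of that point (MAX_SAMPLES is an assumed worst-case figure, not from the answer): with static allocation, the buffer's address and size are fixed at link time, and running out of room is an explicit, testable condition rather than an unpredictable allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Statically allocated, worst-case-sized storage: both the address and
   the size are fixed at link time. MAX_SAMPLES is illustrative only. */
#define MAX_SAMPLES 256

static int32_t samples[MAX_SAMPLES];
static size_t sample_count = 0;

/* Returns 0 on success, -1 when the fixed capacity is exhausted. */
int push_sample(int32_t v) {
    if (sample_count >= MAX_SAMPLES)
        return -1;                 /* explicit, bounded failure mode */
    samples[sample_count++] = v;
    return 0;
}
```

Because the footprint appears in the linker map, "is there enough memory?" is answered at build time, not discovered at runtime.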

+2
Sep 20 '17 at 13:10

There is always a trade-off. The program's runtime environment and the tasks it performs should be the basis for deciding whether to use the HEAP or not.

A heap object is efficient when you want to share data between multiple function calls. You just need to pass the pointer, since the heap is globally accessible. There are also disadvantages. Some function may free this memory while references to it still exist elsewhere.

If heap memory is not freed after it is no longer needed, and the program keeps allocating more, at some point the HEAP will run out of memory and affect the deterministic character of the program.

-3
Sep 19 '17 at 11:05


