C API Design: Who Should Allocate?

What is the correct / preferred way to allocate memory in a C API?

I can see two options at first:

1) Let the caller do all the (outer) memory handling:

    myStruct *s = malloc(sizeof(*s));
    myStruct_init(s);
    myStruct_foo(s);
    myStruct_destroy(s);
    free(s);

The _init and _destroy functions are necessary because more memory may be allocated inside the object, and that memory has to be handled somewhere.

This has the disadvantage of being longer, but in some cases the malloc can be eliminated entirely (for example, a stack-allocated struct can be passed in):

    int bar(void)
    {
        myStruct s;
        myStruct_init(&s);
        myStruct_foo(&s);
        myStruct_destroy(&s);
        return 0;
    }

Also, the caller needs to know the size of the struct.

2) Hide the malloc in _init and the free in _destroy.

Advantages: shorter code, since the functions are going to be called anyway. Completely opaque structures.

Disadvantages: the functions cannot be passed a struct that was allocated in a different way.

    myStruct *s = myStruct_init();
    myStruct_foo(s);
    myStruct_destroy(s);

I am currently leaning toward the first case; then again, I don't know much about C API design.

+50
c memory-management design api malloc
Jul 21 '10 at 4:23
11 answers

My favorite example of a well-designed C API is GTK+, which uses the method #2 that you describe.

Another advantage of your method #1, though, is not just that you can allocate the object on the stack, but also that you can reuse the same instance multiple times. If that is not going to be a common use case, then the simplicity of #2 is probably an advantage.

Of course, this is just my opinion :)

+8
Jul 21 '10 at 4:30

Another disadvantage of #2 is that the caller has no control over how things are allocated. This can be worked around by providing an API for the client to register its own allocation / deallocation functions (as SDL does), but even that may not be fine-grained enough.

The disadvantage of #1 is that it does not work well when output buffers are not fixed-size (e.g. strings). At best, you will need to provide another function that obtains the length of the buffer first, so that the caller can allocate it. At worst, it is simply impossible to do so efficiently (i.e. computing the length on a separate path is overly expensive compared with computing and copying in one go).
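The "ask for the length first" variant could be sketched roughly as below. The names format_greeting and greeting_for are hypothetical, and the sketch relies on the C99 rule that snprintf returns the length the full output would need, even when given a NULL buffer of size 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API function: writes "Hello, <name>!" into buf (at most
 * cap bytes, terminator included) and returns the length the full string
 * needs, terminator excluded. Passing buf == NULL with cap == 0 only
 * queries the length. A negative return value signals an error. */
int format_greeting(char *buf, size_t cap, const char *name)
{
    assert(name != NULL);
    return snprintf(buf, cap, "Hello, %s!", name);
}

/* Caller side: query the length, allocate, then fill the buffer.
 * Returns a malloc'd string that the caller must free, or NULL. */
char *greeting_for(const char *name)
{
    int len = format_greeting(NULL, 0, name);   /* first pass: length only */
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf != NULL)
        format_greeting(buf, (size_t)len + 1, name);   /* second pass: fill */
    return buf;
}
```

Note that the formatting work is done twice here, which is exactly the cost mentioned above when the length is expensive to compute.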

The advantage of #2 is that it lets you expose your data type strictly as an opaque pointer (i.e. declare the struct but do not define it, and use pointers consistently). You can then change the definition of the struct as you see fit in future versions of your library, while clients remain binary compatible. With #1, you have to achieve this by requiring the client to specify the version inside the struct in some way (e.g. all those cbSize fields in the Win32 API), and then manually write code that can handle both older and newer versions of the struct, so that clients remain binary compatible as your library evolves.
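A minimal sketch of the opaque-pointer arrangement, shown in a single file for brevity. The counter type and its functions are hypothetical names chosen for illustration; in a real library the first part would live in the public header and the second part in the implementation file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would go in the public header ---- */
typedef struct counter counter;        /* declared, but not defined here */

counter *counter_create(int start);    /* returns NULL if allocation fails */
void     counter_increment(counter *c);
int      counter_value(const counter *c);
void     counter_destroy(counter *c);  /* accepts NULL, like free */

/* ---- what would go in the implementation file ---- */
struct counter {
    int value;
    /* Fields can be added here in later versions without recompiling
     * clients, because clients never see the size or the layout. */
};

counter *counter_create(int start)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

void counter_increment(counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_destroy(counter *c)
{
    free(c);
}
```

Because client code only ever holds a counter *, it cannot place one on the stack or embed one in another struct, which is the trade-off against #1.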

In general, if your structures are transparent data that will not change with future minor revisions of the library, I would go with #1. If it is a more or less complicated data object and you want full encapsulation to future-proof it, go with #2.

+14
Jul 21 '10

Method #2 every time.

Why? Because with method #1 you have to leak implementation details to the caller. The caller has to know at least how big the struct is. You cannot change the internal implementation of the object without recompiling any code that uses it.

+13
Jul 21 '10

Why not provide both to get the best of both worlds?

Use _init and _terminate functions for method #1 (or whatever naming you see fit).

Use additional _create and _destroy functions for the dynamic allocation. Since _init and _terminate already exist, it effectively boils down to:

    myStruct *myStruct_create(void)
    {
        myStruct *s = malloc(sizeof(*s));
        if (s)
        {
            myStruct_init(s);
        }
        return s;
    }

    void myStruct_destroy(myStruct *s)
    {
        myStruct_terminate(s);
        free(s);
    }

If you want it to be opaque, make _init and _terminate static and do not expose them in the API; only provide _create and _destroy. If you need other kinds of allocation, e.g. with a given callback, provide another set of functions for that, e.g. _createcalled, _destroycalled.

The important thing is to keep track of how each object was allocated, but you have to do that anyway. You must always deallocate with the counterpart of the allocator that was used.

+11
Jul 21 '10 at 5:45

Both are functionally equivalent. But, in my opinion, method #2 is easier to use. A few reasons for preferring 2 over 1:

  • It is more intuitive. Why should I have to call free on the object after I have (apparently) destroyed it with myStruct_Destroy?

  • It hides the details of myStruct from the user. They do not have to worry about its size, etc.

  • In method #2, myStruct_init does not have to worry about the initial state of the object.

  • You do not have to worry about memory leaks caused by a user forgetting to call free.

If your API implementation is shipped as a separate shared library, method #2 is a must. To isolate your module from any mismatch between the malloc / new and free / delete implementations across compiler versions, you should keep memory allocation and deallocation to yourself. Note that this is more true of C++ than of C.

+4
Jul 21 '10

The problem I have with the first method is not so much that it is longer for the caller, but that the API is now handcuffed when it needs to expand the amount of memory it uses, precisely because it does not know how the memory it received was allocated. The caller does not always know ahead of time how much memory it will need (imagine trying to implement a vector).

Another option you did not mention, which will be overkill most of the time, is to pass in a function pointer that the API uses as an allocator. This does not let you use the stack, but it does let you do things like replace malloc with a memory pool, while still keeping the API in control of when it wants to allocate.
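As a rough sketch of the allocator-callback idea: every name below (allocator, int_buffer, pool and their functions) is hypothetical and invented for illustration. The library object remembers the allocator it was created with, and the caller plugs in a very simple bump-pointer pool instead of malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator interface supplied by the caller. */
typedef struct {
    void *(*alloc)(void *ctx, size_t size);
    void  (*release)(void *ctx, void *ptr);
    void  *ctx;
} allocator;

/* Library side: the object keeps the allocator that created it, so it
 * can allocate more later and release with the matching function. */
typedef struct {
    allocator a;
    int      *items;
    size_t    count;
} int_buffer;

int_buffer *int_buffer_create(allocator a, size_t count)
{
    assert(a.alloc != NULL && a.release != NULL);
    if (count > SIZE_MAX / sizeof(int))
        return NULL;                         /* size would overflow */
    int_buffer *b = a.alloc(a.ctx, sizeof *b);
    if (b == NULL)
        return NULL;
    b->items = a.alloc(a.ctx, count * sizeof *b->items);
    if (b->items == NULL) {
        a.release(a.ctx, b);
        return NULL;
    }
    b->a = a;
    b->count = count;
    return b;
}

void int_buffer_destroy(int_buffer *b)
{
    if (b == NULL)
        return;
    allocator a = b->a;                      /* copy before releasing b */
    a.release(a.ctx, b->items);
    a.release(a.ctx, b);
}

/* Caller side: a trivial bump-pointer pool. The union only exists to
 * give the byte array a suitably strict alignment. */
typedef struct {
    union {
        unsigned char bytes[256];
        long long     ll;
        long double   ld;
        void         *p;
    } storage;
    size_t used;
} pool;

void *pool_alloc(void *ctx, size_t size)
{
    pool *p = ctx;
    size_t rounded = (size + 15u) & ~(size_t)15u;   /* keep 16-byte steps */
    if (rounded < size || rounded > sizeof p->storage.bytes - p->used)
        return NULL;
    void *ptr = p->storage.bytes + p->used;
    p->used += rounded;
    return ptr;
}

void pool_release(void *ctx, void *ptr)
{
    (void)ctx;
    (void)ptr;          /* a bump pool gives memory back all at once */
}

static pool demo_pool;  /* static storage, so used starts at 0 */

int example_pool_usage(void)
{
    allocator a = { pool_alloc, pool_release, &demo_pool };
    int_buffer *b = int_buffer_create(a, 8);
    if (b == NULL)
        return -1;
    b->items[7] = 5;
    int last = b->items[7];
    int_buffer_destroy(b);
    return last;
}
```

Passing the allocator per object rather than registering it globally is a design choice of this sketch; it keeps the matching release function attached to the object that needs it.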

As for which method is the correct API design, it has been done both ways in the C library: strdup() (POSIX, and part of standard C only since C23) and stdio use the second method, while sprintf and strcat use the first. Personally, I prefer the second method (or the third) unless 1) I know I will never need to realloc and 2) I expect the lifetime of my objects to be short, so that using the stack is very convenient.

edit: There is actually one more option, and it is a bad one with a well-known precedent. You could do it the way strtok() does, with statics. Not good, just mentioned for completeness.

+3
Jul 21 '10 at 4:59

Both methods are fine. I tend to use the first, since a lot of the C I write is for embedded systems, where all memory is either tiny variables on the stack or statically allocated. That way you cannot run out of memory at run time: either you have enough at the start, or you are out of luck from the beginning. Good to know when you have 2K of RAM :-) So all my libraries are like #1, where the memory is assumed to be allocated already.

But this is an edge case of C development.

Having said that, I would probably still go with #1, perhaps using init and finalize / dispose (rather than destroy) for the names.

+2
Jul 21 '10 at 5:12

This may give you some food for thought:

case #1 mimics the memory allocation scheme of C++, with more or less the same benefits:

  • easy allocation of temporaries on the stack (or in static arrays, or by writing your own struct allocator to replace malloc).
  • easy freeing of memory if anything goes wrong in init

case #2 hides more information about the structure used and can also be used for opaque structures, typically when the structure as seen by the user is not exactly the same as the one used internally by the lib (say, there may be a few more fields hidden at the end of the structure).

A mixed API between case #1 and case #2 is also common: there is a parameter used to pass in a pointer to some already allocated structure; if it is null, the structure is allocated (and the pointer is always returned). With such an API, free is usually the responsibility of the caller, even if init performed the allocation.
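The mixed style might look roughly like this. The point type and point_init are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int x;
    int y;
} point;

/* Hypothetical mixed-style initializer: if p is NULL the function
 * allocates the object itself, otherwise it initializes the caller's
 * storage. The pointer to the initialized object is always returned
 * (NULL only when the allocation fails). */
point *point_init(point *p, int x, int y)
{
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
    }
    p->x = x;
    p->y = y;
    return p;
}

int example_mixed_usage(void)
{
    point on_stack;
    point *a = point_init(&on_stack, 1, 2);   /* caller-provided storage */
    assert(a == &on_stack);

    point *b = point_init(NULL, 3, 4);        /* storage allocated by init */
    if (b == NULL)
        return -1;

    int sum = a->x + a->y + b->x + b->y;
    free(b);    /* the caller frees, even though init did the malloc */
    return sum;
}
```

The caller has to remember which objects came from the NULL path, since only those may be passed to free.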

In most cases, I would probably go for case #1.

+2
Jul 21 '10 at 5:17

Both are acceptable; there are trade-offs between them, as you have already noted.

There are large real-world examples of both: as Dean Harding says, GTK+ uses the second method; OpenSSL is an example that uses the first.

+1
Jul 21 '10 at 4:48

I would go for (1) with one simple extension: have your _init function always return the pointer to the object. Your pointer initialization can then simply read:

 myStruct *s = myStruct_init(malloc(sizeof(myStruct))); 

As you can see, the right-hand side then refers only to the type and no longer to the variable. A simple macro then gives you (2), at least partially:

 #define NEW(T) (T ## _init(malloc(sizeof(T)))) 

and your pointer initialization reads

 myStruct *s = NEW(myStruct); 
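One detail worth spelling out: the macro hands the result of malloc straight to _init, so _init has to tolerate a NULL argument. A minimal sketch, assuming a hypothetical refs field as the only state to initialize:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int refs;                /* hypothetical field, for illustration only */
} myStruct;

/* Returns its argument so that it can wrap malloc. NULL is passed
 * through, so a failed allocation reaches the caller as NULL instead
 * of being dereferenced here. */
myStruct *myStruct_init(myStruct *s)
{
    if (s != NULL)
        s->refs = 1;
    return s;
}

#define NEW(T) (T ## _init(malloc(sizeof(T))))

int example_new_usage(void)
{
    myStruct *s = NEW(myStruct);
    if (s == NULL)
        return -1;           /* allocation failed, nothing to free */
    assert(s->refs == 1);
    int refs = s->refs;
    free(s);
    return refs;
}
```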
+1
Jul 21 '10 at 6:13

Your method #2 says:

    myStruct *s = myStruct_init();
    myStruct_foo(s);
    myStruct_destroy(s);

Now suppose myStruct_init() needs to return an error code for various reasons; then it can be written this way:

    myStruct *s;
    int ret = myStruct_init(&s);   /* int myStruct_init(myStruct **s); */
    myStruct_foo(s);
    myStruct_destroy(s);
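A fuller sketch of that out-parameter style, with the return value actually checked. The error codes and the ready field are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int ready;               /* hypothetical field, for illustration only */
} myStruct;

/* Hypothetical error codes. */
enum { MYSTRUCT_OK = 0, MYSTRUCT_ENOMEM = 1, MYSTRUCT_EINVAL = 2 };

/* Allocates and initializes an object, storing it in *out.
 * On failure *out is set to NULL and an error code is returned. */
int myStruct_init(myStruct **out)
{
    if (out == NULL)
        return MYSTRUCT_EINVAL;
    *out = NULL;             /* defined value on every failure path */

    myStruct *s = malloc(sizeof *s);
    if (s == NULL)
        return MYSTRUCT_ENOMEM;

    s->ready = 1;
    *out = s;
    return MYSTRUCT_OK;
}

void myStruct_destroy(myStruct *s)
{
    free(s);
}

int example_checked_usage(void)
{
    myStruct *s;
    int ret = myStruct_init(&s);
    if (ret != MYSTRUCT_OK)
        return ret;          /* s is NULL here, nothing to destroy */
    assert(s != NULL && s->ready == 1);
    myStruct_destroy(s);
    return MYSTRUCT_OK;
}
```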
0
Dec 16 '15 at 10:17


