C string handling practice

I am starting a new project in plain C (C99) that will work mainly with text. Because of external constraints on the project, the code must be extremely simple and compact: a single source file with no external dependencies or libraries other than libc and similarly ubiquitous system libraries.

With that in mind, what are some best practices, gotchas, tricks, or other techniques that can help make the string handling in this project more reliable and safe?

+20
c string security robustness
Jan 02 '10 at
7 answers

Without more information about what your code does, I would recommend designing all of your interfaces like this:

size_t foobar(char *dest, size_t buf_size, /* operands here */) 

with semantics like snprintf :

  • dest points to a buffer of at least buf_size bytes.
  • If buf_size is zero, null / invalid pointers are valid for dest and nothing will be written.
  • If buf_size is nonzero, dest is always null-terminated.
  • Each foobar function returns the length of the full, untruncated output; the output was truncated if buf_size is less than or equal to the return value.

Thus, when the caller can easily determine the required size of the destination buffer, a sufficiently large buffer can be obtained in advance. If the caller cannot easily determine it, they can call the function once with a zero argument for buf_size to measure the output, or with a buffer that is "probably large enough" and retry only if the output was truncated.
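To make that contract concrete, here is a minimal sketch of one function written to it. The name str_join and its operands (two strings and a separator character) are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: write a, then sep, then b into dest.
 * Returns the full, untruncated length; the output was truncated
 * if buf_size <= return value. */
size_t str_join(char *dest, size_t buf_size,
                const char *a, char sep, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    size_t total = len_a + 1 + len_b;

    if (buf_size == 0)
        return total;                 /* dest is never touched */

    size_t room = buf_size - 1;       /* reserve one byte for '\0' */
    size_t n = len_a < room ? len_a : room;
    memcpy(dest, a, n);
    size_t pos = n;
    if (pos < room)
        dest[pos++] = sep;
    n = len_b < room - pos ? len_b : room - pos;
    memcpy(dest + pos, b, n);
    pos += n;
    dest[pos] = '\0';                 /* always terminated when buf_size > 0 */
    return total;
}
```

A caller that does not know the size in advance can call str_join(NULL, 0, ...) first, obtain a buffer of the returned length plus one, and then call again.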

You can also make a wrapped version of such calls, similar to the GNU asprintf function, but if you want your code to be as flexible as possible, I would avoid doing any allocation in the actual string functions. Handling the possibility of failure is always simpler at the caller level, and many callers can guarantee that failure is impossible by using a local buffer, or a buffer that was obtained much earlier in the program, so that the success or failure of a larger operation is atomic (which greatly simplifies error handling).
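For completeness, a rough sketch of what such an allocating wrapper might look like, built on the standard snprintf measure-then-write pattern. The name make_pair and the "key=value" format are assumptions for illustration; note that all allocation and failure handling stays in the wrapper, not in the formatting call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns a malloc'd "key=value" string,
 * or NULL on failure. The caller must free() the result. */
char *make_pair(const char *key, const char *value)
{
    /* First pass: buf_size 0 writes nothing and reports the length. */
    int need = snprintf(NULL, 0, "%s=%s", key, value);
    if (need < 0)
        return NULL;

    char *out = malloc((size_t)need + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: the buffer is now known to be large enough. */
    snprintf(out, (size_t)need + 1, "%s=%s", key, value);
    return out;
}
```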

+30
Jan 2 '10 at 21:01

Some thoughts from a longtime embedded developer, most of which elaborate on your requirement for simplicity and are not C-specific:

  • Decide which string-handling functions you need, and keep that set as small as possible to minimize the points of failure.

  • Follow R.'s recommendation to define a clear interface that is consistent across all string handlers. A strict, small, but detailed set of rules lets you use pattern matching as a debugging tool: you can be suspicious of any code that looks different from the rest.

  • As Bart van Ingen Schoenau noted, track the length of the buffer independently of the length of the string. If you will always be working with text, it is safe to use the standard null character to mark the end of the string, but it is up to you to make sure that the text plus the null fits in the buffer.

  • Ensure consistent behavior across all string handlers, especially where the standard functions are lacking: truncation, null inputs, null-termination, padding, etc.

  • If you absolutely must violate any of your rules, create a separate function for that purpose and name it accordingly. In other words, give each function a single, unambiguous behavior. For example, you might use str_copy_and_pad() for a function that always pads its target with nulls.

  • Where possible, use safe built-in functions (e.g. memmove(), per Jonathan Leffler) to do the heavy lifting. But test them to make sure they do what you think they do!

  • Check for errors as early as possible. Undetected buffer overruns can lead to "ricochet" errors, which are notoriously difficult to track down.

  • Write tests for each function to make sure that it satisfies its contract. Be sure to cover the boundary cases (off by one, null / empty strings, source / destination overlap, etc.). And this may seem obvious, but make sure you understand how to create and detect buffer underruns / overruns, and then write tests that explicitly generate and check for these problems. (My QA people are probably tired of hearing me say, "Don't just test to make sure it works; test to make sure it doesn't break.")
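As a rough illustration of several of the points above (one clearly named behavior per function, memmove() for the heavy lifting, and boundary-case tests), here is a sketch of what str_copy_and_pad() could look like. The signature and return convention are my assumptions; the answer only gives the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest, truncating if necessary,
 * and fill the rest of the buffer with '\0'. Returns strlen(src), so a
 * result >= buf_size means the copy was truncated. */
size_t str_copy_and_pad(char *dest, size_t buf_size, const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    if (buf_size == 0)
        return len;                       /* nothing is written */
    assert(dest != NULL);
    size_t n = len < buf_size - 1 ? len : buf_size - 1;
    memmove(dest, src, n);                /* the copy stays defined on overlap */
    memset(dest + n, '\0', buf_size - n); /* terminator plus padding */
    return len;
}

/* Boundary cases: empty input, exact fit, one byte too long, zero size. */
void test_str_copy_and_pad(void)
{
    char buf[4];

    memset(buf, 'x', sizeof buf);
    assert(str_copy_and_pad(buf, sizeof buf, "") == 0);
    assert(memcmp(buf, "\0\0\0\0", sizeof buf) == 0);

    assert(str_copy_and_pad(buf, sizeof buf, "abc") == 3);
    assert(strcmp(buf, "abc") == 0);

    assert(str_copy_and_pad(buf, sizeof buf, "abcd") == 4);
    assert(strcmp(buf, "abc") == 0);

    assert(str_copy_and_pad(NULL, 0, "abc") == 3);
}
```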

Here are some methods that worked for me:

  • Create wrappers for your memory management routines that allocate "fence bytes" at each end of your buffers during allocation and check them when the buffers are freed. You can also check them in your string handlers, perhaps when a STR_DEBUG macro is defined. Caution: you need to test your diagnostics thoroughly so that they do not create additional points of failure.

  • Create a data structure that encapsulates both the buffer and its length. (It may also contain fence bytes if you use them.) Caution: you now have a non-standard data structure that your entire code base should manage, which could mean substantial rewriting (and therefore additional points of failure).

  • Have your string handlers check their inputs. If a function forbids null pointers, check for them explicitly. If it requires a valid string (as strlen() does) and you know the length of the buffer, make sure the buffer contains a null character. In other words, verify any assumptions you might be making about the code or the data.

  • Write your tests first. This will help you understand each function's contract: exactly what it expects from the caller and what the caller should expect from it. You will find yourself thinking about how you will use it, how it might break, and which cases it needs to handle.
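Here is a minimal sketch of the fence-byte idea from the list above. Everything in it (the names guarded_alloc, guarded_check, guarded_free, the 8-byte fence, the 0xA5 pattern) is an assumption for illustration, and it is intended for character buffers only, since the returned pointer is not guaranteed to be suitably aligned for every type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_LEN  ((size_t)8)   /* assumed fence width */
#define FENCE_BYTE 0xA5          /* assumed fill pattern */
#define OVERHEAD   (sizeof(size_t) + 2 * FENCE_LEN)

/* Layout: [size_t size][front fence][user bytes][back fence] */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - OVERHEAD)
        return NULL;
    unsigned char *raw = malloc(OVERHEAD + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    memset(raw + sizeof(size_t), FENCE_BYTE, FENCE_LEN);
    memset(raw + sizeof(size_t) + FENCE_LEN + size, FENCE_BYTE, FENCE_LEN);
    return raw + sizeof(size_t) + FENCE_LEN;
}

/* Returns 1 if both fences are intact, 0 if either was overwritten. */
int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *raw = user - FENCE_LEN - sizeof(size_t);
    size_t size;
    memcpy(&size, raw, sizeof size);
    const unsigned char *front = raw + sizeof(size_t);
    const unsigned char *back = user + size;
    for (size_t i = 0; i < FENCE_LEN; i++)
        if (front[i] != FENCE_BYTE || back[i] != FENCE_BYTE)
            return 0;
    return 1;
}

/* Checks the fences, then releases the whole block. */
void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(guarded_check(ptr));
    free((unsigned char *)ptr - FENCE_LEN - sizeof(size_t));
}
```

String handlers could call guarded_check() on their arguments in debug builds, which catches an overrun close to where it happened rather than at free time.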

Thanks so much for asking this question! I wish more developers would think about these issues, especially before they start coding. Good luck, and best wishes for a reliable, successful product!

+10
Jan 03 2018-11-14T00:

See strlcpy and strlcat; see the original paper for details.
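Since strlcpy is not part of C99 and is missing from some libcs, a single-file project may need its own copy. Below is a sketch of the copy semantics under the hypothetical name my_strlcpy (renamed to avoid clashing with a system declaration); it is not the original implementation.

```c
#include <stddef.h>
#include <string.h>

/* Copies up to size - 1 bytes of src into dst and always terminates
 * dst when size > 0. Returns strlen(src); a return value >= size
 * means the result was truncated. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```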

+7
Jan 2 '10 at 20:43

Two cents:

  • Always use the "n" versions of the string functions: strncpy, strncmp (or wcsncpy, wcsncmp, etc.).
  • Always allocate using the +1 idiom: e.g. char str[MAX_STR_SIZE + 1]; then pass MAX_STR_SIZE as the size to the "n" version of the string function and finish with str[MAX_STR_SIZE] = '\0'; to make sure every string is properly terminated.

The last step is important because the "n" versions of the string functions do not append '\0' after copying if the maximum size is reached.
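Put together, the idiom looks roughly like this. The value of MAX_STR_SIZE and the function name copy_name are made up for the example.

```c
#include <string.h>

#define MAX_STR_SIZE 8   /* assumed limit, for illustration only */

/* out must have room for MAX_STR_SIZE characters plus the terminator. */
void copy_name(char out[MAX_STR_SIZE + 1], const char *src)
{
    strncpy(out, src, MAX_STR_SIZE);  /* may leave out unterminated */
    out[MAX_STR_SIZE] = '\0';         /* so terminate unconditionally */
}
```

Keep in mind that strncpy also pads the rest of the destination with '\0' when the source is shorter, which is harmless here but costs time on large buffers.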

+2
Jan 02 '10 at 20:26

  • Work with arrays on the stack whenever possible, and initialize them properly. Then you do not need to track allocations, sizes, or initialization.

     char myCopy[] = { "the interesting string" }; 
  • For medium-sized strings, C99 has VLAs. They are a little less convenient, since you cannot initialize them, but you still get the first two of the benefits above.

     char myBuffer[n]; myBuffer[0] = '\0'; 
0
Jan 02 '10 at

Some important gotchas:

  • In C, there is no relationship between the length of a string and the size of its buffer. A string always runs up to (and includes) the first '\0' character. It is your responsibility as the programmer to make sure that this character can be found within the buffer reserved for that string.
  • Always track buffer sizes explicitly. The compiler tracks the size of an array, but that information will be lost to you before you know it.
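A small sketch of both gotchas. The function names are hypothetical: the first shows how the array size disappears as soon as the array is passed to a function, and the second shows one way to confirm that a terminator really exists inside a buffer of known size.

```c
#include <stddef.h>
#include <string.h>

/* Inside the callee only a pointer is left, so sizeof reports the
 * size of the pointer, not of the caller's array. */
size_t size_seen_by_callee(char *buf)
{
    return sizeof buf;
}

/* With the size passed explicitly, the callee can verify that a
 * '\0' is present before treating buf as a string. */
int buffer_is_terminated(const char *buf, size_t buf_size)
{
    return buf != NULL && memchr(buf, '\0', buf_size) != NULL;
}
```

In the caller, sizeof line on char line[128] still yields 128, which is why the size has to be captured there and passed along.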

0
Jan 03 '11 at 12:56

When it comes to time versus space, be sure to pick a standard bit-manipulation technique from here.

During my early firmware projects, I used lookup tables to count the set bits in O(1) time for efficiency.
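The answer does not show its table, so the following is only a guess at the general shape: a 256-entry table of per-byte bit counts, built once, then four lookups per 32-bit word. The names are hypothetical.

```c
#include <stdint.h>

/* bits_in_byte[i] holds the number of set bits in the byte value i.
 * Entry 0 is already 0 because the array has static storage. */
static unsigned char bits_in_byte[256];

/* Build the table once at startup: count(i) = (i & 1) + count(i / 2). */
void init_bit_table(void)
{
    for (int i = 1; i < 256; i++)
        bits_in_byte[i] = (unsigned char)((i & 1) + bits_in_byte[i / 2]);
}

/* Constant-time count: one table lookup per byte of the word. */
unsigned count_bits32(uint32_t v)
{
    return bits_in_byte[v & 0xFF]
         + bits_in_byte[(v >> 8) & 0xFF]
         + bits_in_byte[(v >> 16) & 0xFF]
         + bits_in_byte[(v >> 24) & 0xFF];
}
```

The trade-off is 256 bytes of table space in exchange for avoiding a per-bit loop.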

0
Aug 17 '12 at 11:00