Best practice for returning a variable-length string in C

I have a string function that takes a pointer to a source string and returns a pointer to a destination string. The function currently works, but I worry that I am not following best practice regarding malloc, realloc and free.

What is different about my function is that the length of the destination string differs from that of the source string, so realloc() has to be called inside my function. I know from looking at the docs ...

http://www.cplusplus.com/reference/cstdlib/realloc/

that the memory address may change after realloc. This means I can't "pass by reference" the way a C programmer might for other functions; I have to return the new pointer.

So, the prototype of my function:

    //decode a uri encoded string
    char *net_uri_to_text(char *);

I don't like the way I'm doing it, because I have to free the pointer after the function returns:

    char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
    printf("%s\n", chr_output); //testing123Z[\abc
    free(chr_output);

This means that malloc() and realloc() are called inside my function, while free() is called outside it.

I have a background in high-level languages (perl, plpgsql, bash), so my instinct is to encapsulate such things properly, but that may not be best practice in C.

Question: Is my method the best practice, or is there a better way I should follow?

Full example

It compiles and runs with two warnings about the unused arguments argc and argv; you can safely ignore both.

example.c:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    char *net_uri_to_text(char *);

    int main(int argc, char ** argv) {
        char * chr_input = "testing123%5a%5b%5cabc";
        char * chr_output = net_uri_to_text(chr_input);
        printf("%s\n", chr_output);
        free(chr_output);
        return 0;
    }

    //decodes uri-encoded string
    //send pointer to source string
    //return pointer to destination string
    //WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
    char *net_uri_to_text(char * chr_input) {
        //define variables
        int int_length = strlen(chr_input);
        int int_new_length = int_length;
        char * chr_output = malloc(int_length);
        char * chr_output_working = chr_output;
        char * chr_input_working = chr_input;
        int int_output_working = 0;
        unsigned int uint_hex_working;
        //while not a null byte
        while(*chr_input_working != '\0') {
            //if %
            if (*chr_input_working == *"%") {
                //then put correct char in
                sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
                *chr_output_working = (char)uint_hex_working;
                //printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
                //realloc
                chr_input_working++;
                chr_input_working++;
                int_new_length -= 2;
                chr_output = realloc(chr_output, int_new_length);
                //output working must be the new pointer plus how many chars we've done
                chr_output_working = chr_output + int_output_working;
            } else {
                //put char in
                *chr_output_working = *chr_input_working;
            }
            //increment pointers and number of chars in output working
            chr_input_working++;
            chr_output_working++;
            int_output_working++;
        }
        //last null byte
        *chr_output_working = '\0';
        return chr_output;
    }
+8
6 answers

It is perfectly fine to return newly malloc'd buffers from functions in C, as long as you document the fact that they do. Many libraries do this, although no function in the standard library does.

If you can cheaply compute a (not too pessimistic) upper bound on the number of characters that need to be written to the buffer, you can offer a function that does this and let the user call it.

It is also possible, but much less convenient, to accept a buffer to be filled in; I have seen quite a few libraries that do this:

    /*
     * Decodes uri-encoded string encoded into buf of length len (including NUL).
     * Returns the number of characters needed for the decoded string (including NUL).
     * If that number is greater than len, the output did not fit and you should
     * try again with a larger buffer.
     */
    size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
    {
        size_t space_needed = 0;

        while (decoding_needs_to_be_done()) {
            // decode characters, but only write them to buf
            // if it wouldn't overflow;
            // increment space_needed regardless
        }
        return space_needed;
    }

Now the caller is responsible for the allocation and will do something like

    size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
    char *result = xmalloc(len);

    len = net_uri_to_text(input, result, len);
    if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
        // try again with a buffer of the size the first call reported
        result = xrealloc(result, len);
        net_uri_to_text(input, result, len);
    }

(Here, xmalloc and xrealloc are "safe" allocation functions that I made up to skip the NULL checks.)
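The buffer-filling interface described above can be sketched in runnable form. This is only one possible reading of it, under assumptions that are not in the answer: the return value is taken to be the size required including the NUL (an snprintf-like convention), malformed escapes are copied through unchanged, and `hex_value` and `decode_alloc` are hypothetical helper names. Plain malloc and realloc with explicit NULL checks stand in for the made-up xmalloc and xrealloc.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: numeric value of a hex digit already checked with isxdigit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Decodes `encoded` into buf, which has room for len bytes (including NUL).
 * Returns the number of bytes required, including the NUL. If the result
 * is greater than len, the output was truncated (but still NUL-terminated
 * when len > 0) and the caller should retry with a larger buffer.
 */
size_t net_uri_to_text(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    assert(encoded != NULL);
    while (*encoded) {
        char c = *encoded;

        if (c == '%' && isxdigit((unsigned char)encoded[1])
                     && isxdigit((unsigned char)encoded[2])) {
            c = (char)(hex_value(encoded[1]) * 16 + hex_value(encoded[2]));
            encoded += 3;
        } else {
            encoded++;   /* ordinary character or malformed escape: copy as-is */
        }
        if (needed + 1 < len)   /* always keep one byte free for the NUL */
            buf[needed] = c;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}

/* Caller side: guess a size, then retry once with the size the first call reported. */
char *decode_alloc(const char *encoded)
{
    size_t len = 16;                 /* arbitrary initial guess */
    char *result = malloc(len);

    if (result == NULL)
        return NULL;

    size_t needed = net_uri_to_text(encoded, result, len);
    if (needed > len) {
        char *bigger = realloc(result, needed);
        if (bigger == NULL) {
            free(result);
            return NULL;
        }
        result = bigger;
        net_uri_to_text(encoded, result, needed);
    }
    return result;
}
```

Returning the required size rather than the number of bytes written is what lets the caller size the second attempt exactly, so at most two calls are ever needed.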

+8

It is fine to return newly malloc-ed (and possibly internally realloc'ed) values from functions; you just need to document that you are doing so (as you do here).

Other obvious items:

  • Instead of int int_length you may want to use size_t. This is an unsigned type (usually a typedef for unsigned int or unsigned long) and is the appropriate type for string lengths and malloc arguments.
  • You need to allocate n + 1 bytes initially, where n is the length of the string, since strlen does not count the terminating 0 byte.
  • You should check for malloc failure (a NULL return). If your function passes the failure on to the caller, document that in the function-description comment.
  • sscanf is a pretty heavyweight way to convert two hex digits. It is not wrong, except that you don't check whether the conversion succeeded (what if the input is malformed? You can of course decide that this is the caller's problem, but in general you may want to handle it). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
  • Instead of doing one realloc for each % conversion, you may want to do a single final "shrink" realloc, if desired. Note that if you allocate (say) 50 bytes for a string and find that it needs only 49, including the final 0 byte, the realloc may not be worth doing after all.
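Taken together, the items above might look like the following. This is a minimal sketch, not the asker's final code: the choice to return NULL for a malformed escape is an assumption made here for illustration, and a caller could just as reasonably want such input copied through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly malloc()ed, decoded copy of input.
 * Returns NULL if allocation fails or an escape is malformed
 * (the latter is an assumption of this sketch).
 * The caller must free() the result.
 */
char *net_uri_to_text(const char *input)
{
    assert(input != NULL);

    size_t n = strlen(input);
    char *output = malloc(n + 1);   /* + 1 for the terminating 0 byte */
    size_t used = 0;

    if (output == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (input[i] == '%') {
            /* input[n] is the 0 byte, so these reads never run past the string */
            if (!isxdigit((unsigned char)input[i + 1]) ||
                !isxdigit((unsigned char)input[i + 2])) {
                free(output);
                return NULL;
            }
            char hex[3] = { input[i + 1], input[i + 2], '\0' };
            output[used++] = (char)strtoul(hex, NULL, 16);
            i += 2;                 /* skip the two hex digits */
        } else {
            output[used++] = input[i];
        }
    }
    output[used] = '\0';

    /* One optional shrink at the end instead of one realloc per escape. */
    char *shrunk = realloc(output, used + 1);
    return shrunk != NULL ? shrunk : output;
}
```

If the final realloc fails, the original block is still valid, so the sketch simply returns the unshrunk buffer.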
+2

The thing is, C is low-level enough that it makes the programmer manage memory explicitly. In particular, there is nothing wrong with returning a malloc()ated string. It is a common idiom to return malloc()ated objects and have the caller free() them.

And even if you do not like this approach, you can always take a pointer to the string and modify it from within the function (after the last use, it will still need to be free()d, though).

One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there is obviously enough room for it in the memory chunk of the old string, so you don't need realloc().

(Besides the fact that you forgot to allocate one extra byte for the final NUL character, of course ...)

And, as always, you can simply return a different pointer each time the function is called, and then you don't need to call realloc() at all.
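The in-place option mentioned above works because decoding never makes the string longer, so the write position can never overtake the read position. A minimal sketch, assuming the caller passes a writable buffer (not a string literal) and that malformed escapes are simply copied through; the name `unescape_in_place` is made up for this illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decodes s in place and returns s. No allocation happens here:
 * whoever owns the buffer keeps owning it.
 */
char *unescape_in_place(char *s)
{
    assert(s != NULL);

    char *src = s;   /* read position */
    char *dst = s;   /* write position, always <= src */

    while (*src) {
        if (*src == '%' && isxdigit((unsigned char)src[1])
                        && isxdigit((unsigned char)src[2])) {
            char hex[3] = { src[1], src[2], '\0' };
            *dst++ = (char)strtol(hex, NULL, 16);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```

The trade-off is that the encoded input is destroyed, so this only suits callers that no longer need it.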

One last piece of advice: it is good practice to const-qualify your input strings, so that the caller is guaranteed that you will not modify them. With this approach you can safely call the function on string literals, for example.

All in all, I would rewrite your function as follows:

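    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Returns a newly malloc()ed decoded copy of s, or NULL if malloc fails.
       The caller must free() the result. */
    char *unescape(const char *s)
    {
        size_t l = strlen(s);
        char *p = malloc(l + 1), *r = p;

        if (p == NULL)
            return NULL;

        while (*s) {
            /* decode only a '%' followed by two hex digits; copy anything else as-is */
            if (*s == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
                char buf[3] = { s[1], s[2], 0 };
                *p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
                s += 3;
            } else {
                *p++ = *s++;
            }
        }

        *p = 0;
        return r;
    }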

And call it as follows:

    int main(void)
    {
        const char *in = "testing123%5a%5b%5cabc";
        char *out = unescape(in);

        if (out == NULL)
            return 1;

        printf("%s\n", out);
        free(out);
        return 0;
    }
+2

I would approach the problem a little differently. Personally, I would split your function in two. The first function calculates the size you need to malloc. The second writes the output string to a given pointer (which has been allocated outside the function). This saves the repeated calls to realloc and keeps the complexity the same. A possible function for finding the size of the new string:

    /* Assumes every '%' starts a well-formed %XX escape. */
    size_t getNewSize(const char *string)
    {
        size_t size = 0, percent = 0;

        for (const char *i = string; *i != '\0'; i++, size++) {
            if (*i == '%')
                percent++;
        }
        /* each %XX escape shrinks to one character;
           add 1 for the terminating NUL when allocating */
        return size - percent * 2;
    }

However, as mentioned in other answers, there is no problem returning the malloc'ed buffer if you document it!
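The two-function split described in this answer could be completed along these lines. This is a sketch under assumptions that go beyond the answer: the names `decoded_size`, `decode_into` and `is_escape` are hypothetical, the size includes the terminating NUL, and a '%' not followed by two hex digits is treated as an ordinary character in both passes so that the size and the output always agree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does p point at a well-formed %XX escape? */
static int is_escape(const char *p)
{
    return p[0] == '%' && isxdigit((unsigned char)p[1])
                       && isxdigit((unsigned char)p[2]);
}

/* First pass: bytes needed for the decoded string, including the NUL. */
size_t decoded_size(const char *s)
{
    assert(s != NULL);

    size_t size = 1;   /* the terminating NUL */
    while (*s) {
        s += is_escape(s) ? 3 : 1;
        size++;
    }
    return size;
}

/* Second pass: writes the decoded string to out, which the caller
   has allocated with at least decoded_size(s) bytes. */
void decode_into(const char *s, char *out)
{
    assert(s != NULL && out != NULL);

    while (*s) {
        if (is_escape(s)) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

With this split, malloc and free both sit in the caller, which is the pairing the question was asking for, at the cost of scanning the input twice.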

0

In addition to what has already been mentioned in the other posts, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, you cannot reallocate it.

0

I think you are right to be concerned about splitting up the mallocs and frees. As a rule, whatever allocates the memory owns it and should free it.

In this case, when the strings are relatively small, one good procedure is to make the string buffer larger than any possible string that it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10,000 characters, you can save any possible URL.

Another trick is to keep both the length and the capacity of the string at the front, so that (assuming a 4-byte int) *(int *)mystring is the length of the string and *(int *)(mystring + 4) is its capacity. The string itself then starts only at the 8th position, mystring + 8. By doing this, you can pass around a single pointer to the string and always know how long it is and how much memory it has. You can write macros that generate these offsets automatically and make for tidy code.

The value of using buffers this way is that you do not need to reallocate. The new value overwrites the old value, and you update the length at the beginning of the string.
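A length-and-capacity header like this can also be expressed with a struct and a C99 flexible array member instead of hand-computed offsets, which sidesteps the alignment and int-size assumptions. This swaps in a different layout technique from the raw offsets described above; the `strbuf` type and its functions are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header fields first, character data right after. */
typedef struct {
    size_t length;     /* characters currently stored, excluding the NUL */
    size_t capacity;   /* most characters data can hold, excluding the NUL */
    char   data[];     /* flexible array member (C99) */
} strbuf;

/* Allocates a buffer able to hold `capacity` characters plus the NUL. */
strbuf *strbuf_new(size_t capacity)
{
    strbuf *sb = malloc(sizeof *sb + capacity + 1);

    if (sb == NULL)
        return NULL;
    sb->length = 0;
    sb->capacity = capacity;
    sb->data[0] = '\0';
    return sb;
}

/* Overwrites the old value without reallocating.
   Returns 0 on success, -1 if text does not fit (buffer left unchanged). */
int strbuf_set(strbuf *sb, const char *text)
{
    assert(sb != NULL && text != NULL);

    size_t n = strlen(text);
    if (n > sb->capacity)
        return -1;
    memcpy(sb->data, text, n + 1);   /* copies the NUL too */
    sb->length = n;
    return 0;
}
```

A single free(sb) releases both the header and the characters, since they live in one allocation.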

0