Strtok and memory leaks

I wrote a simple URL parser using strtok(). Here is the code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char *protocol;
        char *host;
        int port;
        char *path;
    } aUrl;

    void parse_url(char *url, aUrl *ret) {
        printf("Parsing %s\n", url);
        char *tmp = (char *)_strdup(url);
        //char *protocol, *host, *port, *path;
        int len = 0;

        // protocol is now e.g. http: or https:
        ret->protocol = (char *) strtok(tmp, "/");
        len = strlen(ret->protocol) + 2;

        ret->host = (char *) strtok(NULL, "/");
        len += strlen(ret->host);

        //printf("char at %d => %c", len, url[len]);

        ret->path = (char *)_strdup(&url[len]);
        ret->path = (char *) strtok(ret->path, "#");

        ret->protocol = (char *) strtok(ret->protocol, ":");

        // host is now e.g. address.com:8080
        //tmp = (char *)_strdup(host);
        //strtok(tmp, ":");
        ret->host = (char *) strtok(ret->host, ":");
        tmp = (char *) strtok(NULL, ":");

        if(tmp == NULL) {
            if(strcmp(ret->protocol, "http") == 0) {
                ret->port = 80;
            } else if(strcmp(ret->protocol, "https") == 0) {
                ret->port = 443;
            }
        } else {
            ret->port = atoi(tmp);
        }

        //host = (char *) strtok(NULL, "/");
    }

    int main(int argc, char** argv) {
        printf("hello moto\n");

        aUrl myUrl;
        parse_url("http://teste.com/Teste/asdf#coisa", &myUrl);

        printf("protocol is %s\nhost is %s\nport is %d\npath is %s\n",
               myUrl.protocol, myUrl.host, myUrl.port, myUrl.path);

        return (EXIT_SUCCESS);
    }

As you can see, I use strtok() a lot to "slice" the URL. I don't need to support URLs other than http or https, so this approach covers everything I need. What bothers me (this runs on an embedded device): am I leaking memory? When I write something like

 ret->protocol = (char *) strtok(tmp, "/"); 

And then call

 ret->protocol = (char *) strtok(ret->protocol, ":"); 

Does the first value of ret->protocol stay allocated in memory? I thought that maybe I should assign the result of the first call to a tmp pointer, call strtok again so that ret->protocol points to the right part of the string (the second call), and then free(tmp).

What is the best way to use strtok?

+4
4 answers

To answer your question directly: strtok only returns a pointer to a location inside the string you pass it as input. It does not allocate new memory for you, so you do not need to call free on any of the pointers it returns.
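To make that concrete, here is a minimal sketch (the helper names points_inside and tokens_alias_buffer are made up for illustration) that checks whether the pointers returned by strtok fall inside the buffer that was passed in:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if p points inside buf[0 .. size-1], else 0. */
static int points_inside(const char *p, const char *buf, size_t size)
{
    return p >= buf && p < buf + size;
}

/* Hypothetical demo helper: tokenizes a local copy of "http://teste.com"
 * and reports whether both tokens live inside that same array. */
static int tokens_alias_buffer(void)
{
    char buf[] = "http://teste.com";    /* writable array, not a literal */
    char *first = strtok(buf, "/");     /* "http:" */
    char *second = strtok(NULL, "/");   /* "teste.com" */

    assert(first != NULL && second != NULL);

    /* No allocation happened: both tokens are just addresses within buf. */
    return first == buf && points_inside(second, buf, sizeof buf);
}
```

Since both tokens are addresses within buf, the only thing that could ever be freed is the buffer itself, and only if it came from malloc or strdup in the first place.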

For what it's worth, you may also want to look at strchr and strstr, which are non-destructive ways to find single characters or substrings in a string.
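As a rough sketch of the non-destructive approach (split_url and its parameters are hypothetical, not taken from the question), the host can be located with strstr and strchr and copied out without writing to the input:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the host part of url (including a ":port"
 * suffix, if present) into host and sets *path to the first '/' after it.
 * url is never modified. Returns 0 on success, -1 on failure. */
static int split_url(const char *url, char *host, size_t host_size,
                     const char **path)
{
    assert(url != NULL && host != NULL && path != NULL);

    const char *sep = strstr(url, "://");      /* end of the protocol */
    if (sep == NULL)
        return -1;

    const char *start = sep + 3;               /* first char of the host */
    const char *slash = strchr(start, '/');    /* start of the path, if any */
    size_t len = slash ? (size_t)(slash - start) : strlen(start);
    if (len == 0 || len >= host_size)
        return -1;

    memcpy(host, start, len);
    host[len] = '\0';
    *path = slash ? slash : "/";               /* assume "/" when no path */
    return 0;
}
```

Because the input is const here, this also works when the caller passes a string literal, which strtok cannot handle directly.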

Also note that the memory allocation here is problematic: you use strdup() to allocate a new string inside your parsing function, and then you assign fragments of that memory block to the fields of "ret". Your caller is therefore responsible for freeing the strdup'd string, but since you only pass that string back implicitly inside ret, the caller has to magically know which pointer to pass to free. (Maybe ret->protocol, but maybe not, depending on what the input looks like.)
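One possible way around this, sketched under my own assumptions (the OwnedUrl struct and both function names are invented for this example), is to store the owning pointer in its own field so the caller never has to guess what to free:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the question's struct: storage owns the single
 * heap buffer; protocol and host merely point into it. */
typedef struct {
    char *storage;
    char *protocol;
    char *host;
} OwnedUrl;

/* Releases the one allocation and clears the dangling views. */
static void owned_url_free(OwnedUrl *u)
{
    free(u->storage);                 /* free(NULL) is a harmless no-op */
    u->storage = u->protocol = u->host = NULL;
}

/* Copies url and slices the copy. Returns 0 on success, -1 on failure. */
static int owned_url_parse(const char *url, OwnedUrl *out)
{
    assert(url != NULL && out != NULL);

    size_t n = strlen(url) + 1;
    out->protocol = out->host = NULL;
    out->storage = malloc(n);
    if (out->storage == NULL)
        return -1;
    memcpy(out->storage, url, n);

    out->protocol = strtok(out->storage, ":");               /* "http" */
    out->host = out->protocol ? strtok(NULL, "/") : NULL;    /* "teste.com" */
    if (out->protocol == NULL || out->host == NULL) {
        owned_url_free(out);
        return -1;
    }
    return 0;
}
```

The caller's contract then becomes a single rule: every successful owned_url_parse is paired with one owned_url_free, regardless of what the input looked like.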

+19

strtok changes the string in place, replacing the delimiter that ends each token with a NUL character ('\0'). Since strings in C are NUL-terminated, your source pointer now appears to point to a shorter string, although the original buffer still exists and still occupies the same amount of memory (with those delimiters overwritten by NUL). The end of the buffer keeps its original terminating NUL.
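If you want to see the in-place modification for yourself, a small sketch along these lines (count_nuls_after_strtok is a name I made up) tokenizes a buffer and then counts how many of its bytes are NUL afterwards:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: runs strtok over buf until it is exhausted, then
 * returns how many of the `size` bytes in buf are '\0'. `size` must be the
 * full size of the buffer, including its terminator. */
static size_t count_nuls_after_strtok(char *buf, size_t size,
                                      const char *delims)
{
    assert(buf != NULL && delims != NULL);

    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        /* Nothing to do: only the side effect on buf matters here. */
    }

    size_t nuls = 0;
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == '\0')
            nuls++;
    }
    return nuls;
}
```

For a buffer holding "a,b,c" this should report three NUL bytes (two overwritten commas plus the original terminator), while strlen on the same buffer drops to 1 even though the array is still six bytes long.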

The short answer is: keep one pointer to the beginning of the string buffer and a separate "current" pointer into the string while parsing it. When you use strtok or iterate over the string in other ways, you update the "current" pointer but leave the start pointer alone. When you are done, free() the start pointer. No memory leak.

+5

Did you know that you can continue parsing a string using NULL as the first parameter to strtok?

First call:

 char* token = strtok(string, delimiters); 

Then:

 token = strtok(NULL, other_delimiters); 

This simplifies the code:

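    // Assumes url is a writable buffer and that ret->protocol, ret->host and
    // ret->path already point to buffers large enough for the copied tokens.
    int parse_url(char *url, aUrl *ret) {
        // get protocol, e.g. "http:"
        char* token = strtok(url, "/");
        if( token == NULL ) return -1;
        strcpy(ret->protocol, token);
        strcat(ret->protocol, "//");

        // get host; strtok treats the run "//" as a single delimiter,
        // so no extra call is needed to skip the second '/'
        token = strtok(NULL, "/");
        if( token == NULL ) return -1;
        strcpy(ret->host, token);

        // get path (without its leading '/', which strtok overwrote)
        token = strtok(NULL, "#");
        if( token == NULL ) return -1;
        strcpy(ret->path, token);

        // ...

        return 0;
    }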

Note that I added a return value so the caller can check whether parsing succeeded.

+3

Thanks for sharing your code! I ran it under valgrind and fixed the two memory leaks caused by the strdup calls.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char *protocol;
        char *host;
        int port;
        char *path;
    } URL;

    void parse_url(char *url, URL *ret) {
        char *tmp = (char *) strdup(url);
        int len = 0;

        ret->protocol = (char *) strtok(tmp, "/");
        len = strlen(ret->protocol) + 2;
        ret->host = (char *) strtok(NULL, "/");
        len += strlen(ret->host);

        ret->path = (char *) strdup(&url[len]);
        ret->path = (char *) strtok(ret->path, "#");

        ret->protocol = (char *) strtok(ret->protocol, ":");
        ret->host = (char *) strtok(ret->host, ":");
        tmp = (char *) strtok(NULL, ":");

        if (tmp == NULL) {
            if (strcmp(ret->protocol, "http") == 0) {
                ret->port = 80;
            } else if (strcmp(ret->protocol, "https") == 0) {
                ret->port = 443;
            }
        } else {
            ret->port = atoi(tmp);
        }
    }

    // protocol and path each point to the start of a strdup'd buffer
    // (for well-formed input), so these two calls release both allocations;
    // host points inside the protocol buffer and must not be freed.
    void free_url(URL *url) {
        free(url->path);
        free(url->protocol);
    }

    int main(int argc, char** argv) {
        URL url;
        parse_url("http://example.com:3000/Teste/asdf#coisa", &url);
        printf("protocol: %s\nhost: %s\nport: %d\npath: %s\n",
               url.protocol, url.host, url.port, url.path);
        free_url(&url);
        return (EXIT_SUCCESS);
    }
+1
