Reading all contents from a text file - C

I am trying to read all the content from a text file. Here is the code I wrote.

#include <stdio.h> #include <stdlib.h> #define PAGE_SIZE 1024 static char *readcontent(const char *filename) { char *fcontent = NULL, c; int index = 0, pagenum = 1; FILE *fp; fp = fopen(filename, "r"); if(fp) { while((c = getc(fp)) != EOF) { if(!fcontent || index == PAGE_SIZE) { fcontent = (char*) realloc(fcontent, PAGE_SIZE * pagenum + 1); ++pagenum; } fcontent[index++] = c; } fcontent[index] = '\0'; fclose(fp); } return fcontent; } static void freecontent(char *content) { if(content) { free(content); content = NULL; } } 

This use

 int main(int argc, char **argv) { char *content; content = readcontent("filename.txt"); printf("File content : %s\n", content); fflush(stdout); freecontent(content); return 0; } 

Since I'm new to C, I wonder if this code looks perfect? Do you see any problems / improvements?

Used compiler: GCC. But this code is expected to be cross-platform.

Any help would be greatly appreciated.

Edit

Here is the updated code with fread and ftell .

 static char *readcontent(const char *filename) { char *fcontent = NULL; int fsize = 0; FILE *fp; fp = fopen(filename, "r"); if(fp) { fseek(fp, 0, SEEK_END); fsize = ftell(fp); rewind(fp); fcontent = (char*) malloc(sizeof(char) * fsize); fread(fcontent, 1, fsize, fp); fclose(fp); } return fcontent; } 

I am wondering what will be the relative complexity of this function?

+4
source share
6 answers

You should try exploring the functions fsize ( About fsize, see the update below ) and fread . This can be a huge performance improvement.

Use fsize to get the size of the file you are reading. Use this size to make only one memory address. ( About fsize, see the update below. The idea of ​​getting the file size and making one address remains the same ).

Use fread to read a block in a file. This is much faster than reading a file with a single file.

Something like that:

 long size = fsize(fp); fcontent = malloc(size); fread(fcontent, 1, size, fp); 

Update

Not sure if fsize is cross-platform, but you can use this method to get the file size:

 fseek(fp, 0, SEEK_END); size = ftell(fp); fseek(fp, 0, SEEK_SET); 
+7
source

People often realloc double their existing size to get amortized constant time instead of linear. This makes the buffer no more than twice as large, which is usually good, and after you finish, you can redistribute back to the desired size.

But even better for stat(2) for the file size and allocate it once (with some extra room if the file size is unstable).

Also, why don't you use fgets(3) instead of reading a character by character or, even better, mmap(2) whole thing (or the corresponding fragment if it is too large for memory).

+2
source

It is probably slower and, of course, more complex than:

 while((c = getc(fp)) != EOF) { putchar(c); } 

which does the same as your code.

+2
source

This is from a quick read, so I might have missed a few issues.

First, a = realloc(a, ...); wrong. If realloc() fails, it returns NULL , but does not free the original memory. Since you reassign to a , the original memory is lost (i.e. This is a memory leak). The correct way to do this is: tmp = realloc(a, ...); if (tmp) a = tmp; tmp = realloc(a, ...); if (tmp) a = tmp; etc.

Secondly, about determining the file size using fseek(fp, 0, SEEK_END); Please note that this may or may not work. If the file is not random access (e.g. stdin ), you will not be able to return to the beginning to read it. In addition, fseek() , followed by ftell() , may not produce meaningful results for binary files. And for text files, it may not give you the correct number of characters that can be read. This topic has useful information on comp.lang.c FAQ question 19.2 .

Also, in the source code, you do not set index to 0 if it is PAGESIZE , so if your file is longer than 2*PAGESIZE , you will overwrite the buffer.

Your freecontent() function:

 static void freecontent(char *content) { if(content) { free(content); content = NULL; } } 

is useless. It sets only a copy of content to NULL . It looks like you wrote the setzero function as follows:

 void setzero(int i) { i = 0; } 

A much better idea is to monitor the memory yourself and not release anything more or less than necessary.

You should not specify the return value of malloc() or realloc() in C, since a void * implicitly converted to any other type of object pointer in C.

Hope this helps.

+1
source

One problem that I see here is the index variable, which is not decreasing. Thus, the condition if(!fcontent || index == PAGE_SIZE) will be true only once. Therefore, I believe that the check should be like index%PAGE_SIZE == 0 instead of index == PAGE_SIZE .

+1
source

On POSIX systems (e.g. linux) you can get the same effect with the mmap system call, which displays your entire file in memory. It has the ability to map this copy of the file when writing, so you must overwrite your file if you change the buffer.

This will usually be much more efficient as you leave as much as you can into the system. No need to do realloc or the like.

In particular, if you are just reading, and several processes do this, at the same time there will be only one copy for the entire system.

0
source

All Articles