Optimizing a read() loop in C (two loops into one)

I need to read two files and store their contents in mainbuff1 and mainbuff2.

I may use only system calls such as open(), read(), write(), etc.

I do not want to store the data on the stack, since the files may be very large. Heap allocation is better.

This code works:

    ...
    char charbuf;
    char *mainbuff1 = malloc(100);
    char *mainbuff2 = malloc(100);
    while (read(file1, &charbuf, 1) != 0)
        mainbuff1[len++] = charbuf;
    while (read(file2, &charbuf, 1) != 0)
        mainbuff2[len2++] = charbuf;
    ...

But each buffer holds only 100 characters. The best solution I can think of is to allocate the buffers after counting the characters in each file, as follows:

    ...
    char charbuf;
    while (read(file1, &charbuf, 1) != 0)
        len++;
    while (read(file2, &charbuf, 1) != 0)
        len2++;
    char *mainbuff1 = malloc(len);
    char *mainbuff2 = malloc(len2);
    ...

and then run the while loops again to read the bytes into the buffers.

But two passes over each file (the first only counts the bytes, the second actually stores them) will be inefficient and slow for large files. This should be done in a single pass, or in some other more efficient way. Please help, I do not know how!

+4
7 answers

You can use fstat to get the file size instead of reading the file twice.

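    #include <fcntl.h>      /* open() */
    #include <stdlib.h>     /* malloc() */
    #include <sys/stat.h>   /* fstat(), struct stat */

    int main()
    {
        struct stat sbuf;
        int fd = open("filename", O_RDWR);
        if (fd < 0)
            return 1;
        fstat(fd, &sbuf);                     /* st_size is the file size in bytes */
        char *buf = malloc(sbuf.st_size + 1); /* +1 leaves room for a terminating NUL */
    }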

But really, the time to worry about efficiency is when the code turns out to run too slowly.
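To make the fstat idea concrete, here is a minimal sketch that sizes the buffer once and then fills it with as few read() calls as possible. The helper name `slurp_file` and its signature are made up for illustration, and the sketch assumes a regular file on a POSIX system; it does not retry on EINTR.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the answer): read a whole regular file into
 * a malloc'd buffer. On success returns the buffer, NUL-terminated for
 * convenience, and stores the byte count in *out_len. Returns NULL on error.
 * The caller frees the buffer. */
char *slurp_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat sbuf;
    if (fstat(fd, &sbuf) < 0 || !S_ISREG(sbuf.st_mode)) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)sbuf.st_size;
    char *buf = malloc(size + 1);
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than requested, so loop until done. */
    size_t got = 0;
    while (got < size) {
        ssize_t n = read(fd, buf + got, size - got);
        if (n < 0) {
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;              /* file is shorter than fstat reported */
        got += (size_t)n;
    }

    close(fd);
    buf[got] = '\0';
    *out_len = got;
    return buf;
}
```

Calling it once per file (for example `mainbuff1 = slurp_file(name1, &len)`) replaces both the counting pass and the byte-at-a-time loop from the question.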

+7

If this is really a place where optimizations are needed, then what you really need to optimize is the following two things:

  • buffer allocation
  • number of calls to read() and write()

For small buffers of 100 to 1000 bytes there is no reason to use malloc() and the like; just allocate the buffer on the stack, which is the fastest option. Unless, of course, you want to return pointers to these buffers from a function, in which case you should probably use malloc(). Otherwise, prefer global/static arrays to dynamically allocated ones.

For the I/O calls, call read() and write() with the entire buffer size. Do not call them to read or write single bytes. The round trip into the kernel and back is really expensive.
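As an illustration of passing the whole buffer to read() and write(), the sketch below copies one descriptor to another in large chunks. The name `copy_fd` and the 64 KiB chunk size are assumptions chosen for the example, not something the answer prescribes; the static buffer follows the advice above but makes the function non-reentrant.

```c
#include <assert.h>
#include <unistd.h>

enum { CHUNK = 65536 };     /* assumed chunk size, tune as needed */

/* Hypothetical helper: copy everything from in_fd to out_fd using one
 * read() and (usually) one write() per chunk instead of one per byte.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);

    static char buf[CHUNK]; /* static array: no malloc, no large stack frame */
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;   /* end of input */

        /* write() may accept only part of the data, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

With a 64 KiB chunk, a 1 MiB file needs about 16 read() calls instead of roughly a million.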

Also, if you plan to work with fairly large files in RAM, consider using file mapping (for example, mmap() on POSIX systems).
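A rough sketch of the file-mapping idea on a POSIX system follows. The helper name `map_file` is hypothetical, and the sketch assumes a non-empty regular file opened read-only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file read-only into memory.
 * Returns the mapping and stores its length in *out_len, or returns NULL
 * on error. Empty files are rejected because mmap() refuses a zero length.
 * Release the mapping with munmap((void *)ptr, len). */
const char *map_file(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat sbuf;
    if (fstat(fd, &sbuf) < 0 || sbuf.st_size <= 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t)sbuf.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = len;
    return p;
}
```

No read() loop is needed here: the kernel pages the file in on demand when the returned pointer is dereferenced.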

+5

stat et al. allow you to get the file size. http://linux.die.net/man/2/fstat

Or, if you cannot use this, lseek http://linux.die.net/man/2/lseek (pay particular attention to the return value)
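The lseek variant might look like the sketch below. The helper name `fd_size` is made up for the example, and it assumes the descriptor is seekable; on a pipe or terminal lseek fails, which is why the return value matters.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), used by callers */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size of an open, seekable file and
 * rewind it to the start. Returns -1 if the descriptor cannot seek
 * (lseek reports ESPIPE for pipes, FIFOs and sockets). */
off_t fd_size(int fd)
{
    assert(fd >= 0);

    /* Seeking to the end returns the resulting offset, i.e. the size. */
    off_t end = lseek(fd, 0, SEEK_END);
    if (end == (off_t)-1)
        return -1;

    /* Go back to the beginning so the following read() starts at byte 0. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;

    return end;
}
```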

If you cannot use that either, you can always grow your buffer with realloc as you read.

I will leave the implementation to you, since this is obviously an assignment. ;)
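For readers who only want a reference point, the realloc approach could be sketched roughly as follows. The name `read_all`, the 4096-byte starting capacity and the doubling policy are all assumptions made for the example; unlike fstat and lseek, this also works on pipes and terminals, where the size is not known in advance.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read until end of file, doubling the buffer
 * whenever it fills up. Returns the buffer and stores the byte count in
 * *out_len, or returns NULL on error. The caller frees the buffer. */
char *read_all(int fd, size_t *out_len)
{
    assert(out_len != NULL);

    size_t cap = 4096;      /* assumed initial capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {
            /* Keep the old pointer until realloc succeeds, so a failure
             * does not leak the existing buffer. */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t n = read(fd, buf + len, cap - len);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;          /* end of file */
        len += (size_t)n;
    }

    *out_len = len;
    return buf;
}
```

Doubling keeps the number of realloc calls logarithmic in the file size, and each read() asks for all the free space left in the buffer rather than a single byte.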

+4

Before you optimize anything, you must profile your code. Many tools are available for this:

  • Valgrind
  • Intel VTune
  • AQtime
  • AMD CodeAnalyst
+2

Define an array that grows automatically as you add to it, like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>   /* read() */

    typedef struct dynarray {
        size_t size;      /* bytes in use */
        size_t capacity;  /* bytes allocated */
        char *array;
    } DynArray;

    DynArray *da_make(size_t init_size){
        DynArray *da;
        if(NULL==(da=(DynArray*)malloc(sizeof(DynArray)))){
            perror("memory not enough");
            exit(-1);
        }
        if(NULL==(da->array=(char*)malloc(sizeof(char)*init_size))){
            perror("memory not enough");
            exit(-1);
        }
        da->size = 0;
        da->capacity=init_size;
        return da;
    }

    void da_add(DynArray *da, char value){
        da->array[da->size] = value;
        if(++da->size == da->capacity){
            /* array is full: grow it by another 1024 bytes */
            da->array=(char*)realloc(da->array, sizeof(char)*(da->capacity += 1024));
            if(NULL==da->array){   /* check the realloc result, not da itself */
                perror("memory not enough");
                exit(-1);
            }
        }
    }

    void da_free(DynArray *da){
        free(da->array);
        free(da);
    }

    int main(void) {
        DynArray *da;
        char charbuf;
        size_t i;

        da = da_make(128);
        /* > 0 stops on end of file and also on a read error (-1) */
        while(read(0, &charbuf, 1) > 0)
            da_add(da, charbuf);
        for(i=0;i<da->size;++i)
            putchar(da->array[i]);
        da_free(da);
        return 0;
    }
+1

Why do you need everything in memory? You could read a chunk, process it, read the next chunk, and so on.
If you do not have enough memory, you cannot keep everything in your buffer anyway. What is your goal?
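A minimal sketch of that chunk-by-chunk pattern is shown below, using "count how many times a byte occurs" as a stand-in for the real processing. The function name `count_byte` and the 4096-byte chunk size are assumptions for the example; whether this pattern fits depends on what the data is needed for.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical example of streaming: process the input one chunk at a
 * time, so memory use stays constant no matter how large the file is.
 * Returns the number of occurrences of target, or -1 on a read error. */
long count_byte(int fd, char target)
{
    assert(fd >= 0);

    char chunk[4096];       /* small fixed buffer, fine on the stack */
    long count = 0;
    ssize_t n;

    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (chunk[i] == target)
                count++;
        }
    }

    return n < 0 ? -1 : count;
}
```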

0

If, as you say, you only use system calls, you can get away with using the entire heap as a buffer.

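    #include <unistd.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <fcntl.h>

    size_t sz;

    /* On SIGSEGV: re-arm the handler, double sz and extend the heap by that much. */
    void fix(int x){
        signal(SIGSEGV,fix);
        sbrk(sz *= 2);
    }

    int main()
    {
        sz = getpagesize();
        signal(SIGSEGV,fix);
        char *buf = sbrk(sz);   /* start with one page of heap */
        int fd = open("filename", O_RDWR);
        read(fd, buf, -1);      /* ask for as many bytes as possible */
    }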

But if you call a library function that uses malloc, Kablooey!

brk and sbrk give you direct access to the same heap that malloc uses, but without any of the overhead, and without any of the malloc features such as free and realloc. sbrk is called with a size in bytes and returns void * . brk is called with a pointer value (i.e. you just decide where the end of the heap should be and declare that address to be the new break) and returns an int: 0 on success, -1 on failure.

Memory allocated with brk or sbrk occupies the same space that malloc will try to set up and use the first time malloc or realloc is called. And many library functions use malloc under the hood, so there are many ways to break this code. This is a very strange and interesting area.

The signal handler here is also very dangerous. It gives you automatic unlimited space, but, of course, if you hit any other segmentation violation, such as dereferencing a NULL pointer, the handler cannot fix it, and the program can no longer crash either. So this can send the program into a nasty loop: retry the memory access, allocate more space, retry the memory access, allocate more space.

0
