Reading a file using the POSIX API

Consider the following code snippet, which reads the contents of a file into a buffer:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define BLOCK_SIZE 4096

    int main()
    {
        int fd=-1;
        ssize_t bytes_read=-1;
        int i=0;
        char buff[50];
        //Arbitrary size for the buffer?? How to optimise.
        //Dynamic allocation is a choice but what is the
        //right way to relate the file size to buffer size.

        fd=open("./file-to-buff.txt",O_RDONLY);
        if(-1 == fd)
        {
            perror("Open Failed");
            return 1;
        }

        while((bytes_read=read(fd,buff,BLOCK_SIZE))>0)
        {
            printf("bytes_read=%d\n",bytes_read);
        }

        //Test the characters read from the file to buffer. The file contains "Hello"
        while(buff[i]!='\0')
        {
            printf("buff[%d]=%d\n",i,buff[i]);
            i++;
            //buff[5]=\n-How?
        }
        //buff[6]=`\0`-How?
        close(fd);
        return 0;
    }

Code Description:

  • The input file contains the string "Hello"
  • This content must be copied to the buffer.
  • The goal is achieved using the open and read POSIX APIs.
  • The read API takes a pointer to a buffer of *arbitrary size* to copy the data into.

Questions:

  • Dynamic allocation is one way to choose a better buffer size. What is the correct procedure to derive the size of the buffer from the size of the input file?
  • At the end of the read operation, I see that read has copied a newline character and a NUL character in addition to the characters of "Hello". Please explain this behavior.

Output:

    bytes_read=6
    buff[0]=H
    buff[1]=e
    buff[2]=l
    buff[3]=l
    buff[4]=o
    buff[5]=

PS: the input file was created by hand, not by the program (using the write API). I mention this in case it matters.

4 answers

Since you want to read the entire file, the best way is to make the buffer as large as the file. It makes no sense to resize the buffer as you go. That just harms performance for no good reason.

The file size can be obtained in several ways. The quick and dirty way is to lseek() to the end of the file:

    // Get size.
    off_t size = lseek(fd, 0, SEEK_END); // You should check for an error return in real code
    // Seek back to the beginning.
    lseek(fd, 0, SEEK_SET);
    // Allocate enough to hold the whole contents plus a '\0' char.
    char *buff = malloc(size + 1);

Another way is to get information using fstat() :

    struct stat fileStat;
    fstat(fd, &fileStat); // Don't forget to check for an error return in real code
    // Allocate enough to hold the whole contents plus a '\0' char.
    char *buff = malloc(fileStat.st_size + 1);

To get all the necessary types and function prototypes, make sure you include the right headers:

    #include <stdlib.h>   // For malloc() and free()
    #include <sys/stat.h> // For fstat()
    #include <unistd.h>   // For lseek()

Note that read() does not automatically terminate the data with \0. You need to do this manually, which is why we allocate one extra character (size + 1) for the buffer. The \0 character you already see in your case is there purely by coincidence.

Of course, since buff is now a dynamically allocated array, remember to free it when you no longer need it:

 free(buff); 
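Putting the pieces above together, here is a minimal sketch of reading a whole file into a buffer sized with fstat(). The function name read_whole_file and its out_len parameter are made up for illustration; the sketch assumes a regular file whose size does not change while it is being read, and it treats every read() error as a failure instead of retrying.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the file at `path` into a freshly allocated,
 * '\0'-terminated buffer. Returns NULL on failure. On success the caller
 * owns the buffer and must free() it; if out_len is non-NULL it receives
 * the number of bytes read (not counting the terminator). */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size < 0) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buff = malloc(size + 1);          /* +1 for the '\0' */
    if (buff == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than requested, so loop and
     * advance the write position by what has been read so far. */
    size_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, buff + total, size - total);
        if (n == -1) {
            free(buff);
            close(fd);
            return NULL;
        }
        if (n == 0)                         /* end of file reached early */
            break;
        total += (size_t)n;
    }
    close(fd);

    buff[total] = '\0';                     /* read() never adds this */
    if (out_len != NULL)
        *out_len = total;
    return buff;
}
```

A caller would use it as char *text = read_whole_file("./file-to-buff.txt", &len); and call free(text) once done. For a file containing "Hello" followed by a newline, len would be 6.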

Remember that allocating a buffer the size of the file you want to read can be dangerous. Imagine that (by mistake or on purpose, it does not matter) the file is several GB in size. For such cases, it is useful to enforce a maximum allowed size. If you do not want such a restriction, you should switch to another way of reading files: mmap(). Using mmap() you can map parts of a file into memory. Then it does not matter how big the file is, since you can work on it piece by piece and stay in control of memory usage.
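As a rough illustration of the mmap() route, the sketch below maps a file read-only and scans it without allocating a buffer of its own. The helper name count_newlines_mmap is hypothetical. For brevity it maps the whole file in one call; to work on one part at a time you would instead pass a smaller length and a non-zero offset that is a multiple of the page size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts the '\n' bytes in the file at `path` by
 * mapping it into memory. Returns the count, or -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {                  /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                              /* the mapping stays valid */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n')
            count++;
    }

    munmap((void *)data, len);
    return count;
}
```

Note that the mapped bytes are not '\0'-terminated either, so the length from fstat() has to bound every access.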

+6
source

1. You can get the size of the file with stat(filename, &stat), but simply choosing a fixed buffer size is just fine.

2. First of all, there is no NUL character after "Hello" in the file. It must be a coincidence that the stack area you allocated happened to contain 0 before your code ran; see chapter 7.6 of APUE. In fact, you must initialize a local variable before using it.

I tried generating the text file with vim, with emacs and with echo -n Hello > file-to-buff.txt; only vim automatically appends a line break.

+3
source

You can consider dynamic buffer allocation by first creating a fixed-size buffer with malloc and doubling its size (with realloc) whenever it is full. This gives good time and space complexity.

You are currently reading into the same buffer over and over. You must advance the pointer into the buffer after each read, otherwise the next chunk of the file will overwrite what is already in the buffer.

The code you supplied allocates 50 bytes for the buffer, but you pass 4096 as the size to read. This can lead to a buffer overflow for any file larger than 50 bytes.

Regarding '\n' and '\0': the newline is probably in the file, and the '\0' was probably already in the buffer. The buffer is allocated on the stack in your code, and if that part of the stack has not been used yet, it will probably contain zeros placed there by the operating system when your program was loaded.

The operating system makes no attempt to terminate the data read from a file; it could be binary data, or text in a character set it does not understand. Terminating the string, if needed, is up to you.
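The two points above (doubling with realloc and advancing the write position after every read) can be sketched as follows. The name read_all_fd and the starting capacity of 64 bytes are arbitrary choices for illustration, and any read() error is simply treated as a failure.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define INITIAL_CAP 64  /* arbitrary starting capacity for this sketch */

/* Hypothetical helper: reads from `fd` until end of file into a buffer
 * that doubles whenever it fills up. Returns a '\0'-terminated buffer
 * that the caller must free(), or NULL on error. If out_len is non-NULL
 * it receives the number of bytes read. */
char *read_all_fd(int fd, size_t *out_len)
{
    size_t cap = INITIAL_CAP;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Grow when only the byte reserved for '\0' is left. */
        if (len + 1 >= cap) {
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        /* Write after the data already stored, not at buf[0]. */
        ssize_t n = read(fd, buf + len, cap - len - 1);
        if (n == -1) {
            free(buf);
            return NULL;
        }
        if (n == 0)                         /* end of file */
            break;
        len += (size_t)n;
    }

    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Because it never asks for the file size, this variant also works on pipes and other descriptors where fstat() cannot report a meaningful size.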

A few other points that are more related to style:

  • You can use a for (i = 0; buff[i]; ++i) loop instead of the while loop for the printing at the end. That way, if anyone modifies the index variable i beforehand, you will not be affected.
  • You can close the file earlier, right after you finish reading it, so as not to keep it open for a long time (and perhaps forget to close it if some kind of error occurs).
+2
source

For your second question: read does not automatically add a '\0' character. If you consider your file to be a text file, you should add '\0' after the read call to mark the end of the string.

In C, the end of a string is represented by this character. If read stored 4 characters, printf will print those 4 characters and then look at the 5th: if it is not '\0', it will keep printing until the next '\0'. This is also a source of buffer overflows.
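A small sketch of terminating the buffer by hand after a single read is shown below. The helper name read_string is invented for this example. It asks read for at most cap - 1 bytes, so the terminator always fits, which also avoids requesting 4096 bytes for a 50-byte buffer.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs one read() into `buf`, which has room
 * for `cap` bytes, and appends '\0' after the data. Returns the number
 * of bytes read, or -1 on error (including cap == 0). */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;

    /* Leave one byte free for the terminator. */
    ssize_t n = read(fd, buf, cap - 1);
    if (n == -1)
        return -1;

    buf[n] = '\0';      /* read() stored n bytes; terminate right after them */
    return n;
}
```

With char buff[50]; a call such as read_string(fd, buff, sizeof buff) leaves buff safe to pass to printf("%s", buff), whatever the stack happened to contain beforehand.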

As for the '\n', it is probably in the input file.

+1
source
