I need to read and write large files (gigabytes in size) line by line.
After reading many posts and forum threads (including on SO), I found mmap suggested as the fastest option for reading and writing files. However, when I implemented my code with both getline and mmap, the mmap version was the slower of the two. This holds for both reading and writing. I tested with files of about 600 MB.
Both implementations parse the file line by line and then tokenize each line. I show only the file-reading code here.
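The post does not show how the timings were taken. As a point of reference only, here is a minimal sketch of one way to time a single run, assuming `std::chrono::steady_clock`; `time_ms` is a hypothetical helper name and not part of the tested code.

```cpp
#include <chrono>

// Hypothetical helper (not from the original post): runs `fn` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be called as `double t = time_ms([&] { two(path); });`. Note that the first read of a file usually comes from disk while later reads may be served from the OS page cache, so the order of the runs can influence the comparison.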
Here is the getline implementation:
void two(char* path) {
    std::ios::sync_with_stdio(false);
    ifstream pFile(path);
    string mystring;
    if (pFile.is_open()) {
        while (getline(pFile, mystring)) {
            // c style tokenizing
        }
    }
}
and here is the mmap implementation:
void four(char* path) {
    int fd;
    char *map;
    char *FILEPATH = path;
    unsigned long FILESIZE;

    // find file size
    FILE* fp = fopen(FILEPATH, "r");
    fseek(fp, 0, SEEK_END);
    FILESIZE = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    fclose(fp);

    fd = open(FILEPATH, O_RDONLY);
    map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);

    /* Read the file char-by-char from the mmap */
    char c;
    stringstream ss;
    // i < FILESIZE: valid offsets are 0 .. FILESIZE - 1
    for (unsigned long i = 0; i < FILESIZE; ++i) {
        c = map[i];
        if (c != '\n') {
            ss << c;
        } else {
            // c style tokenizing
            ss.str("");
        }
    }

    if (munmap(map, FILESIZE) == -1)
        perror("Error un-mmapping the file");
    close(fd);
}
I have omitted most of the error checking for brevity.
Is my mmap implementation incorrect in a way that hurts performance? Or is mmap simply not ideal for my application?
Thanks for any comments or help!
c++ file-io getline mmap
Ian