How can one loop affect the performance of an independent loop that runs before it?
My first loop reads a few large text files and counts their lines (rows). After a malloc, the second loop fills the allocated matrix.
If the second loop is commented out, the first loop takes 1.5 seconds. However, when I compile with the second loop, the first loop slows down and now takes 30-40 seconds!
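Timings like these are easiest to compare when the first loop is measured on its own, separately from the second. Below is a minimal sketch using `std::chrono::steady_clock`. `secondsToRun` is a hypothetical helper, and the original measurements may have used a different timer.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run `fn` once and return its wall-clock duration in
// seconds. steady_clock is monotonic, so system clock adjustments made while
// the loop runs do not distort the result.
template <typename Fn>
double secondsToRun(Fn &&fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    assert(elapsed.count() >= 0.0);  // steady_clock never goes backwards
    return elapsed.count();
}
```

For example, the first loop of the full example could be wrapped as `double t = secondsToRun([&] { for (std::size_t x = 0; x < files.size(); x++) dbRows[x] = countLines(files[x].c_str()); });`.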
In other words, the second loop somehow slows down the first loop. I have tried changing the scope, the compiler, the compiler flags and the loop itself, moving everything into main(), using boost::iostreams, and even putting one loop in a shared library, but the same problem persists with every attempt!
The first loop runs quickly unless the program is compiled with the second loop.
EDIT: Here is a complete example of my problem:
```cpp
#include <algorithm>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

#include <fcntl.h>   // open, posix_fadvise
#include <unistd.h>  // read, close

// Count the '\n' characters in a file, reading it in 16 KiB chunks.
unsigned long int countLines(char const *fname)
{
    static const auto BUFFER_SIZE = 16 * 1024;
    int fd = open(fname, O_RDONLY);
    if (fd == -1) {
        std::cout << "Open Error" << std::endl;
        std::exit(EXIT_FAILURE);
    }
    posix_fadvise(fd, 0, 0, 1);  // advice value 1 is POSIX_FADV_RANDOM on Linux

    char buf[BUFFER_SIZE + 1];
    unsigned long int lines = 0;

    // read() returns 0 at end of file, which ends the loop
    while (size_t bytes_read = read(fd, buf, BUFFER_SIZE)) {
        if (bytes_read == (size_t)-1) {  // read() returned -1
            std::cout << "Read Failed" << std::endl;
            std::exit(EXIT_FAILURE);
        }

        // count the newlines in this chunk with memchr
        int n;
        char *p;
        for (p = buf, n = bytes_read; n > 0 && (p = (char *)memchr(p, '\n', n)); n = (buf + bytes_read) - ++p)
            ++lines;
    }

    close(fd);
    return lines;
}

int main()
{
    // initial variables
    int offset = 55;
    unsigned long int rows = 0;
    unsigned long int cols = 0;
    std::vector<unsigned long int> dbRows = {0, 0, 0};
    std::vector<std::string> files = {"DATA/test/file1.csv",  // large files: 3 GB each
                                      "DATA/test/file2.csv",  // each line is 55 chars long
                                      "DATA/test/file3.csv"};

    // find the number of rows in each file
    for (std::size_t x = 0; x < files.size(); x++) {  // <--- FIRST LOOP **
        dbRows[x] = countLines(files[x].c_str());
    }

    // matrix rows = largest row count found
    // matrix cols = 55 chars for each csv file
    auto maxCount = std::max_element(dbRows.begin(), dbRows.end());
    rows = *maxCount;               // typically rows = 72716067
    cols = dbRows.size() * offset;  // cols = 165

    // malloc the required space (11998151055 bytes)
    char *syncData = (char *)malloc(rows * cols * sizeof(char));
    if (syncData == nullptr) {
        std::cout << "Malloc Failed" << std::endl;
        return EXIT_FAILURE;
    }

    // fill the allocated memory with a test letter
    char t[] = "x";
    for (unsigned long int x = 0; x < (rows * cols); x++) {  // <--- SECOND LOOP **
        syncData[x] = t[0];
    }

    free(syncData);
    return 0;
}
```
I also noticed that decreasing the number of columns speeds up the first loop.
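Fewer columns means a smaller allocation and fewer bytes written by the second loop. The sketch below only makes that arithmetic explicit. `matrixBytes` is a hypothetical helper, and rows = 72716067 is the typical value quoted in the code comments.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes requested by the malloc in the example,
// i.e. rows * cols with cols = numFiles * charsPerLine.
std::uint64_t matrixBytes(std::uint64_t rows, std::uint64_t numFiles,
                          std::uint64_t charsPerLine = 55)
{
    const std::uint64_t cols = numFiles * charsPerLine;
    // guard against the product overflowing 64 bits
    assert(cols == 0 || rows <= std::numeric_limits<std::uint64_t>::max() / cols);
    return rows * cols;
}
```

With rows = 72716067, one, two and three files give 3,999,383,685, 7,998,767,370 and 11,998,151,055 bytes (about 3.7, 7.4 and 11.2 GiB). The last value matches the figure in the malloc comment.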
The profiler points to this line:
```cpp
while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))
```
The program sits idle on this line for 30 seconds, with a wait count of 230,000. In the assembly, the waits are attributed to:
```
Block 5:
    lea    0x8(%rsp), %rsi
    mov    %r12d, %edi
    mov    $0x4000, %edx
    callq  0x402fc0              <------ stalls on callq
Block 6:
    mov    %rax, %rbx
    test   %rbx, %rbx
    jz     0x404480 <Block 18>
```
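For reference, the two blocks correspond to one evaluation of the `while` condition in `countLines()`. The annotated sketch below assumes the x86-64 System V calling convention: the first three integer arguments go in %rdi, %rsi and %rdx, and the return value comes back in %rax. `readChunk` is a hypothetical helper, and 0x402fc0 is assumed to be the address the linker resolved for `read()`.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

constexpr std::size_t BUFFER_SIZE = 16 * 1024;
static_assert(BUFFER_SIZE == 0x4000, "mov $0x4000, %edx loads BUFFER_SIZE");

// Hypothetical helper mirroring Blocks 5 and 6 (AT&T syntax: source first).
// Returns false when read() returns 0, i.e. the jz to Block 18.
bool readChunk(int fd, char *buf, ssize_t &bytes_read)
{
    assert(buf != nullptr);
    // Block 5:
    //   mov %r12d, %edi       1st argument: fd, kept in callee-saved %r12
    //   lea 0x8(%rsp), %rsi   2nd argument: address of buf on the stack
    //   mov $0x4000, %edx     3rd argument: BUFFER_SIZE
    //   callq 0x402fc0        the call into read()
    bytes_read = read(fd, buf, BUFFER_SIZE);

    // Block 6:
    //   mov %rax, %rbx        save the return value
    //   test %rbx, %rbx       compare it with zero
    //   jz <Block 18>         leave the loop at end of file
    return bytes_read != 0;
}
```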
I assume the read() call blocks while reading from the file, but why does it block?
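One way to confirm that the time is spent inside `read()` itself, rather than in the `memchr` scan, is to time each `read()` call separately. This is a diagnostic sketch, not a fix. `countLinesTimed` and `CountResult` are hypothetical names, the 16 KiB buffer matches `BUFFER_SIZE` in the example, and the `posix_fadvise` call is omitted.

```cpp
#include <cassert>
#include <chrono>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical result type: line count plus time spent blocked in read().
struct CountResult {
    unsigned long lines = 0;
    double secondsInRead = 0.0;
    bool ok = false;
};

// Hypothetical variant of countLines() that times every read() call.
CountResult countLinesTimed(const char *fname)
{
    constexpr std::size_t BUFFER_SIZE = 16 * 1024;
    char buf[BUFFER_SIZE];
    CountResult result;

    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        return result;  // ok stays false

    for (;;) {
        const auto start = std::chrono::steady_clock::now();
        const ssize_t bytes_read = read(fd, buf, BUFFER_SIZE);
        const auto stop = std::chrono::steady_clock::now();
        result.secondsInRead += std::chrono::duration<double>(stop - start).count();

        if (bytes_read < 0) {  // read error: ok stays false
            close(fd);
            return result;
        }
        if (bytes_read == 0)  // end of file
            break;
        assert(bytes_read <= static_cast<ssize_t>(BUFFER_SIZE));

        // count the newlines in this chunk (not included in secondsInRead)
        const char *p = buf;
        const char *end = buf + bytes_read;
        while (p < end &&
               (p = static_cast<const char *>(
                    std::memchr(p, '\n', static_cast<std::size_t>(end - p)))) != nullptr) {
            ++result.lines;
            ++p;
        }
    }
    close(fd);
    result.ok = true;
    return result;
}
```

Calling it on one of the CSV files, once in a build with the second loop and once without, would show whether `secondsInRead` accounts for the 30-40 second difference.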