Does a separate loop slow down an independent earlier loop?

Question

How can a separate loop affect the performance of an independent earlier loop?

My first loop reads a few large text files and counts their lines/rows. After a malloc, the second loop fills the allocated matrix.

If the second loop is commented out, the first loop takes 1.5 seconds. However, compiling with the second loop slows down the first loop, which then takes 30-40 seconds!

In other words, the second loop somehow slows down the first loop. I have tried changing the scope, changing compilers, changing compiler flags, changing the loop itself, moving everything into main(), using boost::iostream, and even putting one loop in a shared library, but the same problem persists in every attempt!

The first loop runs quickly until the program is compiled with the second loop.

EDIT: Here is a complete example of my problem:

    #include <iostream>
    #include <vector>
    #include <string>
    #include <cstring>
    #include <cstdlib>
    #include <iterator>
    #include <algorithm>
    #include "boost/chrono.hpp"
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>   // read, close

    // Count '\n' characters in a file, reading it in 16 KiB chunks.
    unsigned long int countLines(char const *fname)
    {
        static const auto BUFFER_SIZE = 16*1024;
        int fd = open(fname, O_RDONLY);
        if (fd == -1) {
            std::cout << "Open Error" << std::endl;
            std::exit(EXIT_FAILURE);
        }

        posix_fadvise(fd, 0, 0, 1);  // advice value 1 is POSIX_FADV_RANDOM on Linux
        char buf[BUFFER_SIZE + 1];
        unsigned long int lines = 0;

        while (size_t bytes_read = read(fd, buf, BUFFER_SIZE)) {
            if (bytes_read == (size_t)-1) {
                std::cout << "Read Failed" << std::endl;
                std::exit(EXIT_FAILURE);
            }
            if (!bytes_read)
                break;

            // Jump from one '\n' to the next with memchr.
            int n;
            char *p;
            for (p = buf, n = bytes_read; n > 0 && (p = (char*) memchr(p, '\n', n)); n = (buf + bytes_read) - ++p)
                ++lines;
        }

        close(fd);
        return lines;
    }

    int main(int argc, char *argv[])
    {
        // initial variables
        int offset = 55;
        unsigned long int rows = 0;
        unsigned long int cols = 0;
        std::vector<unsigned long int> dbRows = {0, 0, 0};
        std::vector<std::string> files = {"DATA/test/file1.csv",  // large files: 3Gb
                                          "DATA/test/file2.csv",  // each line is 55 chars long
                                          "DATA/test/file3.csv"};

        // find each file number of rows
        for (int x = 0; x < files.size(); x++) {  // <--- FIRST LOOP **
            dbRows[x] = countLines(files[x].c_str());
        }

        // define matrix row as being the largest row found
        // define matrix col as being 55 chars long for each csv file
        std::vector<unsigned long int>::iterator maxCount;
        maxCount = std::max_element(dbRows.begin(), dbRows.end());
        rows = dbRows[std::distance(dbRows.begin(), maxCount)];  // typically rows = 72716067
        cols = dbRows.size() * offset;                           // cols = 165

        // malloc required space (11998151055)
        char *syncData = (char *)malloc(rows*cols*sizeof(char));
        if (syncData == nullptr) {  // an allocation this large can fail
            std::cout << "malloc failed" << std::endl;
            return EXIT_FAILURE;
        }

        // fill up allocated memory with a test letter
        char t[] = "x";
        for (unsigned long int x = 0; x < (rows*cols); x++) {  // <--- SECOND LOOP **
            syncData[x] = t[0];
        }

        free(syncData);
        return 0;
    }

I also noticed that decreasing the number of columns speeds up the first loop.
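The column count feeds directly into the size of the malloc, because cols = number of files × 55. The arithmetic below is a small sketch that uses the "typically rows = 72716067" figure from the code comments. The helper names `allocationBytes` and `toGiB` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: bytes requested by the question's malloc,
// rows * cols * sizeof(char) with cols = numFiles * charsPerLine.
constexpr std::uint64_t allocationBytes(std::uint64_t rows,
                                        std::uint64_t numFiles,
                                        std::uint64_t charsPerLine = 55) {
    return rows * (numFiles * charsPerLine);  // sizeof(char) == 1
}

// Convert bytes to GiB (2^30 bytes).
constexpr double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Matches the "(11998151055)" comment: 72716067 rows * 165 cols.
static_assert(allocationBytes(72716067, 3) == 11998151055ULL,
              "three 55-char column groups per row");
```

The allocation shrinks in proportion to the column count. With 110 columns instead of 165, it drops to 7998767370 bytes, roughly 7.4 GiB. That fits the observation above, although the sketch only does the arithmetic and measures nothing.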

The profiler points a finger at this line:

 while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))

The program waits on this line for 30 seconds, with a wait count of 230,000. In the assembly, the wait count shows up at:

    Block 5:
        lea  0x8(%rsp), %rsi
        mov  %r12d, %edi
        mov  $0x4000, %edx
        callq 0x402fc0          <------ stalls on callq
    Block 6:
        mov  %rax, %rbx
        test %rbx, %rbx
        jz   0x404480 <Block 18>

I assume that the API is blocking when reading from the stream, but I don't know why.

c++ performance loops for-loop

Harry reed Oct 31 '16 at 21:02

2 answers

From this page:

"The function close closes the file descriptor filedes. Closing a file has the following consequences:

The file descriptor is deallocated.
Any record locks owned by the process on the file are unlocked.
When all file descriptors associated with a pipe or FIFO have been closed, any unread data is discarded."

I think you could be locking resources in a previous read. Try closing the file and tell us the result.

Ricardo Ortega Magaña Oct 31 '16 at 22:59

Peter Cordes · Accepted Answer · 2016-11-01T13:31:47+0000

My theory:

Allocating and touching all that memory evicts your large files from the disk cache, so the next run has to read them from disk.

If you run the version without loop2 a couple of times to warm up the disk cache, and then run the version with loop2, I predict it will be fast the first time but slow on subsequent runs, unless you warm up the disk cache again.

The memory use happens after the files have been read. That creates "memory pressure" on the page cache (i.e. the disk cache). The kernel then evicts file data from the cache to make room for the pages your process is writing to.
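One way to check the eviction theory is to measure how much of each CSV file is still in the page cache before and after a run. The following is a minimal sketch, assuming Linux and its mmap(2) and mincore(2) calls. The function name `residentFraction` is hypothetical. On recent kernels, mincore may only report page-cache state for files the caller owns or can write to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: fraction of a file's pages currently resident in the
// page cache, or -1.0 on error. Linux-specific (mmap + mincore).
double residentFraction(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1.0;

    struct stat st{};
    if (fstat(fd, &st) == -1) { close(fd); return -1.0; }
    const std::size_t length = static_cast<std::size_t>(st.st_size);
    if (length == 0) { close(fd); return 0.0; }

    // Mapping the file without touching it does not read it from disk.
    void *addr = mmap(nullptr, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1.0;

    const long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    const std::size_t pageSize = static_cast<std::size_t>(ps);
    const std::size_t pages = (length + pageSize - 1) / pageSize;
    std::vector<unsigned char> vec(pages);  // one status byte per page

    double result = -1.0;
    if (mincore(addr, length, vec.data()) == 0) {
        std::size_t resident = 0;
        for (unsigned char v : vec) resident += (v & 1u);  // bit 0: page resident
        result = static_cast<double>(resident) / static_cast<double>(pages);
    }
    munmap(addr, length);
    return result;
}
```

For example, call `residentFraction("DATA/test/file1.csv")` before and after a run that includes the second loop. If the theory holds, the fraction should fall noticeably after the loop2 run. This is a prediction, not a measured result.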

Your computer may have only barely enough free RAM to cache your working set. Closing your web browser might free up enough to make a difference! Or not, since your 11998151055 bytes is about 11.2 GiB, and you write every page of it. (Every byte, even. You could do that with memset for better performance, although I assume what you've shown is just a dummy version.)
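A minimal sketch of the memset suggestion, assuming the same rows × cols layout as the question. The function name `allocateAndFill` is hypothetical. This version still writes every page, so it should be faster than the byte loop but would not by itself relieve the page-cache pressure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical replacement for the "second loop": allocate rows * cols bytes
// and fill them with one character. Returns nullptr if malloc fails.
char *allocateAndFill(std::size_t rows, std::size_t cols, char fill) {
    assert(cols == 0 || rows <= SIZE_MAX / cols);  // rows * cols must not overflow
    const std::size_t total = rows * cols;
    char *data = static_cast<char *>(std::malloc(total));
    if (data == nullptr) return nullptr;
    // memset still writes (and therefore faults in) every page, so memory
    // use is the same as with the loop; only the filling itself is faster.
    std::memset(data, fill, total);
    return data;
}
```

The caller still owns the buffer and must release it with `std::free`, as the original program does with `syncData`.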

By the way, another tool for investigating this is time ./a.out. It shows whether your program spends its CPU time in user space or in the kernel ("system" time).

If user + sys adds up to the real time, your process is CPU-bound. If not, it is I/O-bound: your process is blocking on disk I/O (which is expected, since counting newlines should be fast).
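The same user + sys versus real comparison can be made from inside the program, for example around the first loop alone. The following is a minimal sketch, assuming a POSIX system with getrusage(2). The names `cpuTimesSoFar`, `looksCpuBound`, and `runAndClassify` are hypothetical, and the 0.9 threshold is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <sys/resource.h>
#include <sys/time.h>

// Convert a struct timeval to seconds.
double toSeconds(const timeval &tv) {
    return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// User and system CPU time consumed so far by this process.
struct CpuTimes { double user; double sys; };

CpuTimes cpuTimesSoFar() {
    rusage ru{};
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return {toSeconds(ru.ru_utime), toSeconds(ru.ru_stime)};
}

// Heuristic from the answer: if user + sys is close to real (wall) time the
// work was CPU-bound; if it is much smaller, the process spent the gap
// blocked, e.g. waiting on disk I/O. The 0.9 threshold is an assumption.
bool looksCpuBound(double user, double sys, double real, double threshold = 0.9) {
    return real <= 0.0 || (user + sys) >= threshold * real;
}

// Run `work` once and classify it with the heuristic above.
template <typename F>
bool runAndClassify(F &&work) {
    const CpuTimes before = cpuTimesSoFar();
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    const CpuTimes after = cpuTimesSoFar();
    const double real = std::chrono::duration<double>(stop - start).count();
    return looksCpuBound(after.user - before.user, after.sys - before.sys, real);
}
```

For example, wrapping the first loop in a lambda passed to `runAndClassify` would be expected to return false for a slow, I/O-bound run.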
