Parallel I/O - Why Does It Work?

I have a Python function that reads a line from a text file and writes it to another text file. It repeats this for every line in the file. Essentially:

Read line 1 -> Write line 1 -> Read line 2 -> Write line 2... 

And so on.

I can parallelize this process using a queue to pass the data along, so it looks more like:

 Read line 1 -> Read line 2 -> Read line 3... Write line 1 -> Write line 2.... 
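
For concreteness, the queue-based version might look roughly like the sketch below. It assumes threads and the standard `queue.Queue`; the names `reader`, `writer` and `copy_lines` are made up for illustration, and the real code may well use processes instead.

```python
import queue
import threading

_DONE = object()  # sentinel that tells the writer there is no more input


def reader(src_path, q):
    # Read lines one by one and hand them to the writer through the queue.
    with open(src_path, "r") as src:
        for line in src:
            q.put(line)
    q.put(_DONE)


def writer(dst_path, q):
    # Take lines off the queue in order and write them out.
    with open(dst_path, "w") as dst:
        while True:
            item = q.get()
            if item is _DONE:
                break
            dst.write(item)


def copy_lines(src_path, dst_path, maxsize=1024):
    # maxsize bounds how far the reader can run ahead of the writer.
    q = queue.Queue(maxsize=maxsize)
    t_read = threading.Thread(target=reader, args=(src_path, q))
    t_write = threading.Thread(target=writer, args=(dst_path, q))
    t_read.start()
    t_write.start()
    t_read.join()
    t_write.join()
```

Calling `copy_lines("in.txt", "out.txt")` should produce the same output file as the sequential loop, since a single reader and a single writer preserve line order.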

My question is: why does this work (as in, why do I get a speedup)? It may sound like a silly question, but I thought: surely my hard drive can only do one thing at a time? So why isn't one process paused until the other has completed?

Things like this are hidden from the user when writing in a high-level language. I would like to know what is going on at the low level.

1 answer

In short: I/O buffering. Two levels of it, even.

First, Python itself has I/O buffers. So when you write all of those lines to a file, Python does not necessarily invoke the write syscall right away - it does so when it flushes its buffers, which can happen at any point between your call to write and the closing of the file. This obviously will not affect you if you write at a level where you make the syscalls yourself.
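
A small experiment can make Python's own buffer visible. This is only a sketch: the helper name `buffered_sizes` is made up, and it assumes the default buffer is larger than the six bytes written, which holds for ordinary files in CPython.

```python
import os


def buffered_sizes(path):
    # Binary mode avoids newline translation, so byte counts are exact.
    with open(path, "wb") as f:
        f.write(b"hello\n")             # lands in Python's buffer only
        before = os.path.getsize(path)  # nothing handed to the OS yet
        f.flush()                       # now Python issues the write syscall
        after = os.path.getsize(path)
    return before, after
```

On a typical setup this returns `(0, 6)`: the file looks empty until Python flushes, even though `write` has already returned.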

Separately, the operating system also uses buffers. They work the same way: you make the syscall to write to disk, and the OS puts the data in its write buffer and serves it from there when other processes read the file. But it does not necessarily write the data to disk right away - it can wait, theoretically, until you unmount the file system (possibly at shutdown). This is (part of) why it can be a bad idea to unplug a USB storage device without unmounting or "safely removing" it: what you wrote to it may not yet be physically on the device. None of what the OS does depends on what language you write in, or on how much of a wrapper you have around the system calls.
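
If you do need the data pushed past both buffers, the usual pattern is to flush Python's buffer and then ask the OS to sync the file. A minimal sketch follows; the helper name `durable_write` is made up, and whether the device has a cache of its own is outside what this shows.

```python
import os


def durable_write(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python buffer -> OS (write syscall)
        os.fsync(f.fileno())  # ask the OS to push its buffer to the device
```

Calling `os.fsync` on every line would throw away most of the speedup that buffering gives, so it is normally reserved for data that must survive a crash or an unplugged device.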

In addition, both Python and the OS can do read buffering - essentially, when you read one line from a file, Python or the OS anticipates that you will be interested in the next few lines as well, and therefore reads them into main memory to avoid having to go all the way down to the disk later.
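
The read-ahead on the Python side can be observed by comparing the logical position of a buffered file with the position of the raw file underneath it. This is a sketch with a made-up helper name, `read_ahead_positions`, and it assumes the file is small enough to fit in one buffer fill.

```python
def read_ahead_positions(path):
    # Create a small three-line file to read back.
    with open(path, "wb") as f:
        f.write(b"a\nb\nc\n")
    with open(path, "rb") as f:
        f.readline()            # the program asks for one line only
        logical = f.tell()      # where the program thinks it is
        raw = f.raw.tell()      # how far Python has actually read
    return logical, raw
```

For this six-byte file the result is `(2, 6)`: after asking for one line, Python has already pulled the rest of the file into its buffer, so the next `readline` calls need no further syscalls.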
