Method 2 has a sequential bottleneck: a single thread reads the work items and hands them out. By Amdahl's Law, this will not scale indefinitely. It is a very fair and reliable method, though.
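To put a number on that limit, here is a small sketch of Amdahl's Law. The function name `amdahl_speedup` and the 10% serial fraction are assumptions chosen for illustration, not measurements of either method.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Best-case speedup when `serial_fraction` of the work cannot run in parallel."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 10% of the time is spent in the single reader thread.
    for workers in (1, 4, 16, 1_000_000):
        print(workers, round(amdahl_speedup(0.10, workers), 2))
```

With a serial fraction of 0.10 the speedup never reaches 10, no matter how many workers are added.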
Method 1 has no such bottleneck and will scale. Be sure not to cause random disk I/O. I would use a mutex so that only one thread reads at a time. Read in large sequential blocks, perhaps 4-16 MB. In the time the disk takes for a single head seek, it could have read about 1 MB of data.
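A minimal sketch of that idea, assuming a line-oriented file: every worker pulls its own block, but a lock ensures only one thread touches the file at a time, so the disk sees large sequential reads. `ChunkReader`, `count_lines`, and the 8 MB block size are hypothetical names and values picked for illustration; counting lines stands in for the real parsing work.

```python
import threading

CHUNK_SIZE = 8 * 1024 * 1024  # assumed value inside the 4-16 MB range


class ChunkReader:
    """Hands out large sequential blocks; only one thread reads at a time."""

    def __init__(self, path, chunk_size=CHUNK_SIZE):
        self._file = open(path, "rb")
        self._chunk_size = chunk_size
        self._lock = threading.Lock()

    def next_chunk(self):
        """Return the next block, extended to a line boundary, or b'' at EOF."""
        with self._lock:
            block = self._file.read(self._chunk_size)
            if block and not block.endswith(b"\n"):
                # Finish the partial last line so no line is split across blocks.
                block += self._file.readline()
            return block

    def close(self):
        self._file.close()


def count_lines(path, workers=4, chunk_size=CHUNK_SIZE):
    """Stand-in for parsing: each worker counts the lines in its own blocks."""
    reader = ChunkReader(path, chunk_size)
    counts = [0] * workers  # one slot per worker, so no shared counter is needed

    def work(index):
        while True:
            chunk = reader.next_chunk()
            if not chunk:
                break
            counts[index] += len(chunk.splitlines())

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reader.close()
    return sum(counts)
```

The lock covers only the read itself; parsing happens outside it. Note that in standard CPython the global interpreter lock keeps pure-Python parsing from running truly in parallel, so this shows the structure rather than the achievable speedup.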
If parsing the lines takes a considerable amount of time, you cannot use method 2 because of the large sequential part; it will not scale. If parsing is fast, use method 2 because it is easier to get right.
To illustrate the concept of a bottleneck: imagine 1,000,000 computation threads asking one reader thread to give them lines. That one reader thread will not be able to hand out lines as fast as they are demanded. You will not get 1e6 times more throughput. It will not scale. But if 1e6 threads read independently from a very fast I/O device, you will get 1e6 times more throughput because there is no bottleneck. (I used extreme numbers to make the point. The same idea applies in the small.)