Multiple threads read from the same file

My platform is 32-bit Windows Vista with Visual C++ Express 2008.

eg:

If I have a file containing 4000 bytes, can I have 4 threads read from it at the same time, with each thread accessing a different section of the file?

e.g. thread 1 reads bytes 0-999, thread 2 reads bytes 1000-1999, etc.

Please provide an example in C.

+10
Tags: c++, c, file-io
13 answers

If no thread writes to the file, you do not need to worry about synchronization or race conditions.

Just open the file with shared read access as separate handles and everything will work (i.e., each thread should open the file itself rather than reusing a single file handle).

#include <stdio.h>
#include <windows.h>

DWORD WINAPI mythread(LPVOID param)
{
    int i = (int) param;
    BYTE buf[1000];
    DWORD numread;

    HANDLE h = CreateFile("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
                          NULL, OPEN_EXISTING, 0, NULL);
    SetFilePointer(h, i * 1000, NULL, FILE_BEGIN);
    ReadFile(h, buf, sizeof(buf), &numread, NULL);
    printf("buf[%d]: %02X %02X %02X\n", i + 1, buf[0], buf[1], buf[2]);
    return 0;
}

int main()
{
    int i;
    HANDLE h[4];
    for (i = 0; i < 4; i++)
        h[i] = CreateThread(NULL, 0, mythread, (LPVOID)i, 0, NULL);
    // for (i = 0; i < 4; i++) WaitForSingleObject(h[i], INFINITE);
    WaitForMultipleObjects(4, h, TRUE, INFINITE);
    return 0;
}
+20

Honestly, even writing to the same file is not a big problem.

The easiest way is to simply memory-map the file. The OS will then give you a void* where the file is mapped into memory. Cast it to char* and make sure each thread uses non-overlapping subranges.

void foo(char* begin, char* end) { /* .... */ }

void* base_address = myOS_memory_map("example.binary");
myOS_start_thread(&foo, (char*)base_address, (char*)base_address + 1000);
myOS_start_thread(&foo, (char*)base_address + 1000, (char*)base_address + 2000);
myOS_start_thread(&foo, (char*)base_address + 2000, (char*)base_address + 3000);
+4

Of course, you can have multiple threads reading from the same data structure; race conditions can only arise if any writes occur.

To avoid such race conditions, you need to define the boundaries each thread may read. If you have a fixed number of data segments and a matching number of threads, that is easy.

As for a C example, you will need to provide more information, such as the threading library you are using. Try it yourself first, then we can help you fix any problems.

+2

I do not see a real advantage to this.
You can have several threads reading from the device, but your bottleneck will not be the CPU, it will be the disk I/O speed.

If you are not careful, you may even slow things down (but you would need to measure to know for sure).

+2

Windows supports overlapping I / O, which allows a single thread to queue multiple I / O requests asynchronously to improve performance. Perhaps this can be used by several streams at the same time, as long as the file you are accessing supports search (i.e. this is not a channel).

Passing FILE_FLAG_OVERLAPPED to CreateFile() allows simultaneous reads and writes on the same file handle; otherwise Windows serializes them. Specify the file offset using the Offset and OffsetHigh members of the OVERLAPPED structure.

For more information, see Synchronization and Overlapped Input and Output.

+2

The easiest way is to open the file separately in each parallel instance, but open it read-only.

The people saying there may be an I/O bottleneck are probably wrong. All modern operating systems cache files. That means the first read of the file will be the slowest, and any subsequent reads will be lightning fast. A 4000-byte file may even fit inside the CPU cache.

+1

You do not need to do anything particularly clever if all the threads do is read. Obviously, you can read the file from as many threads as you want, as long as nothing locks it. Writing is definitely another matter, of course...

I have to wonder why you would want to, though. It will most likely perform poorly, since the hard drive will spend a lot of time seeking back and forth rather than reading everything in one (relatively) continuous scan. For small files (like your 4000-byte example) it should not matter much.

0

Perhaps, although I'm not sure it would be worth the effort. Have you considered reading the entire file into memory in one thread and then letting multiple threads access that data?

0

Reading: no need to lock the file. Just open it read-only (or with shared read access).

Writing: use a mutex to ensure that only one thread writes to the file at a time.

0

As others have already noted, there is no problem with having multiple threads read from the same file as long as each has its own file descriptor/handle. However, I am a little curious about your motives. Why do you want to read the file in parallel? If you are only reading the file into memory, your bottleneck is most likely the disk itself, in which case multiple threads will not help you (they will just clutter your code).

And as always with optimization, you should not attempt it until you (1) have a simple, working solution and (2) have measured your code so you know where to optimize.

0
#include <algorithm>
#include <fstream>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

using namespace std;

std::mutex mtx;

void worker(int n)
{
    mtx.lock();
    char *memblock;
    ifstream file("D:\\test.txt", ios::in);
    if (file.is_open()) {
        memblock = new char[1000];
        file.seekg(n * 999, ios::beg);
        file.read(memblock, 999);
        memblock[999] = '\0';
        cout << memblock << endl;
        file.close();
        delete[] memblock;
    } else {
        cout << "Unable to open file";
    }
    mtx.unlock();
}

int main()
{
    vector<std::thread> vec;
    for (int i = 0; i < 3; i++)
        vec.push_back(std::thread(&worker, i));
    std::for_each(vec.begin(), vec.end(), [](std::thread &th) {
        th.join();
    });
    return 0;
}
0

You need a way to synchronize these threads. There are various mutex implementations: http://en.wikipedia.org/wiki/Mutual_exclusion

-1

He wants to read from a file in different threads. I think this should be fine if the file is opened read-only in each thread.

I hope you are not doing this for performance, because you would have to scan large parts of the file for newlines in each thread.

-1
