How can I simultaneously download and convert a binary file using threads?

I have a program that downloads a binary file from another PC.
I also have another standalone program that can convert this binary to readable CSV.

I would like to bring the conversion tool into the download tool by creating a thread in the download tool that runs the conversion code (so that it can start converting while the download is still in progress, reducing the total time compared with downloading and converting one after the other).

I believe I can successfully start another thread, but how do I synchronize the conversion thread with the main download?

i.e., when the conversion catches up with the download, it needs to wait for more data to arrive, then start converting again, and so on.

Is this similar to synchronizing multiple threads? If so, does the downloaded binary have to be a resource guarded by semaphores?

Am I on the right track or should I be heading in a different direction before I start?

Any advice is appreciated.

Thanks.

+4
3 answers

This is a classic case of the producer-consumer problem, with the download thread as the producer and the conversion thread as the consumer.

Search for it and you will find an implementation for your language of choice. Here is one from MSDN: How to implement various producer-consumer patterns.
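As a minimal sketch of that pattern (in Python, which the question does not specify), the download thread can push chunks into a bounded thread-safe queue while the conversion thread pulls them out. The names `download`, `convert`, `run_producer_consumer`, and the byte-to-CSV formatting are hypothetical stand-ins for the real download and conversion code.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-download marker


def download(chunks, q):
    """Producer: stands in for the real network download loop."""
    for chunk in chunks:
        q.put(chunk)      # blocks while the queue is full
    q.put(SENTINEL)       # tell the consumer there is no more data


def convert(q, out):
    """Consumer: stands in for the real binary-to-CSV converter."""
    while True:
        chunk = q.get()   # blocks until the producer delivers a chunk
        if chunk is SENTINEL:
            break
        # Placeholder conversion: one CSV line of byte values per chunk.
        out.append(",".join(str(b) for b in chunk))


def run_producer_consumer(chunks):
    q = queue.Queue(maxsize=8)  # bounded, so the download cannot run far ahead
    out = []
    producer = threading.Thread(target=download, args=(chunks, q))
    consumer = threading.Thread(target=convert, args=(q, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out
```

The blocking `get` is what makes the converter wait when it catches up with the download; no explicit semaphore appears in the calling code.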

+3

Instead of having the download write to a file, have it write the downloaded data to a pipe. The conversion thread can read from the pipe and then write the converted output to a file. This will automatically synchronize them.

If you need the source file as well as the converted one, just have the download thread write the data to a file and then write the same data to the pipe.
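A rough sketch of the pipe approach, assuming Python and an anonymous `os.pipe()` shared by two threads in one process. The function names, the fixed two-byte record size, and the byte-value CSV format are illustrative assumptions, not part of the original tools.

```python
import os
import threading


def download_to_pipe(chunks, write_fd, raw_copy_path=None):
    """Write each downloaded chunk to the pipe, optionally keeping a raw copy."""
    raw = open(raw_copy_path, "wb") if raw_copy_path else None
    try:
        # Closing the write end on exit is what signals EOF to the reader.
        with os.fdopen(write_fd, "wb") as pipe_out:
            for chunk in chunks:
                if raw:
                    raw.write(chunk)   # keep the original binary as well
                pipe_out.write(chunk)
    finally:
        if raw:
            raw.close()


def convert_from_pipe(read_fd, csv_path, record_size=2):
    """Read fixed-size records from the pipe and write one CSV line per record."""
    with os.fdopen(read_fd, "rb") as pipe_in, open(csv_path, "w") as csv_out:
        while True:
            record = pipe_in.read(record_size)  # blocks until data or EOF
            if not record:
                break
            csv_out.write(",".join(str(b) for b in record) + "\n")


def run_pipe(chunks, csv_path, raw_copy_path=None):
    read_fd, write_fd = os.pipe()
    reader = threading.Thread(target=convert_from_pipe, args=(read_fd, csv_path))
    writer = threading.Thread(
        target=download_to_pipe, args=(chunks, write_fd, raw_copy_path)
    )
    reader.start()
    writer.start()
    writer.join()
    reader.join()
```

The reader simply blocks inside `read` whenever it gets ahead of the download, which is the automatic synchronization described above.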

+2

Yes, to protect access to the data, you will undoubtedly need semaphores (or something similar, such as an event or a critical section).

My immediate reaction would be to think primarily in terms of a sequence of blocks rather than a complete file. Second, I almost never use a semaphore (or anything like it) directly. Instead, I would usually use a thread-safe queue: when the network thread reads a block, it puts a structure into the queue saying where the data is, and so on. The processing thread waits for an item in the queue, and when one arrives, it pops it and processes that block.

When it finishes processing a block, it usually pushes the result onto another queue for the next processing stage (for example, writing to a file) and (quite possibly) puts a descriptor for the processed block onto another queue, so the memory can be reused to read another input block.

At least in my experience, this type of design eliminates a large percentage of thread synchronization problems.
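The block-oriented design described above might be sketched like this in Python, under the assumption that the input is any object with a `readinto` method (a file or `io.BytesIO` here standing in for the network). The three queues, the stage names, and the block size are all hypothetical choices for illustration.

```python
import queue
import threading

BLOCK_SIZE = 4   # tiny block size, for illustration only
DONE = None      # hypothetical end-of-stream marker


def reader(source, free_q, filled_q):
    """Fill recycled buffers from the source and queue them for processing."""
    while True:
        buf = free_q.get()           # wait for a reusable buffer
        n = source.readinto(buf)     # number of bytes actually read
        if not n:
            filled_q.put(DONE)
            break
        filled_q.put((buf, n))


def processor(filled_q, free_q, result_q):
    """Convert each block, pass the result on, and recycle the buffer."""
    while True:
        item = filled_q.get()
        if item is DONE:
            result_q.put(DONE)
            break
        buf, n = item
        result_q.put(",".join(str(b) for b in buf[:n]))
        free_q.put(buf)              # hand the memory back to the reader


def writer(result_q, lines):
    """Final stage: collect results (a real tool would write them to a file)."""
    while True:
        line = result_q.get()
        if line is DONE:
            break
        lines.append(line)


def run_pipeline(source, n_buffers=2):
    free_q, filled_q, result_q = queue.Queue(), queue.Queue(), queue.Queue()
    for _ in range(n_buffers):
        free_q.put(bytearray(BLOCK_SIZE))
    lines = []
    threads = [
        threading.Thread(target=reader, args=(source, free_q, filled_q)),
        threading.Thread(target=processor, args=(filled_q, free_q, result_q)),
        threading.Thread(target=writer, args=(result_q, lines)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lines
```

The number of buffers placed on the free queue caps how far the reader can run ahead of the processor, so memory use stays bounded without any explicit locking in the stage functions.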

Edit: I'm not sure of a good guide to creating a thread-safe queue, but I posted code for a simple one in a previous answer.

Regarding design patterns, I have seen this called at least "pipeline" and "production line" (although I'm not sure I have seen the latter in much literature).

+2

Source: https://habr.com/ru/post/1313443/