Asynchronously writing to a file in C++ on Unix

I have a long loop in which I need to write some data to a file on every iteration. The problem is that writing to a file can be slow, so I would like to reduce the time this takes by doing the writing asynchronously.

Does anyone know a good way to do this? Should I create a thread that consumes whatever is put into a buffer by writing it out (in this case, a single producer and a single consumer)?

I am mostly interested in solutions that use nothing but the standard library (C++11).

2 answers

Before going into asynchronous writing: if you are using IOStreams, you may want to try to avoid flushing the stream accidentally, for example by not using std::endl and using '\n' instead. Since writes to IOStreams are buffered, this alone can improve performance.
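To make the difference visible, here is a minimal sketch that counts flushes instead of timing a real file. The names counting_buf, flushes_with_endl and flushes_with_newline are made up for this illustration; with a real std::ofstream each flush would typically turn into a write to the operating system.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical helper (not part of the standard library): an in-memory
// buffer that counts how many times the stream asks it to flush.
struct counting_buf : std::stringbuf {
    int flushes = 0;
    int sync() override { ++flushes; return 0; }
};

// Ends every line with std::endl, which writes '\n' and then flushes.
int flushes_with_endl(int lines) {
    assert(lines >= 0);
    counting_buf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) out << "line " << i << std::endl;
    return buf.flushes;
}

// Ends every line with '\n' and flushes once, after the loop.
int flushes_with_newline(int lines) {
    assert(lines >= 0);
    counting_buf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) out << "line " << i << '\n';
    out.flush();
    return buf.flushes;
}
```

Calling flushes_with_endl(n) reports one flush per line, while flushes_with_newline(n) reports only the single explicit flush at the end.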

If this is not enough, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the formatting itself takes most of the time. You may be able to push the formatting off into a separate thread, but that is quite different from merely handing a couple of bytes to another thread: you need to hand over a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing.
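As a rough sketch of handing over data rather than bytes, the producer below pushes raw records onto a queue and a second thread does the formatting. The record type sample, the class formatting_writer and the output format are assumptions made for this example only; what you would really pass depends on your data.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical record type: the raw values the loop produces.
struct sample { int step; double value; };

class formatting_writer {
    std::ostream&           out_;
    std::mutex              mutex_;
    std::condition_variable ready_;
    std::queue<sample>      queue_;
    bool                    done_ = false;
    std::thread             thread_;  // declared last: starts after the other members exist

    // Consumer thread: takes whole batches off the queue and formats them.
    void run() {
        for (;;) {
            std::queue<sample> batch;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !queue_.empty() || done_; });
                if (queue_.empty()) return;  // shut down and fully drained
                batch.swap(queue_);
            }
            // The expensive formatting happens here, without holding the lock.
            for (; !batch.empty(); batch.pop())
                out_ << "step=" << batch.front().step
                     << " value=" << batch.front().value << '\n';
        }
    }

public:
    explicit formatting_writer(std::ostream& out)
        : out_(out), thread_(&formatting_writer::run, this) {}

    ~formatting_writer() {
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        ready_.notify_one();
        thread_.join();
    }

    // Producer side: cheap, only copies the raw values.
    void push(sample s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!done_);  // pushing after shutdown would be a bug
            queue_.push(s);
        }
        ready_.notify_one();
    }
};

// Usage sketch: two records are formatted on the background thread.
std::string format_demo() {
    std::ostringstream os;
    {
        formatting_writer w(os);
        w.push({1, 0.5});
        w.push({2, 1.5});
    }  // the destructor drains the queue and joins the thread
    return os.str();
}
```

The output stream must only be touched by the consumer thread while the writer is alive; format_demo reads the result only after the destructor has joined it.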

Finally, if writing the buffers to the file really is the bottleneck and you want to stick with the C++ standard library, it may be reasonable to have a writer thread that listens on a queue filled with buffers by a suitable stream buffer and writes those buffers to an std::ofstream. The producer's interface would be an std::ostream, which sends fixed-size buffers to the queue either when a buffer is full or when the stream is flushed (for which I would use std::flush explicitly); the writer thread listens on that queue. Below is a quick implementation of this idea using only standard library facilities:

    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <streambuf>
    #include <string>
    #include <thread>
    #include <vector>

    struct async_buf
        : std::streambuf
    {
        std::ofstream                 out;
        std::mutex                    mutex;
        std::condition_variable       condition;
        std::queue<std::vector<char>> queue;
        std::vector<char>             buffer;
        bool                          done;
        std::thread                   thread;  // last member: started once the rest exist

        // Consumer: pops filled buffers off the queue and writes them to the file.
        void worker() {
            bool local_done(false);
            std::vector<char> buf;
            while (!local_done) {
                {
                    std::unique_lock<std::mutex> guard(this->mutex);
                    this->condition.wait(guard,
                                         [this](){ return !this->queue.empty()
                                                       || this->done; });
                    if (!this->queue.empty()) {
                        buf.swap(queue.front());
                        queue.pop();
                    }
                    local_done = this->queue.empty() && this->done;
                }
                // The actual file write happens without holding the lock.
                if (!buf.empty()) {
                    out.write(buf.data(), std::streamsize(buf.size()));
                    buf.clear();
                }
            }
            out.flush();
        }

    public:
        async_buf(std::string const& name)
            : out(name)
            , buffer(128)
            , done(false)
            , thread(&async_buf::worker, this) {
            // Keep one spare character so overflow() can always store c.
            this->setp(this->buffer.data(),
                       this->buffer.data() + this->buffer.size() - 1);
        }
        ~async_buf() {
            // Hand over anything still pending so unflushed output is not lost.
            this->sync();
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->done = true;
            }
            this->condition.notify_one();
            this->thread.join();
        }
        // Called when the put area is full: store c, then queue the buffer.
        int overflow(int c) override {
            if (c != std::char_traits<char>::eof()) {
                *this->pptr() = std::char_traits<char>::to_char_type(c);
                this->pbump(1);
            }
            return this->sync() != -1
                ? std::char_traits<char>::not_eof(c)
                : std::char_traits<char>::eof();
        }
        // Called on flush: move the filled part of the buffer to the queue.
        int sync() override {
            if (this->pbase() != this->pptr()) {
                this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
                {
                    std::unique_lock<std::mutex> guard(this->mutex);
                    this->queue.push(std::move(this->buffer));
                }
                this->condition.notify_one();
                this->buffer = std::vector<char>(128);
                this->setp(this->buffer.data(),
                           this->buffer.data() + this->buffer.size() - 1);
            }
            return 0;
        }
    };

    int main()
    {
        async_buf    sbuf("async.out");
        std::ostream astream(&sbuf);
        std::ifstream in("async_stream.cpp");
        for (std::string line; std::getline(in, line); ) {
            astream << line << '\n' << std::flush;
        }
    }

Search the web for "double buffering."

Typically, one thread writes into one or more buffers. Another thread reads from the buffers, "chasing" the writing thread.
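A minimal sketch of that idea with exactly two buffers, using only C++11 facilities. The class double_buffered_writer and its member names are invented for this example, and a real program would pass an std::ofstream where the demo uses a string stream.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

class double_buffered_writer {
    std::ostream&           out_;
    std::mutex              mutex_;
    std::condition_variable cv_;
    std::string             front_;             // filled by the producer only
    std::string             back_;              // owned by the writer while back_full_ is true
    bool                    back_full_ = false;
    bool                    done_ = false;
    std::thread             thread_;            // declared last: starts after the other members

    // Writer thread: "chases" the producer by writing out the back buffer.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return back_full_ || done_; });
            if (!back_full_) return;            // shut down and nothing left to write
            lock.unlock();                      // the slow write happens without the lock
            out_.write(back_.data(), std::streamsize(back_.size()));
            back_.clear();
            lock.lock();
            back_full_ = false;                 // release the slot to the producer
            cv_.notify_all();
        }
    }

public:
    explicit double_buffered_writer(std::ostream& out)
        : out_(out), thread_(&double_buffered_writer::run, this) {}

    ~double_buffered_writer() {
        if (!front_.empty()) swap_buffers();    // hand over what is still pending
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        cv_.notify_all();
        thread_.join();
    }

    // Producer side: appending needs no lock, front_ is never shared.
    void write(const std::string& s) { front_ += s; }

    // Hands the filled buffer to the writer; blocks only if the writer
    // has not finished the previous buffer yet.
    void swap_buffers() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !back_full_; });
        assert(back_.empty());                  // the writer cleared it before releasing the slot
        front_.swap(back_);
        back_full_ = true;
        cv_.notify_all();
    }
};

// Usage sketch: the producer keeps filling one buffer while the other is written.
std::string double_buffer_demo() {
    std::ostringstream os;
    {
        double_buffered_writer w(os);
        w.write("a\n");
        w.swap_buffers();
        w.write("b\n");
        w.write("c\n");
    }  // the destructor hands over the rest and joins the writer
    return os.str();
}
```

With only two buffers the producer stalls whenever it fills a buffer faster than the writer can drain one; a queue of buffers trades memory for fewer stalls.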

This may not make your program more efficient. Efficiency with files is achieved by writing in huge blocks, so that the drive does not get a chance to spin down. A single write of many bytes is more efficient than many writes of a few bytes.

This can be achieved by having the writing thread write only when the buffer content has exceeded some threshold, such as 1k.
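A small sketch of the threshold idea on its own, without any threading. The class threshold_writer and the writes() counter are invented for this illustration; the counter exists only to show how many times the underlying stream is actually written to.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: collects small pieces and passes them on to the
// underlying stream only once at least `threshold` bytes are pending.
class threshold_writer {
    std::ostream& out_;
    std::size_t   threshold_;
    std::string   pending_;
    std::size_t   writes_ = 0;

public:
    threshold_writer(std::ostream& out, std::size_t threshold)
        : out_(out), threshold_(threshold) {
        assert(threshold > 0);
        pending_.reserve(threshold);
    }
    ~threshold_writer() { flush(); }

    void append(const std::string& s) {
        pending_ += s;
        if (pending_.size() >= threshold_) flush();
    }

    // Writes everything that is pending as one block.
    void flush() {
        if (pending_.empty()) return;
        out_.write(pending_.data(), std::streamsize(pending_.size()));
        pending_.clear();
        ++writes_;
    }

    std::size_t writes() const { return writes_; }
};

// Usage sketch: 100 appends of 16 bytes each against a 1k threshold.
std::size_t threshold_demo(std::size_t& bytes_written) {
    std::ostringstream os;
    threshold_writer w(os, 1024);
    for (int i = 0; i < 100; ++i) w.append("0123456789abcde\n");
    w.flush();  // push out the remainder below the threshold
    bytes_written = os.str().size();
    return w.writes();
}
```

In threshold_demo the 100 small appends reach the underlying stream as two block writes: one when the 1k threshold is hit and one for the remainder.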

Also explore the topic of "buffering" or "print buffering."

You would need to use C++11, since earlier versions do not support threads in the standard library. I don't know why you are limiting yourself, though, as Boost has some good stuff in it.
