Avoiding a copy of the buffer is easy to do with a custom stream buffer that simply sets its get area to point at the buffer. The stream buffer doesn't even need to override any of the virtual functions; it only has to set up its internal buffer:
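```cpp
#include <istream>
#include <iterator>
#include <streambuf>
#include <vector>

// An input stream reading directly from [begin, end) without copying:
// the stream buffer's get area simply points at the caller's memory.
class imemstream
    : private virtual std::streambuf
    , public std::istream {
public:
    imemstream(char* begin, char* end)
        : std::streambuf()
        , std::istream(static_cast<std::streambuf*>(this)) {
        this->setg(begin, begin, end);
    }
};

std::vector<int> parse_data_via_istream(char* begin, char* end) {
    imemstream in(begin, end);
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```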
This approach avoids copying the buffer and uses the predefined std::istream functions. However, it still creates a stream object. With a suitable update function, the stream/stream buffer can be extended to reset its buffer and process multiple buffers.
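The update function mentioned above could look roughly like the following sketch. The member name `reset()` and the helper `read_ints()` are hypothetical; the idea is just to call `setg()` again for the new buffer and clear the stream state (eof/fail) left over from parsing the previous one, so a single stream object can be reused.

```cpp
#include <istream>
#include <iterator>
#include <streambuf>
#include <vector>

class imemstream
    : private virtual std::streambuf
    , public std::istream {
public:
    imemstream(char* begin, char* end)
        : std::streambuf()
        , std::istream(static_cast<std::streambuf*>(this)) {
        this->setg(begin, begin, end);
    }

    // Hypothetical update function: point the get area at another buffer
    // and clear the eof/fail bits set while reading the previous one.
    void reset(char* begin, char* end) {
        this->setg(begin, begin, end);
        this->clear();
    }
};

// Hypothetical helper: reads all ints remaining in the current buffer.
std::vector<int> read_ints(imemstream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Without the `clear()` call, the stream would still be in a failed state after the first buffer and would read nothing from the second.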
To avoid creating a stream, you can use the underlying functionality of std::num_get<...>. The actual parsing is performed by one of the std::locale facets. Numeric parsing for std::istream is done using std::num_get<char, std::istreambuf_iterator<char>>. This facet doesn't help much, since it works on a sequence specified by std::istreambuf_iterator<char>, but you can create an instance of std::num_get<char, char const*>. It won't be part of the default std::locale, but it is easy to create a corresponding std::locale and install it, for example, as the global std::locale object, preferably first thing in main():
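```cpp
#include <locale>

int main() {
    // Add num_get<char, char const*> to the global locale (the locale owns the facet).
    std::locale::global(std::locale(std::locale(), new std::num_get<char, char const*>()));
    ...
```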
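To see that std::num_get<char, char const*> really isn't part of a default locale, and that combining locales adds it, std::has_facet can be used. The helper names `make_parsing_locale()` and `has_pointer_num_get()` below are hypothetical; this is only a sketch of the locale construction shown above.

```cpp
#include <locale>

// Hypothetical helper: a copy of `base` with num_get<char, char const*> added.
// The new locale takes ownership of the facet (refs == 0, reference counted).
std::locale make_parsing_locale(std::locale const& base = std::locale()) {
    return std::locale(base, new std::num_get<char, char const*>());
}

// Hypothetical helper: does `loc` contain the pointer-based num_get facet?
bool has_pointer_num_get(std::locale const& loc) {
    return std::has_facet<std::num_get<char, char const*>>(loc);
}
```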
Note that the std::locale object will clean up the added facet, i.e., there is no need for cleanup code: facets are reference counted and released when the last std::locale holding a given facet goes away. To actually use a facet, it unfortunately needs a std::ios_base object, which can really only be obtained from some stream object. However, any stream can be used (although in a multi-threaded system it should probably be a separate stream object per thread, to avoid accidental race conditions):
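```cpp
#include <algorithm>
#include <cctype>
#include <ios>
#include <locale>
#include <vector>

// Skip leading whitespace, as std::istream does for formatted input.
char const* skipspace(char const* it, char const* end) {
    return std::find_if(it, end, [](unsigned char c){ return !std::isspace(c); });
}

std::vector<int> parse_data_via_num_get(std::ios_base& fmt, char const* it, char const* end) {
    std::vector<int> rc;
    // Throws std::bad_cast unless the global locale contains the facet (see main()).
    std::num_get<char, char const*> const& ng
        = std::use_facet<std::num_get<char, char const*>>(std::locale());
    long tmp;  // num_get::get() has no int overload
    for (;;) {
        // get() only ever sets bits in `error`, so reset it for every number.
        std::ios_base::iostate error = std::ios_base::goodbit;
        it = ng.get(skipspace(it, end), end, fmt, error, tmp);
        if (error & std::ios_base::failbit) {
            break;  // no (valid) number left
        }
        rc.push_back(static_cast<int>(tmp));
        if (error & std::ios_base::eofbit) {
            break;  // the last number ran up to `end`
        }
    }
    return rc;
}
```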
Most of this just handles errors a bit and skips leading whitespace: std::istream provides facilities for automatically skipping whitespace in formatted input and for handling the necessary error protocol. The approach above has a potentially small advantage in that the facet is obtained only once per buffer and the creation of a std::istream::sentry is avoided, as is the creation of a stream. Of course, the code assumes that some stream is available to pass as its std::ios_base& subobject, which provides parsing flags such as the base to be used.
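As a sketch of the per-thread stream and the parsing flags together: the hypothetical `thread_fmt()` below keeps one `thread_local` stream per thread and imbues it with a locale containing the facet, so this variant takes the facet from `fmt.getloc()` instead of requiring a change to the global locale. `parse_longs()` is likewise a hypothetical name; it is the same loop as above, returning `long` values.

```cpp
#include <algorithm>
#include <cctype>
#include <ios>
#include <locale>
#include <sstream>
#include <vector>

// Hypothetical helper: one stream per thread, used only for its std::ios_base
// part (format flags and locale).
std::ios_base& thread_fmt() {
    thread_local std::istringstream fmt;
    thread_local bool imbued = false;
    if (!imbued) {
        fmt.imbue(std::locale(fmt.getloc(), new std::num_get<char, char const*>()));
        imbued = true;
    }
    return fmt;
}

// Hypothetical variant of the loop above; the facet comes from fmt's locale.
std::vector<long> parse_longs(std::ios_base& fmt, char const* it, char const* end) {
    auto const& ng = std::use_facet<std::num_get<char, char const*>>(fmt.getloc());
    std::vector<long> rc;
    long tmp;
    for (;;) {
        it = std::find_if(it, end, [](unsigned char c) { return !std::isspace(c); });
        std::ios_base::iostate err = std::ios_base::goodbit;
        it = ng.get(it, end, fmt, err, tmp);
        if (err & std::ios_base::failbit) break;  // nothing (valid) to read
        rc.push_back(tmp);
        if (err & std::ios_base::eofbit) break;   // number ended at `end`
    }
    return rc;
}
```

Changing the base is then just a matter of setting the flags on that object, e.g. `thread_fmt().setf(std::ios_base::hex, std::ios_base::basefield)` makes `"ff 10"` parse as 255 and 16.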
OK, this is quite a bit of code for something which could basically be strtol(). The approach using std::num_get<char, char const*> has some flexibility that strtol() doesn't offer:
- Since a std::locale facet is used, which can be overridden to parse arbitrary representation formats such as Roman numerals, it is more flexible with respect to input formats.
- It's easy to customize the use of thousands separators or change the decimal point (just change the std::numpunct<char> facet in the std::locale used by fmt).
- The buffer doesn't need to be null-terminated. For example, a contiguous sequence of 8 digits can be parsed by supplying it and it+8 as the range when calling std::num_get<char, char const*>::get().
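The last two points can be sketched together: a std::numpunct<char> override that enables ',' as a thousands separator in groups of three, and a call to get() on an exact range that need not be null-terminated. `comma_grouping` and `parse_range()` are hypothetical names; note that std::num_get only checks grouping when separators actually appear, so plain digit runs still parse.

```cpp
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct: ',' separates groups of three digits.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: parse one long from exactly [first, last).
// Returns false unless the whole range is a single valid number.
bool parse_range(char const* first, char const* last, long& value) {
    static std::locale const loc(
        std::locale(std::locale::classic(), new comma_grouping()),
        new std::num_get<char, char const*>());
    std::istringstream fmt;  // only its ios_base part (flags, locale) is used
    fmt.imbue(loc);
    auto const& ng = std::use_facet<std::num_get<char, char const*>>(loc);
    std::ios_base::iostate err = std::ios_base::goodbit;
    char const* stop = ng.get(first, last, fmt, err, value);
    return !(err & std::ios_base::failbit) && stop == last;
}
```

For instance, given `"123456789"`, passing the range `[buf, buf + 8)` yields 12345678, and `"1,234,567"` yields 1234567.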
However, strtol() is probably a good approach for most applications. On the other hand, the above provides an alternative that may be useful in some contexts.
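For comparison, a strtol() version of the same loop might look like the sketch below (`parse_via_strtol` is a hypothetical name). strtol() skips leading whitespace itself, but it has no end-pointer parameter, so this sketch assumes a null-terminated buffer.

```cpp
#include <cerrno>
#include <cstdlib>
#include <vector>

// Hypothetical helper: parse whitespace-separated longs from a
// null-terminated string, stopping at the first non-number or overflow.
std::vector<long> parse_via_strtol(char const* s) {
    std::vector<long> rc;
    for (;;) {
        char* next = nullptr;
        errno = 0;
        long v = std::strtol(s, &next, 10);
        if (next == s || errno == ERANGE) {
            break;  // nothing converted, or value out of range
        }
        rc.push_back(v);
        s = next;
    }
    return rc;
}
```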