Based on the comments and answers, there seem to be three approaches:
- Write a custom version of getline(), possibly using std::istream's member getline() internally to get the actual characters.
- Use a filtering stream buffer to limit the amount of data potentially received.
- Instead of reading into a std::string, use a string created with a custom allocator that limits the amount of memory the string can store.
Not all of the suggestions came with code. This answer provides code for all three approaches, along with a brief discussion of each. Before going into the details of the implementation, it is worth noting that there are several options for what should happen when the input is too long:
- Reading an overly long line can result in a successful read of a partial line, i.e. the resulting line contains the content read so far and no error flags are set on the stream. This means, however, that it is impossible to distinguish a line that exactly fits the limit from one that is too long. Since the limit is somewhat arbitrary anyway, this probably does not really matter.
- Reading an overly long line can be considered a failure (i.e. std::ios_base::failbit and/or std::ios_base::badbit get set), with the failed read yielding an empty line. Obviously, yielding an empty line prevents inspecting what was read so far to see what is going on.
- Reading an overly long line can yield the partially read line as well as set error flags on the stream. This is arguably the most reasonable behavior: it reveals that something went wrong while still providing the input for potential inspection.
Although there are already a few code examples implementing a limited version of getline(), here is yet another one! I think this one is simpler (although possibly slower; performance can be dealt with if necessary) and also preserves the std::getline() interface: it uses the stream's width() to communicate the limit (perhaps honoring width() would be a reasonable extension for std::getline()):
```cpp
#include <ios>
#include <istream>
#include <iterator>
#include <limits>
#include <string>

template <typename cT, typename Traits, typename Alloc>
std::basic_istream<cT, Traits>&
safe_getline(std::basic_istream<cT, Traits>& in,
             std::basic_string<cT, Traits, Alloc>& value,
             cT delim)
{
    typedef std::basic_string<cT, Traits, Alloc> string_type;
    typedef typename string_type::size_type      size_type;

    // like std::getline(), do not skip leading whitespace
    typename std::basic_istream<cT, Traits>::sentry cerberos(in, true);
    if (cerberos) {
        value.clear();
        size_type width(in.width(0));  // consume and reset the field width
        if (width == 0) {
            width = std::numeric_limits<size_type>::max();
        }
        bool found_delim(false);
        std::istreambuf_iterator<cT, Traits> it(in), end;
        for (; value.size() != width && it != end; ++it) {
            if (!Traits::eq(delim, *it)) {
                value.push_back(*it);
            }
            else {
                ++it;  // consume the delimiter
                found_delim = true;
                break;
            }
        }
        if (value.size() == width) {
            // the line did not fit into the limit: fail, keeping the
            // partial content in value for inspection
            in.setstate(std::ios_base::failbit);
        }
        else if (!found_delim) {
            // end of input reached before the delimiter: mirror
            // std::getline() by setting eofbit (plus failbit if nothing
            // was extracted at all)
            in.setstate(value.empty()
                        ? (std::ios_base::eofbit | std::ios_base::failbit)
                        : std::ios_base::eofbit);
        }
    }
    return in;
}

// convenience overload defaulting the delimiter to a newline
template <typename cT, typename Traits, typename Alloc>
std::basic_istream<cT, Traits>&
safe_getline(std::basic_istream<cT, Traits>& in,
             std::basic_string<cT, Traits, Alloc>& value)
{
    return safe_getline(in, value, in.widen('\n'));
}
```
This version of getline() is used in the same way as std::getline(), except that where it seems reasonable to limit the amount of data read, the width() is set, for example:
```cpp
std::string line;
if (safe_getline(in >> std::setw(max_characters), line)) {
    // ...
}
```
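If processing should continue after an overly long line, the caller can clear the error state and discard the rest of the offending line. Here is a minimal sketch of that calling pattern (in and max_characters stand for whatever the surrounding code uses):

```cpp
#include <iomanip>
#include <limits>
#include <string>

std::string line;
while (safe_getline(in >> std::setw(max_characters), line)) {
    // process a complete line
}
if (in.fail() && !in.eof()) {
    // the read failed because the line was too long; `line` holds the
    // partial content for inspection. Recover by clearing the error and
    // skipping the remainder of the line:
    in.clear();
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    // ... and possibly resume the loop
}
```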
Another approach is to simply use a filtering stream buffer to limit the amount of input: the filter just counts the number of characters processed and limits it to a suitable amount. This approach is actually easier to apply to an entire stream than to an individual line: when processing just one line, the filter cannot simply fetch buffers full of characters from the underlying stream, because there is no reliable way to put characters back. Implementing an unbuffered version is still straightforward, but probably not particularly efficient:
```cpp
#include <ios>
#include <istream>
#include <streambuf>

template <typename cT, typename Traits = std::char_traits<cT> >
class basic_limitbuf
    : std::basic_streambuf<cT, Traits>
{
public:
    typedef Traits                    traits_type;
    typedef typename Traits::int_type int_type;

private:
    std::streamsize                   size;
    std::streamsize                   max;
    std::basic_istream<cT, Traits>*   stream;
    std::basic_streambuf<cT, Traits>* sbuf;

    // peek at the next character without consuming it
    int_type underflow() {
        if (this->size < this->max) {
            return this->sbuf->sgetc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }
    // consume the next character, counting it against the limit
    int_type uflow() {
        if (this->size < this->max) {
            ++this->size;
            return this->sbuf->sbumpc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }

public:
    basic_limitbuf(std::streamsize max,
                   std::basic_istream<cT, Traits>& stream)
        : size()
        , max(max)
        , stream(&stream)
        , sbuf(this->stream->rdbuf(this)) {  // install the filter
    }
    ~basic_limitbuf() {
        // restore the original buffer while preserving the stream state
        // (rdbuf() would otherwise clear it)
        std::ios_base::iostate state = this->stream->rdstate();
        this->stream->rdbuf(this->sbuf);
        this->stream->setstate(state);
    }
};
```
This stream buffer already arranges to insert itself upon construction and remove itself upon destruction. That is, it can be used simply like this:
```cpp
std::string line;
basic_limitbuf<char> sbuf(max_characters, in);
if (std::getline(in, line)) {
    // ...
}
```
It would be easy to add a manipulator that sets up the limit. One advantage of this approach is that none of the reading code needs to be touched if the total size of the stream can be limited: the filter can be set up right after the stream is created, as sketched below. When there is no need to remove the filter, the filter could also use a buffer, which would improve performance significantly.
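For illustration, here is roughly how the filter could be applied to an entire stream right after creating it; a minimal sketch, where the function name, file name, and the one-million-character cap are placeholder choices:

```cpp
#include <fstream>
#include <string>

void process_file() {
    std::ifstream in("input.txt");            // placeholder file name
    basic_limitbuf<char> filter(1000000, in); // cap the whole stream

    std::string line;
    while (std::getline(in, line)) {
        // handle the line; once the cap is reached, the filter sets
        // failbit and extraction stops
    }
}   // the filter restores the original buffer when it goes out of scope
```

Note that the filter is declared after the stream, so it is destroyed first and the original buffer is restored while the stream is still alive.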
The third approach is to use std::basic_string with a custom allocator. Two aspects of the allocator approach are a bit awkward:
- The string being read into has a type that is not immediately convertible to std::string (although the conversion is not hard to do).
- The maximum array size can easily be limited, but the resulting string can end up with a more or less arbitrary size below that limit: when an allocation exceeds the limit, an exception is thrown, and there is no attempt to grow the string to some smaller size instead.
Here is the code for an allocator that limits the allocated size:
```cpp
#include <cstddef>
#include <new>

template <typename T>
struct limit_alloc
{
private:
    std::size_t max_;

public:
    typedef T value_type;

    limit_alloc(std::size_t max): max_(max) {}
    template <typename S>
    limit_alloc(limit_alloc<S> const& other): max_(other.max()) {}

    std::size_t max() const { return this->max_; }

    T* allocate(std::size_t size) {
        // refuse any request beyond the limit (size is in elements)
        return size <= max_
            ? static_cast<T*>(operator new[](size * sizeof(T)))
            : throw std::bad_alloc();
    }
    void deallocate(T* ptr, std::size_t) {
        operator delete[](ptr);
    }
};

template <typename T0, typename T1>
bool operator== (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return a0.max() == a1.max();
}
template <typename T0, typename T1>
bool operator!= (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return !(a0 == a1);
}
```
The allocator would be used something like this (at the time of writing, the code compiles with a recent version of clang, but not with gcc):
```cpp
limit_alloc<char> alloc(max_chars);
std::basic_string<char, std::char_traits<char>, limit_alloc<char> > tmp(alloc);
if (std::getline(in, tmp)) {
    std::string line(tmp.begin(), tmp.end());
    // ...
}
```
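If the conversion and the failure handling are needed in more than one place, they could be packaged into a small helper. Here is a minimal sketch (the name read_limited_line is made up for this example), relying on std::getline() behaving as an unformatted input function: it traps the std::bad_alloc thrown by the allocator and sets std::ios_base::badbit, so the caller just sees a failed read:

```cpp
#include <istream>
#include <string>

// hypothetical helper: reads one line of at most max_chars characters
bool read_limited_line(std::istream& in, std::string& out, std::size_t max_chars) {
    limit_alloc<char> alloc(max_chars);
    std::basic_string<char, std::char_traits<char>, limit_alloc<char> > tmp(alloc);
    if (std::getline(in, tmp)) {
        out.assign(tmp.begin(), tmp.end());  // convert back to std::string
        return true;
    }
    // either end of input or an overly long line (badbit set in that case)
    return false;
}
```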
So there is a choice of several approaches, each with its own small drawback, but each reasonably viable for the stated goal of limiting denial-of-service attacks based on overly long lines:
- Using a custom version of getline() means that the reading code needs to be changed.
- Using a custom stream buffer is slower, unless the size of the entire stream can be limited.
- Using a custom allocator gives less control and requires some changes to the reading code.