How to safely read a string from std::istream?

I want to safely read a string from std::istream. The stream can be anything, for example a connection on a web server, or a file provided by some unknown source. Many answers start with the moral equivalent of this code:

    void read(std::istream& in) {
        std::string line;
        if (std::getline(in, line)) {
            // process the line
        }
    }

Given the possibly dubious source of in, using the above code leads to a vulnerability: a malicious agent can mount a denial-of-service attack against this code using a huge line. So I would like to limit the string length to some fairly high value, say 4 million chars. Although a few large lines may occur, it is not practical to allocate a buffer of that size up front and use std::istream::getline().

How can I limit the maximum line length, ideally without distorting the code too badly and without allocating large blocks of memory up front?

+50
c++
Dec 17 '13 at 21:49
4 answers

You can write your own version of std::getline that takes a maximum number of characters to read, called getline_n or something like that.

    #include <string>
    #include <iostream>

    template<typename CharT, typename Traits, typename Alloc>
    auto getline_n(std::basic_istream<CharT, Traits>& in,
                   std::basic_string<CharT, Traits, Alloc>& str,
                   std::streamsize n) -> decltype(in) {
        std::ios_base::iostate state = std::ios_base::goodbit;
        bool extracted = false;
        const typename std::basic_istream<CharT, Traits>::sentry s(in, true);
        if(s) {
            try {
                str.erase();
                typename Traits::int_type ch = in.rdbuf()->sgetc();
                for(; ; ch = in.rdbuf()->snextc()) {
                    if(Traits::eq_int_type(ch, Traits::eof())) {
                        // eof spotted, quit
                        state |= std::ios_base::eofbit;
                        break;
                    }
                    else if(str.size() == n) {
                        // maximum number of characters met, quit
                        extracted = true;
                        in.rdbuf()->sbumpc();
                        break;
                    }
                    else if(str.max_size() <= str.size()) {
                        // string too big
                        state |= std::ios_base::failbit;
                        break;
                    }
                    else {
                        // character valid
                        str += Traits::to_char_type(ch);
                        extracted = true;
                    }
                }
            }
            catch(...) {
                in.setstate(std::ios_base::badbit);
            }
        }
        if(!extracted) {
            state |= std::ios_base::failbit;
        }
        in.setstate(state);
        return in;
    }

    int main() {
        std::string s;
        getline_n(std::cin, s, 10); // maximum of 10 characters
        std::cout << s << '\n';
    }

Might be overkill, though.

+36
Dec 17 '13 at 22:13

There is already such a getline available as an istream member function; you just need to wrap it to control the buffer.

    #include <assert.h>
    #include <istream>
    #include <stddef.h>     // ptrdiff_t
    #include <string>       // std::string, std::char_traits

    typedef ptrdiff_t Size;

    namespace my {
        using std::istream;
        using std::string;
        using std::char_traits;

        istream& getline(
            istream& stream, string& s,
            Size const buf_size, char const delimiter = '\n'
            )
        {
            s.resize( buf_size );
            assert( s.size() > 1 );
            stream.getline( &s[0], buf_size, delimiter );
            if( !stream.fail() )
            {
                Size const n = char_traits<char>::length( &s[0] );
                s.resize( n );      // Downsizing.
            }
            return stream;
        }
    }  // namespace my
+16
Dec 17 '13 at 22:56

Replace std::getline with a wrapper around std::istream::getline:

    std::istream& my::getline( std::istream& is, std::streamsize n,
                               std::string& str, char delim )
    {
        try {
            str.resize(n);
            is.getline(&str[0], n, delim);
            str.resize(is.gcount());
            return is;
        }
        catch(...) {
            str.resize(0);
            throw;
        }
    }

If you want to avoid allocating excessive amounts of memory up front, you can use a loop that grows the allocation as needed (perhaps doubling the size on each pass). Bear in mind that exceptions may or may not be enabled on the istream object.

Here is a version with a more efficient allocation strategy:

    std::istream& my::getline( std::istream& is, std::streamsize n,
                               std::string& str, char delim )
    {
        std::streamsize base = 0;
        do {
            try {
                is.clear();
                std::streamsize chunk = std::min( n - base,
                    std::max( static_cast<std::streamsize>(2), base ) );
                if ( chunk == 0 )
                    break;
                str.resize(base + chunk);
                is.getline(&str[base], chunk, delim);
            }
            catch( std::ios_base::failure ) {
                if ( !is.gcount() )
                    str.resize(0), throw;
            }
            base += is.gcount();
        } while ( is.fail() && is.gcount() );
        str.resize(base);
        return is;
    }
+8
Dec 17 '13 at 22:52

Based on the comments and answers, there seem to be three approaches:

  • Write a custom version of getline(), possibly using the std::istream::getline() member internally to get at the actual characters.
  • Use a filtering stream buffer to limit the amount of data that can be received.
  • Instead of reading into std::string, use an instantiation of the string template with a special allocator that limits the amount of memory the string can hold.

Not all suggestions came with code. This answer provides code for all three approaches and a bit of discussion of each. Before going into implementation details, it is worth noting first that there are several options for what should happen when the input is too long:


  • Reading an overly long line could result in a successful read of a partial line, i.e. the resulting string contains the content read so far and no error flags are set on the stream. This means, however, that it is impossible to distinguish a line that fits the limit exactly from one that is too long. Since the limit is somewhat arbitrary anyway, this probably doesn't matter much.
  • Reading an overly long line could be considered a failure (i.e. set std::ios_base::failbit and/or std::ios_base::badbit) and, since the read failed, yield an empty string. Obviously, yielding an empty string prevents any inspection of the line read so far to see what is going on.
  • Reading an overly long line could yield the partial line read so far and also set error flags on the stream. This is probably the most reasonable behavior, revealing that something is wrong while also providing the partial input for inspection.

Although there are a few code examples implementing a limited version of getline() already, here is another one! I think it is simpler (although possibly slower; performance can be dealt with if necessary), and it also preserves the interface of std::getline(): it uses the stream's width() to communicate the limit (arguably, paying attention to width() would be a reasonable extension of std::getline() itself):

    template <typename cT, typename Traits, typename Alloc>
    std::basic_istream<cT, Traits>&
    safe_getline(std::basic_istream<cT, Traits>& in,
                 std::basic_string<cT, Traits, Alloc>& value,
                 cT delim = cT('\n')) // default matches the two-argument use below
    {
        typedef std::basic_string<cT, Traits, Alloc> string_type;
        typedef typename string_type::size_type size_type;
        typename std::basic_istream<cT, Traits>::sentry cerberos(in);
        if (cerberos) {
            value.clear();
            size_type width(in.width(0));
            if (width == 0) {
                width = std::numeric_limits<size_type>::max();
            }
            std::istreambuf_iterator<char> it(in), end;
            for (; value.size() != width && it != end; ++it) {
                if (!Traits::eq(delim, *it)) {
                    value.push_back(*it);
                }
                else {
                    ++it;
                    break;
                }
            }
            if (value.size() == width) {
                in.setstate(std::ios_base::failbit);
            }
        }
        return in;
    }

This version of getline() is used just like std::getline(), except that when it seems reasonable to limit the amount of data read, the limit is passed with width(), for example:

    std::string line;
    if (safe_getline(in >> std::setw(max_characters), line)) {
        // do something with the input
    }

Another approach is to simply use a filtering stream buffer to limit the amount of input: the filter just counts the characters processed and caps them at a suitable number. This approach is actually easier to apply to an entire stream than to an individual line: when processing just one line, the filter cannot simply pull buffers full of characters from the underlying stream, because there is no reliable way to put characters back. Implementing an unbuffered version is still straightforward, although probably not particularly efficient:

    template <typename cT, typename Traits = std::char_traits<char> >
    class basic_limitbuf
        : std::basic_streambuf<cT, Traits> {
    public:
        typedef Traits traits_type;
        typedef typename Traits::int_type int_type;
    private:
        std::streamsize size;
        std::streamsize max;
        std::basic_istream<cT, Traits>*   stream;
        std::basic_streambuf<cT, Traits>* sbuf;

        int_type underflow() {
            if (this->size < this->max) {
                return this->sbuf->sgetc();
            }
            else {
                this->stream->setstate(std::ios_base::failbit);
                return traits_type::eof();
            }
        }
        int_type uflow() {
            if (this->size < this->max) {
                ++this->size;
                return this->sbuf->sbumpc();
            }
            else {
                this->stream->setstate(std::ios_base::failbit);
                return traits_type::eof();
            }
        }
    public:
        basic_limitbuf(std::streamsize max,
                       std::basic_istream<cT, Traits>& stream)
            : size()
            , max(max)
            , stream(&stream)
            , sbuf(this->stream->rdbuf(this)) {
        }
        ~basic_limitbuf() {
            std::ios_base::iostate state = this->stream->rdstate();
            this->stream->rdbuf(this->sbuf);
            this->stream->setstate(state);
        }
    };

This stream buffer is already set up to insert itself on construction and remove itself on destruction. That is, it can be used like this:

    std::string line;
    basic_limitbuf<char> sbuf(max_characters, in);
    if (std::getline(in, line)) {
        // do something with the input
    }

It would be easy to add a manipulator setting the limit. One advantage of this approach is that no reading code needs to change if the overall size of the stream can be limited: the filter can be installed right after the stream is created. When the filter doesn't need to be removed, it could also use a buffer, which would improve performance considerably.

The third approach is to use std::basic_string with a custom allocator. There are two aspects of the allocator approach which are a bit inconvenient:

  • The string being read into is not of a type that readily converts to std::string (although the conversion isn't hard to do, either).
  • The maximum size of allocations can be limited easily, but the string will end up with some more or less random size below that: when an allocation fails with an exception, the exception propagates, and there is no attempt to grow the string by a smaller amount.

Here is the code needed for an allocator that limits the allocated size:

    template <typename T>
    struct limit_alloc {
    private:
        std::size_t max_;
    public:
        typedef T value_type;

        limit_alloc(std::size_t max): max_(max) {}
        template <typename S>
        limit_alloc(limit_alloc<S> const& other): max_(other.max()) {}

        std::size_t max() const { return this->max_; }

        T* allocate(std::size_t size) {
            return size <= max_
                ? static_cast<T*>(operator new[](size))
                : throw std::bad_alloc();
        }
        void deallocate(void* ptr, std::size_t) {
            operator delete[](ptr);
        }
    };
    template <typename T0, typename T1>
    bool operator== (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
        return a0.max() == a1.max();
    }
    template <typename T0, typename T1>
    bool operator!= (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
        return !(a0 == a1);
    }

The allocator would be used something like this (the code compiles OK with a recent version of clang but not with gcc):

    std::basic_string<char, std::char_traits<char>, limit_alloc<char> >
        tmp(limit_alloc<char>(max_chars));
    if (std::getline(in, tmp)) {
        std::string line(tmp.begin(), tmp.end());
        // do something with the input
    }

So, there are multiple approaches, each with its own small flaw, but each reasonably viable for the stated goal of limiting denial-of-service attacks based on overly long lines:

  • Using a custom version of getline() means the reading code needs to change.
  • Using a custom stream buffer is slower, unless the size of the entire stream can be limited.
  • Using a custom allocator gives less control and requires some changes to the reading code.
+5
Dec 19 '13 at 18:05


