Inserters and extractors: reading/writing binary data vs. text

I have been reading up on iostreams to understand them better. Occasionally I find it stressed that inserters ( << ) and extractors ( >> ) are meant for textual serialization. This comes up in a few places, but this article is a good example:

http://spec.winprog.org/streams/

Outside of the <iostream> universe, there are cases where << and >> are used in a stream-like way but do not follow any textual convention. For example, they write binary-encoded data when used with Qt's QDataStream:

http://doc.qt.nokia.com/latest/qdatastream.html#details

At the language level, the << and >> operators are yours to overload (hence what QDataStream does is clearly acceptable). My question is whether it is considered bad practice, among those who work with <iostream>, to use the << and >> operators to implement binary encoding and decoding. Is there (for example) any expectation that a file written to disk this way should be viewable and editable with a text editor?

Should I always use other method names and base them on read() and write()? Or should textual encoding be considered merely a default behavior that classes integrating with the standard iostream library may choose to ignore?


UPDATE: A key terminology issue here seems to be the distinction between I/O that is "formatted" and "unformatted" (as opposed to the terms "textual" and "binary"). I found this question:

writing binary data (std::string) to an std::ofstream?

It has a comment from @TomalakGeret'kal: "I would not want to use << for binary data anyway, since my brain reads it as 'formatted output', which is not what you are doing. Again, it is perfectly valid, but I just would not confuse my brain like that."

The accepted answer to that question says this is fine as long as you use ios::binary. That seems to bolster the "there is nothing wrong with it" side of the debate, but I still do not see an authoritative source on the matter.

+6
3 answers

Actually, the << and >> operators are bit shift operators; using them for I/O is, strictly speaking, already a misuse. However, that misuse is about as old as operator overloading itself, and I/O is the most common use of them today, which is why they are widely regarded as I/O operators. I am sure that if there were no precedent of iostreams, nobody would use these operators for I/O (especially with C++11, which has variadic templates that solve, in a much cleaner way, the main problem these operators solved for iostreams). On the other hand, from the language's point of view, overloaded operator<< and operator>> can mean whatever you want them to mean.

So the question boils down to what would be an acceptable use of these operators. For this, I believe that two cases need to be distinguished: firstly, new overloads working on iostream classes, and secondly, new overloads working on other classes, possibly designed to work as iostreams.

Consider first new operators on the iostream classes. Let me start by observing that the iostream classes are all about formatting (and the reverse process, which could be called "deformatting"; "lexing" IMHO would not be quite the right term here, because the extractors do not determine the type but only try to interpret the data according to the type given). The classes responsible for the actual I/O of raw data are the streambufs. However, note that a proper binary file is not a file into which you simply dump your internal raw data. Just like a text file (in fact, even more so), a binary file should have a well-specified encoding of the data it contains, especially if the files are expected to be read on different systems. Therefore the concept of formatted output makes sense for binary files too; only the formatting is different (for example, writing a predetermined number of bytes, most significant first, for an integer value).
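To illustrate the kind of "binary formatting" meant here, the following is a minimal sketch of writing an integer as a fixed number of bytes with the most significant byte first. The helper names `write_u32_be` and `encode_u32_be` are hypothetical and chosen only for this example.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a 32-bit unsigned value as exactly four bytes,
// most significant byte first, regardless of the host's native byte order.
void write_u32_be(std::ostream& out, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.put(static_cast<char>((value >> shift) & 0xFFu));
    }
}

// Hypothetical convenience wrapper returning the encoded bytes as a string.
std::string encode_u32_be(std::uint32_t value) {
    std::ostringstream out;
    write_u32_be(out, value);
    return out.str();
}
```

The value 0x01020304 is encoded as the bytes 01 02 03 04 on any host, which is what makes the resulting file portable between systems.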

The iostreams themselves are classes designed to work on text files, that is, on files whose content is interpreted as a textual representation of data. A lot of built-in behavior is tailored to that and can cause problems when used on binary files. An obvious example is that, by default, whitespace is skipped before any input is attempted. For a binary file, this would clearly be the wrong behavior. In addition, the use of locales does not make sense for binary files (one could argue that there might be a "binary locale", but I do not think that locales as defined for iostreams provide a suitable interface for that). Therefore I would say that writing a binary operator<< or operator>> for the iostream classes would be wrong.
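The whitespace problem can be seen in a small sketch. The two function names are hypothetical; the only library facilities used are `operator>>` for `char` and `std::istream::get`.

```cpp
#include <sstream>
#include <string>

// Read one byte with the formatted extractor; leading whitespace is skipped
// by default (the skipws flag is set on a freshly constructed stream).
char first_byte_formatted(const std::string& bytes) {
    std::istringstream in(bytes);
    char c = 0;
    in >> c;
    return c;
}

// Read one byte with the unformatted get(); nothing is skipped.
char first_byte_unformatted(const std::string& bytes) {
    std::istringstream in(bytes);
    char c = 0;
    in.get(c);
    return c;
}
```

Given the two bytes 0x0A 0x41, the formatted version silently drops the 0x0A (a newline) and returns 0x41, while the unformatted version returns 0x0A. In a binary format where 0x0A is a legitimate data byte, the first behavior corrupts the input.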

The other case is where you define a separate class for binary I/O (possibly reusing the streambuf layer to do the actual I/O). Since we are now talking about different classes, the argument above no longer applies. So the question becomes: should operator<< and operator>> for I/O be regarded as "text insert/extract operators" or, more generally, as "formatted data insert/extract operators"? The standard classes use them only for text, but then there are no standard classes for binary insertion/extraction at all, so standard usage cannot distinguish between the two.

I would say that binary insertion/extraction is close enough to textual insertion/extraction that this use is justified. Note that you could also create meaningful binary I/O manipulators, e.g. bigendian, littleendian and intwidth(n), to determine the format in which integers should be output.
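A minimal sketch of such a separate class might look like the following. Everything here is an assumption made for illustration: the class name `BinaryOut`, its `set_big_endian` member, the helper `encode_twice`, and the way the `bigendian` and `littleendian` manipulators are implemented. Only a 32-bit unsigned inserter is shown, and `intwidth(n)` is left out.

```cpp
#include <cstdint>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical binary output stream: not derived from std::ostream, it only
// borrows a std::streambuf to do the actual byte transport.
class BinaryOut {
public:
    explicit BinaryOut(std::streambuf* buf) : buf_(buf), big_endian_(true) {}

    void set_big_endian(bool on) { big_endian_ = on; }

    // Insert a 32-bit unsigned integer as four bytes in the selected order.
    BinaryOut& operator<<(std::uint32_t value) {
        for (int i = 0; i < 4; ++i) {
            int shift = big_endian_ ? 8 * (3 - i) : 8 * i;
            buf_->sputc(static_cast<char>((value >> shift) & 0xFFu));
        }
        return *this;
    }

    // Apply a manipulator, mirroring how std::ostream accepts std::endl.
    BinaryOut& operator<<(BinaryOut& (*manip)(BinaryOut&)) {
        return manip(*this);
    }

private:
    std::streambuf* buf_;
    bool big_endian_;
};

// Hypothetical manipulators named after the ones suggested in the text.
inline BinaryOut& bigendian(BinaryOut& out) { out.set_big_endian(true); return out; }
inline BinaryOut& littleendian(BinaryOut& out) { out.set_big_endian(false); return out; }

// Hypothetical demo: encode the same value in both byte orders.
std::string encode_twice(std::uint32_t value) {
    std::stringbuf buf;
    BinaryOut out(&buf);
    out << bigendian << value << littleendian << value;
    return buf.str();
}
```

Because `BinaryOut` is unrelated to `std::ostream`, none of the text-oriented machinery (whitespace skipping, locales) is involved, yet the call site still reads like ordinary stream insertion.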

Beyond that, these operators are also used for things that are not actually I/O (and where you would not even think of using the streambuf layer), such as reading from or inserting into a container. In my opinion, that already constitutes misuse of the operators, because the data is not translated into or out of a different format. It is simply stored in a container.

+9

The abstraction of iostreams in the standard is that of a textually formatted stream of data; there is no support for any non-text format. There is nothing wrong with defining a different stream class whose abstraction is a binary format, but doing so in an iostream will most likely break existing code and not work.

+4

The overloaded operators >> and << perform formatted I/O. The remaining I/O functions (put, get, read, write, etc.) perform unformatted I/O. Unformatted I/O means that the I/O library only accepts a buffer, a sequence of characters, as its input. This buffer may contain a text message or binary content; it is the application's responsibility to interpret the buffer. Formatted I/O, however, takes the locale into account. In the case of text files, depending on the environment in which the application runs, some special character conversions may occur during input/output operations to adapt them to the system-specific text file format. In many environments, such as most UNIX-based systems, it makes no difference whether a file is opened as a text file or a binary file. Note that you can overload the operators >> and << for your own types. This means that you can apply formatted I/O without locale information to your own types, although this is a bit tricky.
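The difference between the two kinds of I/O can be sketched as follows. The function names `formatted` and `unformatted` are hypothetical; the library calls are the standard `operator<<` and `std::ostream::write`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Formatted output: operator<< converts the number to its decimal text form.
std::string formatted(std::uint32_t value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unformatted output: write() copies the object's bytes as they sit in memory,
// so the byte order of the result depends on the host.
std::string unformatted(std::uint32_t value) {
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}
```

For the value 258, `formatted` yields the three characters "258", while `unformatted` yields four raw bytes whose order is not specified by the library, which is why a portable binary format has to fix the encoding itself.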

+3
