How to write std::string to a UTF-8 text file

I just want to write some simple lines to a text file in C++, but I want them to be encoded in UTF-8. What is the simplest and easiest way to do this?

+53
c++ utf-8
Jun 10 '10 at 1:29
9 answers

The only way UTF-8 affects std::string is that size(), length(), and all indices are measured in bytes, not characters.

And, as sbi points out, incrementing an iterator provided by std::string steps forward by one byte, not one character, so it can end up pointing into the middle of a multibyte UTF-8 code point. The standard library provides no UTF-8-aware iterator, but there are several available on the Web.
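To make the byte/character difference concrete, here is a minimal sketch, assuming the string already holds valid UTF-8 (`count_code_points` is a hypothetical helper, not a standard function). Every byte that is not a continuation byte (bit pattern `10xxxxxx`) starts a new code point. Note that code points are still not always what a user sees as one character; a combining accent, for example, counts separately.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a std::string that is assumed
// to contain valid UTF-8. Continuation bytes look like 10xxxxxx, so every
// byte that does NOT match that pattern begins a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "h\xC3\xA9llo" ("héllo"), size() is 6 but this helper returns 5.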

If you keep this in mind, you can put UTF-8 in a std::string, write it to a file, etc., all in the usual way (by which I mean the way you would use a std::string without UTF-8 inside).

You might want to start your file with a byte order mark (BOM) so that other programs know it is UTF-8.
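A minimal sketch of that advice, assuming the lines are already UTF-8 encoded (`write_utf8_lines` is a hypothetical helper name). The std::string bytes go to disk unchanged; the file is opened in binary mode so that no newline translation happens on Windows, and the BOM (bytes EF BB BF) is optional.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes lines that are ALREADY UTF-8 encoded.
// std::string is just a byte container, so no conversion happens here.
// Binary mode keeps '\n' as a single LF byte on every platform.
bool write_utf8_lines(const std::string& path,
                      const std::vector<std::string>& lines,
                      bool with_bom) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    if (with_bom) {
        out << "\xEF\xBB\xBF";  // UTF-8 byte order mark (optional hint)
    }
    for (const std::string& line : lines) {
        out << line << '\n';
    }
    return static_cast<bool>(out);  // false if any write failed
}
```

Whether to emit the BOM is a judgment call: some Windows tools rely on it, while many Unix tools treat it as stray bytes at the start of the file.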

+53
Jun 10 2018-10-10T00:

There is a nice tiny library for working with UTF-8 from C++: utfcpp

+24
Mar 26 '13 at 19:03

libiconv is a great library for all your encoding and decoding needs.

If you are on Windows, you can use WideCharToMultiByte and specify that you want UTF-8 (pass CP_UTF8 as the code page).

+9
Jun 10 2018-10-10T00:

What is the simplest and easiest way to do this?

The most intuitive, and therefore simplest, UTF-8 handling in C++ would probably be a drop-in replacement for std::string. Since I could not find one on the Internet, I decided to implement it myself:

tinyutf8 (EDIT: now on GitHub).

This library provides a very lightweight drop-in replacement for std::string (or std::u32string, if you like, since you iterate over code points rather than bytes). It strikes a good balance between fast access and low memory consumption while being very robust. This robustness against "invalid" UTF-8 sequences makes it (almost completely) compatible with ANSI (0-255).

Hope this helps!
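The idea of iterating over code points rather than bytes can be shown without any library. This is a minimal decoder sketch, not tinyutf8's API (`decode_utf8` is a hypothetical name): it turns UTF-8 bytes into a std::u32string and replaces malformed or truncated sequences with U+FFFD, but, to stay short, it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into one char32_t per code point.
// Checks lead bytes, continuation bytes and truncation; a malformed
// sequence becomes U+FFFD and decoding resumes at the next byte.
// Overlong encodings and surrogates are NOT rejected in this sketch.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) {
                ok = false;                      // missing continuation byte
            } else {
                cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
            }
        }
        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            out.push_back(U'\uFFFD');            // replacement character
            ++i;
        }
    }
    return out;
}
```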

+9
Dec 02 '15 at 11:09

If by “simple” you mean ASCII, there is no need to do any encoding, since characters with an ASCII value of 127 or less are encoded identically in UTF-8.
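A quick way to check whether that shortcut applies, as a minimal sketch (`is_ascii` is a hypothetical helper): if every byte is 0x7F or below, the string is already valid UTF-8 as-is and can be written out unchanged.

```cpp
#include <string>

// Hypothetical helper: true if every byte is in the ASCII range 0-127.
// Such a string is byte-for-byte identical in ASCII and UTF-8.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) {
            return false;
        }
    }
    return true;
}
```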

+7
Jun 10 2018-10-10T00:
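```cpp
#include <QByteArray>
#include <QString>
#include <string>

std::wstring text = L"";
QString qstr = QString::fromStdWString(text);   // wide string -> QString
QByteArray byteArray(qstr.toUtf8());            // QString -> UTF-8 bytes
std::string str_std(byteArray.constData(), byteArray.length());  // bytes -> std::string
```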
+5
Jun 28 '13 at 12:52

I prefer to convert to and from std::u32string and work with code points internally, then convert to UTF-8 when writing to a file, using the conversion iterators I put on GitHub.
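The linked iterators are not reproduced here; as a stand-in, this is a minimal sketch of the UTF-32 to UTF-8 step such a conversion performs before writing (`encode_utf8` is a hypothetical name). In this sketch, values that are not valid code points (UTF-16 surrogates, anything above U+10FFFF) are replaced with U+FFFD.

```cpp
#include <string>

// Hypothetical helper: encodes code points (UTF-32) as UTF-8 bytes.
// 1 byte:  0xxxxxxx                          (U+0000..U+007F)
// 2 bytes: 110xxxxx 10xxxxxx                 (U+0080..U+07FF)
// 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx        (U+0800..U+FFFF)
// 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (U+10000..U+10FFFF)
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // not a valid code point: use replacement char
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Writing is then ordinary stream output, e.g. `std::ofstream out("out.txt", std::ios::binary); out << encode_utf8(U"\u20AC 5");`.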

0
Dec 09 '18 at 2:05

Use Glib::ustring from glibmm.

It is the only widespread UTF-8 string container (AFAIK). Although it is based on characters (not bytes), it has the same method signatures as std::string, so porting should be a simple search and replace (just make sure your data is valid UTF-8 before loading it into a ustring).
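For the "make sure your data is valid UTF-8" step, a minimal hand-written check could look like this. It is a sketch, not a glibmm function (`is_valid_utf8` is a hypothetical name). It rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strict UTF-8 validation.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point each sequence length may encode; lower is overlong.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                     // stray continuation or 0xF8+
        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_for_len[len]) return false;              // overlong form
        if (cp > 0x10FFFF) return false;                      // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;       // surrogate
        i += len;
    }
    return true;
}
```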

-1
Jan 25 '17 at 7:09

As for UTF-8: it is a multibyte character encoding, so it causes problems, and it is a bad idea. Use regular Unicode instead.

So, in my opinion, it is best to use plain ASCII char text with some code page. You need Unicode only if you use more than two different character sets (languages) in one text.

This is a rather rare case. In most cases, two character sets are sufficient. For this common case, use ASCII characters, not Unicode.

The only thing a multibyte encoding such as UTF-8 gains you is text such as traditional Chinese, Arabic or hieroglyphs. This is a very rare case!!!

I do not think many people need this. Therefore, never use UTF-8!!! This avoids severe headaches when manipulating such strings.

-26
Jul 27 '13 at 20:13


