How to write std::string to a UTF-8 text file

Question


I just want to write some simple lines to a text file in C++, but I want them to be encoded in UTF-8. What is the easiest way to do this?


c++ utf-8

poiloi Jun 10 '10 at 1:29


9 answers

Ben Voigt · Answer 1 · 2010-06-10 01:57

The only way UTF-8 affects std::string is that size(), length(), and all indices are measured in bytes, not characters.

And, as sbi points out, incrementing the iterator provided by std::string advances one byte at a time, not one character, so it can end up pointing into the middle of a multibyte UTF-8 sequence. The standard library provides no UTF-8-aware iterator, but several are available on the Web.
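To make the byte versus code point distinction concrete, here is a minimal sketch; count_code_points is a hypothetical helper, not a standard-library function, and it assumes the input is already valid UTF-8. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx), since each code point starts with exactly one such byte.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 std::string by skipping continuation
// bytes (10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // ASCII byte or lead byte starts a code point
    }
    return n;
}
```

For the string "caf\xC3\xA9" ("café"), size() returns 5 while count_code_points returns 4. A code point is still not always what a user perceives as one character (a base letter plus a combining accent is two code points, for example).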

If you keep this in mind, you can put UTF-8 in a std::string, write it to a file, and so on, all in the usual way (by which I mean the way you would use a std::string without UTF-8 inside).
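A minimal sketch of "the usual way", assuming the std::string already holds UTF-8 bytes (for example from an escaped literal or a UTF-8 source file); write_utf8_file is a hypothetical helper name. With the default locale, std::ofstream passes char data through unchanged, so no conversion step is involved.

```cpp
#include <fstream>
#include <string>

// Writes a std::string that already contains UTF-8 bytes to a file as-is.
// Returns false if the file could not be opened or written.
bool write_utf8_file(const std::string& path, const std::string& utf8_text) {
    std::ofstream out(path, std::ios::binary);  // binary: no "\n" -> "\r\n" translation
    if (!out) return false;
    out << utf8_text;
    out.close();  // flush and close so write errors show up in the stream state
    return !out.fail();
}
```

Call it as write_utf8_file("out.txt", "caf\xC3\xA9\n"). Binary mode is optional; it only guarantees that the file contains exactly the bytes in the string, without newline translation on Windows.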

You might want to start your file with a byte order mark (BOM) so that other programs know that it is UTF-8.
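A minimal sketch of writing the BOM, assuming you want one; write_utf8_file_with_bom is a hypothetical helper. The UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded in UTF-8), written before the text itself.

```cpp
#include <fstream>
#include <string>

// Writes the UTF-8 byte order mark (EF BB BF) followed by the UTF-8 text.
bool write_utf8_file_with_bom(const std::string& path, const std::string& utf8_text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << "\xEF\xBB\xBF";  // U+FEFF encoded as UTF-8
    out << utf8_text;
    out.close();
    return !out.fail();
}
```

Keep in mind that the Unicode Standard neither requires nor recommends a BOM for UTF-8, and many Unix tools treat it as ordinary bytes at the start of the file, so add it only when the consuming program expects it.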

denys · Answer 2 · 2013-03-26 19:03

There is a nice tiny library for working with UTF-8 from C++: utfcpp

Brian R. Bondy · Answer 3 · 2010-06-10 01:31

libiconv is a great library for all your encoding and decoding needs.
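A minimal sketch of the iconv route, assuming a POSIX system where iconv() takes a char** input buffer (as glibc declares it; some older libiconv builds use const char**, and on macOS you may need to link with -liconv). wide_to_utf8 is a hypothetical helper name, and "WCHAR_T" and "UTF-8" are charset names accepted by glibc and GNU libiconv. It converts a std::wstring into a UTF-8 std::string in one call.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Converts a wide string to UTF-8 using POSIX iconv.
// Assumes a 32-bit wchar_t holding one code point each (as on Linux),
// so 4 output bytes per input character are always enough.
std::string wide_to_utf8(const std::wstring& in) {
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");  // (tocode, fromcode)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string out(in.size() * 4 + 4, '\0');
    char* src = const_cast<char*>(reinterpret_cast<const char*>(in.data()));
    size_t src_left = in.size() * sizeof(wchar_t);
    char* dst = &out[0];
    size_t dst_left = out.size();

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - dst_left);  // drop the unused tail of the buffer
    return out;
}
```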

If you are using Windows, you can use WideCharToMultiByte and pass CP_UTF8 as the code page to get UTF-8.

Jakob Riedle · Answer 4 · 2015-12-02 11:09

What is the easiest way to do this?

The most intuitive, and therefore simplest, way to handle UTF-8 in C++ is probably a drop-in replacement for std::string. Since I could not find one on the Internet, I decided to implement it myself:

tinyutf8 (EDIT: now on GitHub).

This library provides a very lightweight drop-in replacement for std::string (or std::u32string, if you like, since you iterate over code points rather than chars). It strikes a balance between fast access and low memory consumption while being very robust. Its tolerance of "invalid" UTF-8 sequences makes it (almost completely) compatible with ANSI (0-255).

Hope this helps!

Tony the Pony · Answer 5 · 2010-06-10 01:34

If by “simple” you mean ASCII, there is no need to do any encoding, since characters with an ASCII value of 127 or less are the same in UTF-8.
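A quick way to confirm that a string falls into this case is to check that every byte is at most 0x7F; is_plain_ascii is a hypothetical helper for illustration. A string that passes can be written out as-is and is already valid UTF-8.

```cpp
#include <string>

// Returns true if every byte is 7-bit ASCII (0-127).
// Such a string is byte-for-byte identical in ASCII and UTF-8.
bool is_plain_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}
```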

Serov Danil · Answer 6 · 2013-06-28 12:52

    std::wstring text = L"";  // the wide string to convert
    QString qstr = QString::fromStdWString(text);
    QByteArray byteArray(qstr.toUtf8());  // UTF-8 encoded bytes
    std::string str_std(byteArray.constData(), byteArray.length());

rmawatson · Answer 7 · 2018-12-09 02:05

I prefer to convert to and from std::u32string and work with code points internally, then convert to UTF-8 when writing to a file, using these conversion iterators that I put on GitHub.
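Those iterators are not reproduced here; as a rough illustration of the final step, this is a minimal hand-written UTF-32 to UTF-8 encoder (append_utf8 and to_utf8 are hypothetical names, not the answerer's API). Each code point becomes 1 to 4 bytes depending on its range, and values that are not Unicode scalar values are rejected.

```cpp
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of one code point to out.
// Rejects surrogates (U+D800..U+DFFF) and values above U+10FFFF.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a whole std::u32string to a UTF-8 std::string.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}
```

The resulting std::string can then be written to a file like any other string.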

Artem Vorotnikov · Answer 8 · 2017-01-25 07:09

Use Glib::ustring from glibmm.

It's the only widespread UTF-8 string container (AFAIK). Although it is character-based (it indexes code points, not bytes), it has the same method signatures as std::string, so porting should be a simple search and replace (just make sure your data is valid UTF-8 before loading it into a ustring).
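One library-independent way to do that check is a small validator; is_valid_utf8 is a hypothetical helper, not part of glibmm. This sketch checks the lead/continuation byte structure and also rejects overlong encodings, UTF-16 surrogates and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (c < 0x80) { ++i; continue; }                        // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (n - i < len) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, values past U+10FFFF and surrogates.
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```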

Anatoly · Answer 9 · 2013-07-27 20:13

UTF-8 is a multibyte character encoding, so it causes problems and is a bad idea. Use regular Unicode instead.

So, in my opinion, it is best to use plain ASCII char text with a suitable character set (code page). You only need Unicode if you mix more than two different character sets (languages) in one text.

This is a rather rare case; usually two character sets are enough. For this common case, use ASCII chars, not Unicode.

A multibyte encoding such as UTF-8 only pays off for text in Traditional Chinese, Arabic, or hieroglyphic scripts. This is a very rare case!!!

I don't think many people need this. So never use UTF-8!!! That avoids severe headaches when manipulating such strings.
