Split a UTF-8 encoded string on blank characters without knowing about UTF-8 encoding

Question

I would like to split a string at each blank character (' ', '\n', '\r', '\t', '\v', '\f'). The string is stored in UTF-8 encoding in a byte array (char *, e.g. a vector or a string).

Is it possible to simply split the byte array at each of these delimiter characters? Put differently, can I be sure that the byte values corresponding to these characters never occur inside a multibyte character? Looking at the UTF-8 specification, it seems that multibyte characters consist only of bytes with values of 128 or higher.

Thanks.

+4

c++ string split encoding utf-8

galinette Oct 9 '14 at 13:01

2 answers

, , "".

, , UTF-8...

+2

Nemanja Trifunovic Oct 9 '14 at 13:19

Paulo1205 · Accepted Answer · 2014-10-09T13:31:04+0000

Yes, you can.

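A multibyte character consists of a lead byte (two MSBs equal to 11) followed by one or more continuation bytes (two MSBs equal to 10). The total length of the sequence (lead byte + continuation bytes) is encoded in the MSBs of the lead byte: it is the number of 1 bits before the first 0 (e.g. 110xxxxx starts a two-byte sequence, 11110xxx a four-byte one). Every byte of a multibyte character therefore has its top bit set and cannot be mistaken for an ASCII blank.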
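The bit patterns above can be checked with a small helper. This is only a sketch: the name `utf8_sequence_length` is made up for illustration, and it looks at a single byte without validating the bytes that follow.

```cpp
#include <cstddef>

// Classify one UTF-8 byte by its most significant bits.
// Returns the sequence length announced by a lead byte (1 to 4),
// or 0 for a continuation byte or a byte that cannot start a sequence.
constexpr std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII, single byte
    if ((b & 0xC0) == 0x80) return 0;  // 10xxxxxx: continuation byte
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: lead of a 2-byte sequence
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: lead of a 3-byte sequence
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: lead of a 4-byte sequence
    return 0;                          // 11111xxx: never used by UTF-8
}

// ' ' is plain ASCII; 0xC3 0xA9 is the two-byte encoding of U+00E9.
static_assert(utf8_sequence_length(' ') == 1, "ASCII byte");
static_assert(utf8_sequence_length(0xC3) == 2, "two-byte lead");
static_assert(utf8_sequence_length(0xA9) == 0, "continuation byte");
```

None of the six delimiters from the question can appear in the lead or continuation position, since all of them are below 0x80.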

, MB , , , , , , , , , .

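One thing to note, however: Unicode defines other "blank" characters outside the ASCII range. If you need those as delimiters too, you will have to handle them separately.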
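For the six ASCII delimiters listed in the question, a byte-wise split might look like the sketch below. The function names are hypothetical, and the choice to collapse runs of blanks and drop empty tokens is an assumption, not something the question specifies.

```cpp
#include <string>
#include <vector>

// True for the six ASCII blank bytes listed in the question.
bool is_ascii_blank(unsigned char b) {
    return b == ' ' || b == '\n' || b == '\r' ||
           b == '\t' || b == '\v' || b == '\f';
}

// Split UTF-8 text on ASCII blanks by scanning raw bytes.
// Bytes of multibyte characters are all >= 0x80, so they never match.
// Assumption: runs of blanks are collapsed and empty tokens are dropped.
std::vector<std::string> split_on_ascii_blanks(const std::string& utf8) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char b : utf8) {
        if (is_ascii_blank(b)) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(static_cast<char>(b));
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```

A non-ASCII blank such as NO-BREAK SPACE (U+00A0, encoded as the bytes 0xC2 0xA0) passes through this function untouched and stays inside its token, which is exactly the case the caveat above is about.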
