UTF-8 ↔ UTF-16: poor codecvt performance

I have been looking at some of my old (and exclusively Win32-oriented) code, thinking about making it more modern and portable, i.e. replacing some commonly reused components with C++11 equivalents. One of these components converts between UTF-8 and UTF-16. With the Win32 API I use MultiByteToWideChar / WideCharToMultiByte; I tried porting that to C++11 using the sample code from here: https://stackoverflow.com/a/360677/169 . The results:

Release build (compiled with MSVS 2013, run on a Core i7 3610QM):

    stdlib = 1587.2 ms
    Win32  =  127.2 ms

Debug build:

    stdlib = 5733.8 ms
    Win32  =  127.2 ms

So the question is: is there something wrong with this code? And if the code looks OK, is there a good reason for this performance difference?

Test code below:

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <clocale>
#include <codecvt>
#include <cstdio>    // fopen, fwrite, printf (added; needed for the calls below)
#include <windows.h> // QueryPerformanceCounter, MultiByteToWideChar (added; needed for the Win32 calls)

#define XU_BEGIN_TIMER(NAME)                \
    {                                       \
        LARGE_INTEGER __freq;               \
        LARGE_INTEGER __t0;                 \
        LARGE_INTEGER __t1;                 \
        double        __tms;                \
        const char*   __tname = NAME;       \
        char          __tbuf[0xff];         \
                                            \
        QueryPerformanceFrequency(&__freq); \
        QueryPerformanceCounter(&__t0);

#define XU_END_TIMER()                                                      \
        QueryPerformanceCounter(&__t1);                                     \
        __tms = (__t1.QuadPart - __t0.QuadPart) * 1000.0 / __freq.QuadPart; \
        sprintf_s(__tbuf, sizeof(__tbuf),                                   \
                  " %-24s = %6.1f ms\n", __tname, __tms);                   \
        OutputDebugStringA(__tbuf);                                         \
        printf(__tbuf);                                                     \
    }

std::string read_utf8() {
    std::ifstream infile("C:/temp/UTF-8-demo.txt");
    std::string fileData((std::istreambuf_iterator<char>(infile)),
                         std::istreambuf_iterator<char>());
    infile.close();
    return fileData;
}

void testMethod() {
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::string source = read_utf8();

    {
        std::string utf8;
        XU_BEGIN_TIMER("stdlib") {
            for (int i = 0; i < 1000; i++) {
                // Note: both converters are constructed inside the loop.
                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf16;
                std::u16string utf16 = convert2utf16.from_bytes(source);

                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf8;
                utf8 = convert2utf8.to_bytes(utf16);
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-std.dat", "wb");
        fwrite(utf8.c_str(), 1, utf8.length(), output);
        fclose(output);
    }

    char* utf8 = NULL;
    int cchA = 0;
    {
        XU_BEGIN_TIMER("Win32") {
            for (int i = 0; i < 1000; i++) {
                WCHAR* utf16 = new WCHAR[source.length() + 1];
                int cchW;

                utf8 = new char[source.length() + 1];

                cchW = MultiByteToWideChar(CP_UTF8, 0,
                                           source.c_str(), source.length(),
                                           utf16, source.length() + 1);
                cchA = WideCharToMultiByte(CP_UTF8, 0,
                                           utf16, cchW,
                                           utf8, source.length() + 1,
                                           NULL, false);
                delete[] utf16;
                if (i != 999)
                    delete[] utf8; // keep the last result for writing out below
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-win.dat", "wb");
        fwrite(utf8, 1, cchA, output);
        fclose(output);
        delete[] utf8;
    }
}
+9
c++ performance c++11 utf-8
04 Oct '14 at 20:05
2 answers

Win32's UTF-8 transcoding has used SSE internally since Vista, which is something very few other UTF transcoders do. I suspect it is impossible to beat with even the most optimized portable code.

That said, your codecvt numbers are extremely slow if they come in at 10x the time of even a naive implementation. While writing my own UTF-8 decoder, I was able to get within 2-3x of Win32's performance. There is a lot of room for improvement here, but you would need to implement a custom codecvt to get it.
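For illustration, here is a minimal sketch of the kind of naive scalar decoder meant here. This is my own example, not the answerer's code (the function name is hypothetical); it assumes well-formed input and skips all validation, which is part of why simple loops like this are already fast:

#include <cstdint>
#include <string>

// Minimal scalar UTF-8 -> UTF-16 decoder (sketch). Assumes well-formed
// input: no handling of truncated, overlong, or surrogate-range sequences.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    out.reserve(in.size()); // worst case: all ASCII, one unit per byte
    size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        uint32_t cp;
        if (b < 0x80) {                 // 0xxxxxxx: ASCII
            cp = b;
            i += 1;
        } else if ((b >> 5) == 0x06) {  // 110xxxxx: 2-byte sequence
            cp = ((b & 0x1Fu) << 6) | (in[i + 1] & 0x3Fu);
            i += 2;
        } else if ((b >> 4) == 0x0E) {  // 1110xxxx: 3-byte sequence
            cp = ((b & 0x0Fu) << 12) | ((in[i + 1] & 0x3Fu) << 6)
               |  (in[i + 2] & 0x3Fu);
            i += 3;
        } else {                        // 11110xxx: 4-byte sequence
            cp = ((b & 0x07u) << 18) | ((in[i + 1] & 0x3Fu) << 12)
               | ((in[i + 2] & 0x3Fu) << 6) | (in[i + 3] & 0x3Fu);
            i += 4;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                        // astral plane: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}

A real converter would also need to reject truncated, overlong, and surrogate-range sequences, which is where much of the remaining cost goes.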

+4
Oct 11 '14 at 22:20

In my own testing I found that the wstring_convert constructor has significant overhead, at least on Windows. As the other answer notes, you are unlikely to beat the native Windows implementation, but try changing your code to construct the converter outside the loop. I would expect a 5x to 20x improvement, especially in the debug build.
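A sketch of that change, applied to the question's "stdlib" loop (reusing the variable names from the original code):

// Construct the converter once and reuse it across iterations.
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string utf8;
for (int i = 0; i < 1000; i++) {
    std::u16string utf16 = convert.from_bytes(source);
    utf8 = convert.to_bytes(utf16);
}

wstring_convert only stores its codecvt facet and error strings, so a single instance can safely be reused for repeated conversions.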

+7
Aug 24 '15 at 10:42


