VB6: I can't understand why this code works

I apologize for what may be a silly question. I maintain some old VB6 code, and I have a function that works, but I cannot understand why it works, or why the code does not work without it.

Basically, this function reads a UTF-8 text file and displays its contents in the DHTMLEdit component. It reads the entire file into a string, converts that string from a wide-character (UTF-16) string to a multibyte string using the ANSI code page, and then converts the result back to a wide-character string using the UTF-8 code page.

This roundabout mechanism makes the component correctly display a page that contains Hebrew, Arabic, Thai, and Chinese at the same time. Without it, the text looks as if it had been converted to ASCII, with assorted punctuation marks where the letters should be.

What I don't understand:

  • Since the source file is UTF-8 and VB6 strings are UTF-16, why is this even necessary? Why doesn't VB6 read a line from the file correctly without all these conversions?
  • If the string is converted from wide to multibyte using CodePage = 0 (CP_ACP, the ANSI code page), won't that drop every character the current code page does not support? On this machine I don't even have Chinese, Thai, or Arabic set up. Still, this is the only way I can get the DHTMLEdit control to display the text correctly.

[the code]

Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
    ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
Private Declare Function GetACP Lib "kernel32" () As Long

...

Open filePath For Input As #lFilePtr
Dim sInput As String
Dim sResult As String
' Line Input decodes each line with the ANSI code page and strips the line break
Do While Not EOF(lFilePtr)
    Line Input #lFilePtr, sInput
    sResult = sResult & sInput
Loop
Close #lFilePtr
txtBody.DOM.Body.innerText = DecodeString(sResult, CP_UTF8)

Public Function DecodeString(ByVal strSource As String, Optional FromCodePage As Long = -1) As String
    Dim strTemp As String
    If strSource = vbNullString Then Exit Function
    ' Re-encode with code page 0 (CP_ACP) to recover the bytes Line Input decoded
    strTemp = UnicodeToAnsi(strSource, 0)
    ' Decode those bytes with the real source code page (CP_UTF8 here)
    DecodeString = AnsiToUnicode(strTemp, FromCodePage)
End Function

Public Function AnsiToUnicode(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    ' strSource's buffer holds raw multibyte data, read up to the first zero byte
    pwz = StrPtr(strSource)
    ' First call: required length in UTF-16 characters, including the terminating null
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, 0&, 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: decode into the buffer
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer))
    ' Drop the terminating null
    AnsiToUnicode = Left(strBuffer, cwch - 1)
End Function

Public Function UnicodeToAnsi(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    ' First call: required length in bytes, including the terminating zero
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, 0&, 0&, ByVal 0&, ByVal 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: write the multibyte bytes into the String's buffer (used as a byte container)
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer), ByVal 0&, ByVal 0&)
    ' Left counts 2-byte characters, so all the multibyte data and its terminating zero are kept
    UnicodeToAnsi = Left(strBuffer, cwch - 1)
End Function

[the code]

Answer

VB6/VBA implicitly converts between UTF-16 and ANSI when reading and writing files with the built-in statements.

Line Input treats the file as ANSI text (a series of bytes, each representing a character), using the current system code page, the one Windows uses for non-Unicode programs. The characters it reads are then converted to UTF-16.

When you read a UTF-8 file this way, what you get is an "invalid" string: you cannot use it directly as text (if you try, you see garbage), but it still contains the useful binary data.

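Next, a pointer to this string is passed to WideCharToMultiByte (in UnicodeToAnsi), which creates another "invalid" string, this time holding "ANSI" bytes. This effectively undoes the conversion that Line Input performed automatically, and since the source file was UTF-8, you now have an "invalid" string that holds the original UTF-8 bytes, even though the conversion function thought it was converting to ANSI. This also answers your second question: code page 0 does not drop the Chinese, Thai, or Arabic characters because the string at this point contains none of them. It contains only characters that Line Input produced by decoding with that same ANSI code page, so encoding them back with it restores the original bytes (at least on a single-byte code page such as Windows-1252; a double-byte code page may not round-trip every byte sequence).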

Finally, a pointer to this second "invalid" string is passed to MultiByteToWideChar (in AnsiToUnicode) together with CP_UTF8, which decodes the UTF-8 bytes properly and creates a valid string that can be used in VB.
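A minimal sketch of that round trip, replayed on an in-memory byte array. SurvivesAnsiRoundTrip is a hypothetical helper, not part of the original code, and it swaps VB6's built-in StrConv in for the API calls, assuming that StrConv with vbUnicode and vbFromUnicode uses the same system ANSI code page as Line Input and UnicodeToAnsi(s, 0).

```vb
Option Explicit

' Hypothetical demo helper: returns True when the ANSI decode that Line Input
' performs, followed by the ANSI encode that UnicodeToAnsi(s, 0) performs,
' gives back exactly the bytes we started with.
' Assumes rawBytes is a zero-based, non-empty array.
Public Function SurvivesAnsiRoundTrip(rawBytes() As Byte) As Boolean
    Dim misread As String     ' the "invalid" string Line Input would produce
    Dim restored() As Byte    ' the bytes UnicodeToAnsi(misread, 0) would produce
    Dim i As Long

    ' Step 1: ANSI -> UTF-16, the same decode Line Input applies to each line
    misread = StrConv(rawBytes, vbUnicode)

    ' Step 2: UTF-16 -> ANSI with the same code page
    restored = StrConv(misread, vbFromUnicode)

    ' Compare lengths first, then byte by byte
    If UBound(restored) <> UBound(rawBytes) Then Exit Function
    For i = 0 To UBound(rawBytes)
        If restored(i) <> rawBytes(i) Then Exit Function
    Next i

    SurvivesAnsiRoundTrip = True
End Function
```

On a single-byte code page such as Windows-1252 this should return True for UTF-8 input in any script; on a double-byte code page (typical of Chinese, Japanese, or Korean systems) some UTF-8 byte sequences are not valid ANSI text and may not come back unchanged.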

The confusing part of the original code is that strings are used to store "invalid" data. Logically, all of this should have been byte arrays. I would refactor the code to read the file's bytes in binary mode and pass the array to MultiByteToWideChar directly.
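A minimal sketch of that refactoring, under a few assumptions: the names ReadUtf8File and Utf8BytesToString are hypothetical, the whole file fits in memory, and a leading UTF-8 byte order mark (EF BB BF), if present, should be skipped rather than decoded as U+FEFF. The file is opened in Binary mode, so no implicit ANSI conversion takes place, and the byte array goes straight to MultiByteToWideChar with CP_UTF8 (65001).

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Decodes UTF-8 bytes (from startIndex to the end of a zero-based array) into a VB string.
Public Function Utf8BytesToString(bytes() As Byte, Optional ByVal startIndex As Long = 0) As String
    Dim byteCount As Long
    Dim charCount As Long
    Dim buffer As String

    byteCount = UBound(bytes) - startIndex + 1
    If byteCount <= 0 Then Exit Function

    ' First call: ask how many UTF-16 characters are needed (no terminator, since the length is explicit)
    charCount = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bytes(startIndex)), byteCount, 0, 0)
    If charCount = 0 Then Exit Function

    ' Second call: decode into a buffer of exactly that size
    buffer = String$(charCount, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0, VarPtr(bytes(startIndex)), byteCount, StrPtr(buffer), charCount
    Utf8BytesToString = buffer
End Function

' Reads the whole file as raw bytes and decodes it as UTF-8.
Public Function ReadUtf8File(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim bytes() As Byte
    Dim byteTotal As Long
    Dim startIndex As Long

    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    byteTotal = LOF(fileNum)
    If byteTotal = 0 Then
        Close #fileNum
        Exit Function
    End If
    ReDim bytes(0 To byteTotal - 1)
    Get #fileNum, , bytes
    Close #fileNum

    ' Skip a UTF-8 byte order mark if present (nested If: VB6 does not short-circuit And)
    If byteTotal >= 3 Then
        If bytes(0) = &HEF And bytes(1) = &HBB And bytes(2) = &HBF Then startIndex = 3
    End If

    ReadUtf8File = Utf8BytesToString(bytes, startIndex)
End Function
```

It would replace both the Line Input loop and the DecodeString call: txtBody.DOM.Body.innerText = ReadUtf8File(filePath). One behavioral difference: the original loop concatenates lines without vbCrLf, so it drops the line breaks, while this version keeps them.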
