Based on this question: How to get HTML source code from TWebBrowser
If I run this code on an HTML page that uses a Unicode encoding, the result is gibberish, because TStringStream is not Unicode-aware in D7. The page may be encoded in UTF-8 or in another (ANSI) code page.
How can I determine whether the data in the TStream / IPersistStreamInit is Unicode / UTF-8 / ANSI?
How to always return the correct result as a WideString for this function?
function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, the file is fine. Its content can be Unicode, UTF-8 or ANSI, but I always want to return the stream content as a WideString:
    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
    var
      // LStream: TStringStream;
      LStream: TMemoryStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
    begin
      if not Assigned(WebBrowser.Document) then
        Exit;
      // LStream := TStringStream.Create('');
      LStream := TMemoryStream.Create;
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        Stream := TStreamAdapter.Create(LStream, soReference);
        LPersistStreamInit.Save(Stream, True);
        // Result := LStream.DataString;
        LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
        Result := ??? // WideString
      finally
        LStream.Free;
      end;
    end;
EDIT: I found this article - How to load and save documents in TWebBrowser in a Delphi-like way
It does what I need, but it only works correctly with the Unicode Delphi compilers (D2009+). Read the Conclusion section:
Obviously, there is a lot more we can do. A couple of things immediately spring to mind. We could retrofit some of the Unicode functionality and support for non-ANSI encodings to the pre-Unicode compiler code. The present code, when compiled with anything earlier than Delphi 2009, will not correctly save document content to strings if the document character set is not ANSI.
The magic is obviously in the TEncoding class (TEncoding.GetBufferEncoding), but D7 does not have TEncoding. Any ideas?
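For illustration, the kind of byte-order-mark check that TEncoding.GetBufferEncoding performs can be approximated in D7 with the Win32 MultiByteToWideChar call. The unit below is only a sketch under stated assumptions: the unit name HtmlBufferDecode, the function DecodeHtmlBuffer and its FallbackCodePage parameter are hypothetical names, not part of the VCL or of the linked article. It handles the UTF-16 LE and UTF-8 BOMs only; data without a BOM is decoded with whatever code page the caller passes in, since the bytes alone do not reliably reveal an ANSI code page.

```pascal
unit HtmlBufferDecode; // hypothetical unit name

interface

uses
  Windows, Classes;

// Hypothetical helper: decodes the raw bytes held in Stream to a WideString.
// A UTF-16 LE or UTF-8 BOM decides the encoding; BOM-less data is decoded
// with FallbackCodePage (for example CP_ACP or CP_UTF8), which is an
// assumption the caller has to supply.
function DecodeHtmlBuffer(Stream: TMemoryStream;
  FallbackCodePage: UINT): WideString;

implementation

function DecodeHtmlBuffer(Stream: TMemoryStream;
  FallbackCodePage: UINT): WideString;
var
  P: PAnsiChar;
  Size, Len: Integer;
  CodePage: UINT;
begin
  Result := '';
  P := Stream.Memory;
  Size := Stream.Size;
  if Size = 0 then
    Exit;

  // UTF-16 LE BOM (FF FE): the payload is already WideChar data, copy it
  if (Size >= 2) and (P[0] = #$FF) and (P[1] = #$FE) then
  begin
    Len := (Size - 2) div SizeOf(WideChar);
    SetLength(Result, Len);
    if Len > 0 then
      Move(P[2], Result[1], Len * SizeOf(WideChar));
    Exit;
  end;

  // UTF-8 BOM (EF BB BF): skip it and decode the rest as UTF-8
  if (Size >= 3) and (P[0] = #$EF) and (P[1] = #$BB) and (P[2] = #$BF) then
  begin
    Inc(P, 3);
    Dec(Size, 3);
    CodePage := CP_UTF8;
  end
  else
    CodePage := FallbackCodePage; // no BOM: rely on the caller's choice

  if Size = 0 then
    Exit;

  // First call returns the required length in WideChars, second one converts
  Len := MultiByteToWideChar(CodePage, 0, P, Size, nil, 0);
  SetLength(Result, Len);
  if Len > 0 then
    MultiByteToWideChar(CodePage, 0, P, Size, PWideChar(Result), Len);
end;

end.
```

In the function from the question, this could be called after LPersistStreamInit.Save, for example as Result := DecodeHtmlBuffer(LStream, CP_ACP). Whether the saved stream actually carries a BOM for a given page is not something this sketch verifies, so the fallback code page may need to come from elsewhere, such as the document's reported charset.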