MSHTML pasteHTML() produces &nbsp;

We use the standard TWebBrowser component in Delphi, which internally uses mshtml.dll. We also set a registry key so that pages are rendered with the newer rendering engine (see Web-Browser-Control-Specifying-the-IE-Version and MSDN: FEATURE_BROWSER_EMULATION). We use IE 10 rendering, but we get the same results with IE 8 and IE 11.

With the default MSHTML rendering engine (IE7 mode) everything works correctly, but we need the newer rendering engine because of its additional rendering capabilities.
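For reference, FEATURE_BROWSER_EMULATION is a DWORD value named after the executable, stored under the Internet Explorer FeatureControl key. A minimal sketch of setting it from Delphi, assuming the per-user (HKEY_CURRENT_USER) key is sufficient and that 10000 (IE10 mode, honoring !DOCTYPE) is the desired mode; `SetBrowserEmulation` is a hypothetical helper name:

```pascal
uses
  Winapi.Windows, System.SysUtils, System.Win.Registry;

const
  // Per-user FeatureControl key that selects the WebBrowser control's document mode
  EmulationKey =
    'Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION';

// Hypothetical helper: registers the IE document mode for this executable.
// AMode examples: 10000 = IE10 (honors !DOCTYPE), 11000 = IE11 (honors !DOCTYPE).
procedure SetBrowserEmulation(AMode: Cardinal);
var
  Reg: TRegistry;
begin
  Reg := TRegistry.Create(KEY_WRITE);
  try
    Reg.RootKey := HKEY_CURRENT_USER;
    if Reg.OpenKey(EmulationKey, True) then
    try
      // The value name is the bare executable name, e.g. 'Project1.exe'
      Reg.WriteInteger(ExtractFileName(ParamStr(0)), Integer(AMode));
    finally
      Reg.CloseKey;
    end;
  finally
    Reg.Free;
  end;
end;
```

Since the control reads this setting when it initializes, the call would typically go into the project source before the form that hosts the TWebBrowser is created, e.g. `SetBrowserEmulation(10000);` ahead of `Application.CreateForm`.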

We switch the control into design mode so that the user can edit the documents:

var
  mDocument: IHTMLDocument2;  // declared in the MSHTML unit
begin
  // Switch the loaded document into editing (design) mode
  mDocument := ((ASender as TWebBrowser).Document as IHTMLDocument2);
  mDocument.designMode := 'on';
end;

Now we have the following problem: when we insert HTML with IHTMLTxtRange.pasteHTML(...), some of the spaces are replaced with &nbsp;.

procedure TForm1.BT_PasteHtmlClick(Sender: TObject);
var
  mDoc2: IHTMLDocument2;
  mOvSel:IHTMLSelectionObject;
  mRange: IHTMLTxtRange;
  mHtml: string;
begin
  /// Reproducible bug in PasteHtml:
  ///  empty cells and wrong line breaks.
  mDoc2 := WB_Test.Document as IHTMLDocument2;

  // Get a text range for the current selection / caret position
  mOvSel := mDoc2.selection as IHTMLSelectionObject;
  mRange := mOvSel.CreateRange() as IHTMLTxtRange;

  // Test data (German): "Wrong cells are created where there should only be this one!"
  // and "Wrong line breaks where there should be none, caused by CRLF in the HTML code!"
  mHtml := '<TABLE width="100%" border="1" cellspacing="0" cellpadding="0">  <TBODY>  <TR>    <TD>Falsche Zellen werden erstellt, wo nur diese eine sein sollte!</TD></TR></TBODY></TABLE>' + sLineBreak +
           '<p>Falsche Umbrueche '  + sLineBreak + 
           'wo keine sein sollten  durch CRLF im Html-Code!</p>' + sLineBreak;
  mRange.pasteHTML(mHtml);
end;

Looking at the inserted code, the spaces between the TABLE, TBODY, TR and TD tags have been converted to &nbsp;, and the CRLFs have become <BR> tags. The incorrectly inserted HTML:

<TABLE width="100%" border="1" cellspacing="0" cellpadding="0">&nbsp; 
  <TBODY>&nbsp; 
  <TR>&nbsp;&nbsp;&nbsp; 
    <TD>Falsche Zellen werden erstellt, wo nur diese eine sein 
  sollte!</TD></TR></TBODY></TABLE><BR>
<P>Falsche Umbrueche <BR>wo keine sein sollten&nbsp; durch CRLF im 
Html-Code!</P>
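Comparing the pasted string with this result suggests a simple pattern: each CRLF became a <BR>, and a run of N spaces became N-1 &nbsp; entities followed by one ordinary space (two spaces gave "&nbsp; ", four gave "&nbsp;&nbsp;&nbsp; "). The sketch below models only that observed pattern, ignoring the line wrapping the serializer adds; `ModelPastedWhitespace` is a hypothetical illustration, not an MSHTML function and not MSHTML's actual algorithm:

```pascal
program DesignModeWhitespaceModel;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils;

// Hypothetical model of the whitespace conversion seen in the output above.
// It reproduces only the observed pattern; it is NOT MSHTML's real algorithm.
function ModelPastedWhitespace(const AHtml: string): string;
var
  I, RunLen: Integer;
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    I := 1;
    while I <= Length(AHtml) do
    begin
      if (AHtml[I] = #13) and (I < Length(AHtml)) and (AHtml[I + 1] = #10) then
      begin
        // A CRLF line break became a <BR> element
        SB.Append('<BR>');
        Inc(I, 2);
      end
      else if AHtml[I] = ' ' then
      begin
        // Measure the run of consecutive spaces
        RunLen := 0;
        while (I <= Length(AHtml)) and (AHtml[I] = ' ') do
        begin
          Inc(RunLen);
          Inc(I);
        end;
        // N spaces -> (N - 1) x '&nbsp;' followed by one ordinary space
        SB.Append(DupeString('&nbsp;', RunLen - 1));
        SB.Append(' ');
      end
      else
      begin
        SB.Append(AHtml[I]);
        Inc(I);
      end;
    end;
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  // Prints: <TR>&nbsp;&nbsp;&nbsp; <TD>sollten&nbsp; durch</TD><BR>
  Writeln(ModelPastedWhitespace('<TR>    <TD>sollten  durch</TD>' + sLineBreak));
end.
```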

EDIT: We start with the following HTML:

<html>
  <body>
  </body>
</html>

and after the insertion we get:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv="Content-Type" content="text/html; charset=windows-1252">
<META name="GENERATOR" content="MSHTML 10.00.9200.16540"></HEAD>
<BODY> 
<TABLE border="1" cellspacing="0" cellpadding="0">
  <TBODY>
  <TR>
    <TD>Tabelle mit<BR>einem Text!</TD></TR></TBODY></TABLE><BR>
<P>Falsche Umbrüche durch zu viele&nbsp; Leerzeichen</P></BODY></HTML>
1 answer

This may be by design. According to the HTML specification, any run of whitespace in HTML is rendered as a single space (except inside <pre> elements). To preserve the extra separation when two or more spaces are entered in design mode, IE inserts &nbsp; HTML entities instead.
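If the spaces and line breaks in the source string exist only to make the Delphi literal readable, one workaround consistent with this explanation is to normalize the whitespace before calling pasteHTML, so that design mode has no extra spaces to preserve. A minimal sketch, assuming the HTML contains no <pre> blocks or other whitespace-sensitive content; `CompactHtmlForPaste` is a hypothetical helper, not part of MSHTML, and whether it avoids every &nbsp; in a given IE mode still needs to be checked:

```pascal
program CompactHtmlDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: normalizes whitespace before the string is handed to
// IHTMLTxtRange.pasteHTML. Assumes no <pre> or other whitespace-sensitive markup.
function CompactHtmlForPaste(const AHtml: string): string;
begin
  // Collapse every run of whitespace (spaces, tabs, CR/LF) into one space,
  // which is how the browser renders it outside <pre> anyway.
  Result := TRegEx.Replace(AHtml, '\s+', ' ');
  // Remove the single space left between two tags, e.g. "<TR> <TD>".
  // Caveat: this also drops an intentional space such as "</b> <i>".
  Result := TRegEx.Replace(Result, '> <', '><');
  Result := Trim(Result);
end;

begin
  Writeln(CompactHtmlForPaste(
    '<TABLE border="1">  <TBODY>  <TR>    <TD>Zelle</TD></TR></TBODY></TABLE>' + sLineBreak +
    '<p>Falsche Umbrueche ' + sLineBreak +
    'wo keine sein sollten  durch CRLF im Html-Code!</p>' + sLineBreak));
end.
```

In the question's click handler, the call would then become `mRange.pasteHTML(CompactHtmlForPaste(mHtml));`.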
