Detecting external content using TEmbeddedWB or TWebBrowser

I am trying to block anything external loaded by TEmbeddedWB or TWebBrowser (or TCppWebBrowser). I would like to block everything that is downloaded from the Internet: images, JavaScript, external CSS, external [embed], [object], [applet], [frame] or [iframe] content, JavaScript that can itself load external content, and so on.

This problem has two parts:

  • Put the web browser into a "restrict everything" mode (basic HTML only, without images) and detect whether any external content exists.
  • If there is no external content, do nothing; if there is some, show a "download bar" which, when clicked, switches the web browser to "download all" mode and fetches all the content.
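
The two parts above can be read as a small state machine. The following is only a sketch of that reading in plain C++; every name in it (LoadMode, ViewerState, OnDocumentScanned, OnDownloadBarClicked, OnNewDocument) is hypothetical and none of it is TEmbeddedWB or TWebBrowser API.

```cpp
// Hypothetical names throughout; this only models the mode switching.
enum class LoadMode { RestrictAll, DownloadAll };

struct ViewerState {
    LoadMode mode = LoadMode::RestrictAll;  // every page starts restricted
    bool barVisible = false;                // the "download bar"
};

// Call after the restricted document has been scanned for external content.
inline void OnDocumentScanned(ViewerState &s, bool hasExternalContent) {
    // Offer the bar only while restricted and only if something was held back.
    s.barVisible = (s.mode == LoadMode::RestrictAll) && hasExternalContent;
}

// Call when the user clicks the bar.
// Returns true if the caller should reload the page with everything enabled.
inline bool OnDownloadBarClicked(ViewerState &s) {
    if (!s.barVisible) return false;
    s.mode = LoadMode::DownloadAll;
    s.barVisible = false;
    return true;
}

// Call when a different document is opened: restrictions apply again.
inline void OnNewDocument(ViewerState &s) {
    s.mode = LoadMode::RestrictAll;
    s.barVisible = false;
}
```

Keeping the mode in one place means the scan result and the bar click are the only two inputs that can change what the browser is allowed to fetch.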

The first part is the problematic one. In TEmbeddedWB you can block almost everything using the DownloadOptions switches, the most important being ForceOffline, but even with downloads disabled this way it still lets through some tags such as [object] or [iframe]. I know this because I implemented the OnBeforeNavigate2 event and it fires for the URLs contained in these tags, and the requests also show up in the local server log. Setting OfflineMode and ForceOfflineMode in TEmbeddedWB does not help for these elements.

So how can I really block everything? The page should start as plain HTML with all external elements blocked, including scripts and CSS. Is there a way to get an event every time the browser wants to download something, so that the download can be blocked, or better, to block all external downloads up front so that no such event is needed? Do I need to fiddle with Internet Explorer zones and security settings? Any pointer in the right direction would be helpful.
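
For the URLs that do reach OnBeforeNavigate2, one possible approach is to decide there whether to set the handler's Cancel argument. The sketch below shows only that decision in plain C++. LoadMode, UrlScheme and ShouldCancelNavigation are hypothetical names, the isTopLevel flag is assumed to be worked out by the caller, and this would not cover resources that never raise the event.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical mode flag; not part of any browser component.
enum class LoadMode { RestrictAll, DownloadAll };

// Lower-cased scheme of a URL ("http" for "HTTP://x"), or "" if there is no colon.
inline std::string UrlScheme(const std::string &url) {
    const std::string::size_type colon = url.find(':');
    if (colon == std::string::npos) return "";
    std::string scheme = url.substr(0, colon);
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return scheme;
}

// Returns true if the navigation should be cancelled.
// isTopLevel: true when the main document itself is navigating (assumed input).
inline bool ShouldCancelNavigation(LoadMode mode, const std::string &url, bool isTopLevel) {
    if (mode == LoadMode::DownloadAll) return false;  // user allowed everything
    if (isTopLevel) return false;                     // the main HTML always loads
    const std::string scheme = UrlScheme(url);
    // Only network schemes are blocked; about:, res:, file: etc. pass through.
    return scheme == "http" || scheme == "https" || scheme == "ftp";
}
```

In the event handler, the result of ShouldCancelNavigation would be assigned to Cancel. Blocking only network schemes keeps internal pages such as about:blank working while restricted.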

The second part is also tricky, because I need to detect the presence of problematic tags (for example "applet", "script", "link", etc.). This detection does not have to be perfect, but it must be good enough to cover most such tags. I did it like this:

    //----------------------------------------------------------------------
    // Check for external content (images, scripts, ActiveX, frames...)
    //----------------------------------------------------------------------
    try
    {
        bool HasExternalContent = false;
        DelphiInterface<IHTMLDocument2> diDoc; // Smart pointer wrapper - should automatically call release() and do reference counting
        diDoc = TEmbeddedWB->Document;

        DelphiInterface<IHTMLElementCollection>     diColApplets;
        DelphiInterface<IDispatch>                  diDispApplets;
        DelphiInterface<IHTMLObjectElement>         diObj;
        DelphiInterface<IHTMLElementCollection>     diColEmbeds;
        DelphiInterface<IDispatch>                  diDispEmbeds;
        DelphiInterface<IHTMLFramesCollection2>     diColFrames;
        DelphiInterface<IDispatch>                  diDispFrames;
        DelphiInterface<IHTMLElementCollection>     diColImages;
        DelphiInterface<IDispatch>                  diDispImages;
        DelphiInterface<IHTMLImgElement>            diImg;
        DelphiInterface<IHTMLElementCollection>     diColLinks;
        DelphiInterface<IDispatch>                  diDispLinks;
        DelphiInterface<IHTMLElementCollection>     diColPlugins;
        DelphiInterface<IDispatch>                  diDispPlugins;
        DelphiInterface<IHTMLElementCollection>     diColScripts;
        DelphiInterface<IDispatch>                  diDispScripts;
        DelphiInterface<IHTMLStyleSheetsCollection> diColStyleSheets;
        DelphiInterface<IDispatch>                  diDispStyleSheets;

        OleCheck(diDoc->Get_applets    (diColApplets));
        OleCheck(diDoc->Get_embeds     (diColEmbeds));
        OleCheck(diDoc->Get_frames     (diColFrames));
        OleCheck(diDoc->Get_images     (diColImages));
        OleCheck(diDoc->Get_links      (diColLinks));
        OleCheck(diDoc->Get_plugins    (diColPlugins));
        OleCheck(diDoc->Get_scripts    (diColScripts));
        OleCheck(diDoc->Get_styleSheets(diColStyleSheets));

        // Scan for applets external links
        for (int i = 0; i < diColApplets->length; i++)
        {
            OleCheck(diColApplets->item(i, i, diDispApplets));
            if (diDispApplets != NULL)
            {
                diDispApplets->QueryInterface(IID_IHTMLObjectElement, (void**)&diObj);
                if (diObj != NULL)
                {
                    UnicodeString s1 = Sysutils::Trim(diObj->data),
                                  s2 = Sysutils::Trim(diObj->codeBase),
                                  s3 = Sysutils::Trim(diObj->classid);

                    if (StartsText("http", s1) || StartsText("http", s2) || StartsText("http", s3))
                    {
                        HasExternalContent = true;
                        break; // At least 1 found, bar will be shown, no further search needed
                    }
                }
            }
        }

        // Scan for images external links
        for (int i = 0; i < diColImages->length; i++)
        {
            OleCheck(diColImages->item(i, i, diDispImages));
            if (diDispImages != NULL) // Unnecessary? OleCheck throws exception if this applies?
            {
                diDispImages->QueryInterface(IID_IHTMLImgElement, (void**)&diImg);
                if (diImg != NULL)
                {
                    UnicodeString s1 = Sysutils::Trim(diImg->src);

                    // Case insensitive check
                    if (StartsText("http", s1))
                    {
                        HasExternalContent = true;
                        break; // At least 1 found, bar will be shown, no further search needed
                    }
                }
            }
        }
    }
    catch (Exception &e)
    {
        // triggered by OleCheck
        ShowMessage(e.Message);
    }

Is there an easier way to do this scan, or is the only option to run multiple loops using the other interface functions such as Get_applets, Get_embeds, Get_styleSheets, etc., similar to the code above? So far I have found that I would have to call the following functions to cover everything:

    OleCheck(diDoc->Get_applets    (diColApplets));
    OleCheck(diDoc->Get_embeds     (diColEmbeds));
    OleCheck(diDoc->Get_frames     (diColFrames));
    OleCheck(diDoc->Get_images     (diColImages));
    OleCheck(diDoc->Get_links      (diColLinks));
    OleCheck(diDoc->Get_plugins    (diColPlugins));
    OleCheck(diDoc->Get_scripts    (diColScripts));
    OleCheck(diDoc->Get_styleSheets(diColStyleSheets));

But I would prefer not to implement this many loops if it can be done more simply. Can it?
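
One way the separate loops might be collapsed is a single pass over all elements driven by a table of (tag, attribute) pairs. The sketch below shows the idea in plain C++ on a simplified stand-in for the DOM. Element, Rule, kRules, LooksExternal and HasExternalContent are hypothetical names, and the rule table is an assumption, not a complete list. With MSHTML the element list could presumably come from IHTMLDocument2::all and the values from IHTMLElement::getAttribute, but that wiring is not shown here.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for an HTML element: lower-case tag name plus attributes.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attrs;
};

// Which attribute of which tag can trigger an automatic download (assumed list).
struct Rule { const char *tag; const char *attr; };

static const Rule kRules[] = {
    {"img", "src"},     {"script", "src"},      {"link", "href"},
    {"iframe", "src"},  {"frame", "src"},       {"embed", "src"},
    {"object", "data"}, {"object", "codebase"}, {"object", "classid"},
    {"applet", "codebase"},
};

// Same test as in the question: trim, then case-insensitive "http" prefix.
inline bool LooksExternal(std::string value) {
    value.erase(value.begin(),
                std::find_if(value.begin(), value.end(),
                             [](unsigned char c) { return !std::isspace(c); }));
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return value.rfind("http", 0) == 0;  // true only if "http" is at position 0
}

// One loop over all elements; stops at the first external reference found.
inline bool HasExternalContent(const std::vector<Element> &elements) {
    for (const Element &el : elements) {
        for (const Rule &rule : kRules) {
            if (el.tag != rule.tag) continue;
            const auto it = el.attrs.find(rule.attr);
            if (it != el.attrs.end() && LooksExternal(it->second)) return true;
        }
    }
    return false;
}
```

Adding another tag then means adding one row to kRules instead of another loop. Plain [a href] links are left out of the table on purpose, since they are not fetched until the user clicks them.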

1 answer

I suggest this solution:

    #include "html.h"

    THTMLDocument doc;

    void __fastcall TForm1::CppWebBrowser1DocumentComplete(TObject *Sender, LPDISPATCH pDisp, Variant *URL)
    {
        doc.documentFromVariant(CppWebBrowser1->Document);

        bool HasExternalContent = false;

        for (int i=0; i<doc.images.length; i++)
        {
            if(doc.images[i].src.SubString(1, 4) == "http")
            {
                HasExternalContent = true;
                break;
            }
        }

        for (int i=0; i<doc.applets.length; i++)
        {
            THTMLObjectElement obj = doc.applets[i];
            if(obj.data.SubString(1, 4) == "http")
                HasExternalContent = true;
            if(obj.codeBase.SubString(1, 4) == "http")
                HasExternalContent = true;
            if(obj.classid.SubString(1, 4) == "http")
                HasExternalContent = true;
        }
    }

These great wrapper classes can be downloaded from here.


Source: https://habr.com/ru/post/1412996/

