Historically, the <object> element was intended as a means of embedding media files, such as video and audio, in an HTML document. But as web video evolved, it turned out that you cannot provide a reasonable user interface without integrating the video controls into your web application, so the de facto standard for embedding video in HTML became embedding a Flash player (using <embed> or <object>) that loads and plays the video file. (HTML5 has a <video> element for this purpose, but I assume the HTML files you need to process don't use it.)
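Because the player is referenced from <object> or <embed> markup, a first step in any detection approach is extracting the URL of the Flash file each element loads. Here is a minimal sketch using Python's standard `html.parser`. It assumes the common embedding conventions: an `<object data="...">` attribute, a child `<param name="movie" value="...">` (the form older Internet Explorer versions relied on), and `<embed src="...">`. The names `FlashRefParser` and `flash_urls` are hypothetical.

```python
from html.parser import HTMLParser


class FlashRefParser(HTMLParser):
    """Collect the URLs that <object>/<embed> elements point at."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._object_depth = 0  # how many <object> elements we are inside

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "object":
            self._object_depth += 1
            if a.get("data"):
                self.urls.append(a["data"])
        elif tag == "param" and self._object_depth:
            # Only <param name="movie"> inside an <object> names the SWF
            if (a.get("name") or "").lower() == "movie" and a.get("value"):
                self.urls.append(a["value"])
        elif tag == "embed" and a.get("src"):
            self.urls.append(a["src"])

    def handle_endtag(self, tag):
        if tag == "object" and self._object_depth:
            self._object_depth -= 1


def flash_urls(html: str) -> list[str]:
    """Return referenced URLs, de-duplicated, in first-seen order."""
    parser = FlashRefParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.urls))
```

Pages often repeat the same URL in `data`, `<param name="movie">` and a nested `<embed>` for cross-browser support, hence the de-duplication. Relative URLs still need resolving against the page URL (for example with `urllib.parse.urljoin`) before you can download or look them up.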
Typically, when you see an <object> element used to play a video, the object references a SWF (Flash) presentation, which runs its own code that loads a video file. But a Flash presentation may or may not contain video, and it can do much more besides. Therefore, if you want to detect video in <object>s, your options are:
- Keep a list of all SWF files / URLs that are known to be video players. This is the easiest method, but keep in mind that you will get many false negatives.
- Programmatically render the HTML you parse in an isolated browser and identify video from screen captures. This is probably a huge effort, but it should solve your problem completely.
- Download and decompile the SWF files referenced by <object> tags, and apply a heuristic to find out whether they contain video. I say heuristic because a SWF file is essentially a program, and if you could define a deterministic method to decide whether an arbitrary program plays video, you could also use it to decide whether a program halts, which is the undecidable halting problem.
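For the third option, a cheap first pass does not require a full decompiler. A SWF file is a header followed by a sequence of tagged records, and video stored inside the file itself lives in DefineVideoStream (tag code 60) and VideoFrame (tag code 61) records. The Python sketch below scans for those tags, including inside DefineSprite (tag code 39) movie clips. It assumes uncompressed (`FWS`) or zlib-compressed (`CWS`) files and rejects LZMA-compressed (`ZWS`) ones. It only detects video embedded in the SWF: a player that loads an external FLV or MP4 at runtime (for example via ActionScript's `NetStream`) contains no such tags, which is exactly why this stays a heuristic. The function names are hypothetical.

```python
import struct
import zlib

# SWF tag codes, per the SWF file format specification
END = 0
DEFINE_SPRITE = 39
DEFINE_VIDEO_STREAM = 60
VIDEO_FRAME = 61


def _swf_body(data: bytes) -> bytes:
    """Return the SWF contents after the 8-byte header, decompressed if needed."""
    signature = data[:3]
    if signature == b"FWS":  # uncompressed
        return data[8:]
    if signature == b"CWS":  # everything after the header is zlib-compressed
        return zlib.decompress(data[8:])
    # ZWS (LZMA) and non-SWF input are out of scope for this sketch
    raise ValueError(f"unsupported or non-SWF signature: {signature!r}")


def _first_tag_offset(body: bytes) -> int:
    """Skip the FrameSize RECT, then FrameRate and FrameCount (UI16 each)."""
    nbits = body[0] >> 3                   # top 5 bits: bit width of each RECT field
    rect_bytes = (5 + 4 * nbits + 7) // 8  # RECT is padded to a byte boundary
    return rect_bytes + 4


def _iter_tags(buf: bytes, pos: int):
    """Yield (tag_code, payload) for each tag record starting at pos."""
    while pos + 2 <= len(buf):
        (header,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        code, length = header >> 6, header & 0x3F  # 10-bit code, 6-bit length
        if length == 0x3F:  # long record: a 32-bit length follows
            (length,) = struct.unpack_from("<I", buf, pos)
            pos += 4
        yield code, buf[pos:pos + length]
        pos += length
        if code == END:
            return


def _contains_video(buf: bytes, pos: int) -> bool:
    for code, payload in _iter_tags(buf, pos):
        if code in (DEFINE_VIDEO_STREAM, VIDEO_FRAME):
            return True
        # DefineSprite payload: SpriteID (UI16), FrameCount (UI16), nested tags
        if code == DEFINE_SPRITE and _contains_video(payload, 4):
            return True
    return False


def has_embedded_video(data: bytes) -> bool:
    """Heuristic: True if the SWF stores video inside the file itself."""
    body = _swf_body(data)
    return _contains_video(body, _first_tag_offset(body))
```

You would call `has_embedded_video` on the raw bytes of each downloaded SWF. A `True` result is strong evidence of video; a `False` result only means no embedded video was found, so it is best combined with the known-player list from the first option.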