Detect and analyze embedded video in HTML?

I am working on a project that requires me to detect and retrieve the video embed code on a web page.

I know that the <object> element is used to embed video, but the specification says that it can also be used for other things, such as images.

So how can I know deterministically that an <object> contains a video? Or is there another way to find out?

1 answer

Historically, the <object> element was intended as a means of embedding media files, such as video and audio, in an HTML document. But as web video evolved, it turned out that you cannot provide a reasonable user interface without integrating the video controls into your web application, so the de facto standard for embedding video in HTML became to embed a Flash player (using <embed> or <object>) and load the video from within that Flash movie. (HTML5 has the <video> element for this purpose, but I assume you have no control over the HTML files you need to process.)

Typically, when you see an <object> element used to play video, the object it references is a Flash SWF file, which runs its own code that in turn loads a video file. But a Flash movie may or may not contain video, and it can do much more besides. So if you want to detect video in <object> elements, your options are:

  • Keep a list of all SWF files / URLs that are known to be video players. This is the easiest method, but keep in mind that you will get many false negatives.
  • Programmatically render the HTML you parse in a sandboxed browser and identify video from screen captures. This is probably a huge effort, but it will solve your problem perfectly.
  • Download and decompile the SWF files referenced by the object tags and apply a heuristic to find out whether they contain embedded video. I say heuristic because a SWF is basically a program, and if you could define a deterministic method to find out whether a program plays video, you might as well try to decide whether a program halts.
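
As a rough illustration of the first option, here is a minimal sketch in Python using only the standard library. It is not a definitive detector: the KNOWN_PLAYER_HOSTS allowlist, the find_embeds and classify helpers, and the verdict labels are all hypothetical names made up for this example, and the allowlist would have to be filled in and maintained by hand. The sketch looks at <video>, <object> and <embed> elements plus <param name="movie"> children, and reports Flash content from unknown hosts as undecided rather than guessing.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical allowlist of hosts assumed to serve Flash video players.
KNOWN_PLAYER_HOSTS = {"www.youtube.com", "player.vimeo.com"}
FLASH_TYPE = "application/x-shockwave-flash"


def classify(tag, mime, url):
    """Return a rough verdict for one embedded resource."""
    mime = mime.lower()
    if tag == "video" or mime.startswith("video/"):
        return "video"
    parts = urlparse(url)
    if parts.netloc.lower() in KNOWN_PLAYER_HOSTS:
        return "likely video"
    if mime == FLASH_TYPE or parts.path.lower().endswith(".swf"):
        # A SWF from an unknown host may or may not be a video player.
        return "unknown flash"
    return "other"


class EmbedFinder(HTMLParser):
    """Collects (tag, url, verdict) tuples for embedded resources."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as `controls` arrive as None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "param" and a.get("name", "").lower() == "movie":
            url = a.get("value", "")
        elif tag in ("video", "object", "embed"):
            url = a.get("data") or a.get("src") or ""
        else:
            return
        if not url and tag != "video":
            # Nothing to judge; a nested <param> or <embed> may carry the URL.
            return
        self.candidates.append((tag, url, classify(tag, a.get("type", ""), url)))


def find_embeds(html):
    finder = EmbedFinder()
    finder.feed(html)
    return finder.candidates


if __name__ == "__main__":
    sample = (
        '<object data="logo.png" type="image/png"></object>'
        '<object data="game.swf" type="application/x-shockwave-flash"></object>'
        '<video src="clip.mp4" controls></video>'
    )
    for tag, url, verdict in find_embeds(sample):
        print(tag, url, verdict)
```

Anything this reports as "unknown flash" is exactly the case where the allowlist falls short and one of the heavier options (rendering the page, or decompiling the SWF) would be needed.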

Source: https://habr.com/ru/post/1314303/

