The main point of the HAR format is that it gives you a standard HTTP tracing format that many tools can consume and analyze. In other words, the original intention was first and foremost performance analysis, rather than "archiving" web pages as such.
If you fetch a page with wget, you are missing 99% of the performance data. To capture it, you really need a browser to execute the requests, fetch all related sub-resources, and record all the timings. That is what lets you build waterfall charts and the like.
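For a sense of what that extra data looks like, here is a hand-trimmed, illustrative fragment of a HAR 1.2 log (the real spec requires more fields per request and response): each sub-resource the browser loads gets its own entry, and the timings object on each entry is what a waterfall chart is drawn from.

    {
      "log": {
        "version": "1.2",
        "creator": { "name": "example-capture", "version": "0.1" },
        "entries": [
          {
            "startedDateTime": "2012-01-01T12:00:00.000Z",
            "time": 142,
            "request":  { "method": "GET", "url": "http://example.com/style.css" },
            "response": { "status": 200, "statusText": "OK" },
            "timings": { "blocked": 1, "dns": 10, "connect": 25,
                         "send": 1, "wait": 80, "receive": 25 }
          }
        ]
      }
    }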
If you need to capture this data on the server side, you can use pcap to capture the TCP trace and then convert it to HAR, although you still need a client that actually parses the HTML and requests all the sub-resources (pcap just listens in the background). Alternatively, you can route the browser through a proxy server and let the proxy emit a HAR file for you.
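As a sketch of the proxy route, here is roughly what it looks like with BrowserMob Proxy driving Firefox via Selenium. This assumes BrowserMob Proxy 2.x and Selenium are on the classpath; class and method names may differ in other versions.

    import java.io.File;

    import net.lightbody.bmp.BrowserMobProxy;
    import net.lightbody.bmp.BrowserMobProxyServer;
    import net.lightbody.bmp.client.ClientUtil;
    import net.lightbody.bmp.core.har.Har;

    import org.openqa.selenium.Proxy;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.openqa.selenium.firefox.FirefoxOptions;
    import org.openqa.selenium.remote.CapabilityType;

    public class HarViaProxy {
        public static void main(String[] args) throws Exception {
            // Start an embedded proxy on a free port.
            BrowserMobProxy proxy = new BrowserMobProxyServer();
            proxy.start(0);

            // Point Firefox at the proxy.
            Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
            FirefoxOptions options = new FirefoxOptions();
            options.setCapability(CapabilityType.PROXY, seleniumProxy);
            WebDriver driver = new FirefoxDriver(options);

            // Record a HAR while the browser loads the page and its sub-resources.
            proxy.newHar("example");
            driver.get("http://example.com/");

            Har har = proxy.getHar();
            har.writeTo(new File("example.har"));

            driver.quit();
            proxy.stop();
        }
    }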
Last but not least, you can simply drive the browser through its debugging interface and export the HAR file that way. A Java example that drives Firefox: https://github.com/Filirom1/browsermob-page-perf
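The linked project does this for Firefox. As a rough sketch of the same debugging-interface idea with a browser that speaks the Chrome DevTools protocol, Selenium 4 exposes the protocol directly (the versioned devtools package, v85 here, has to match your Selenium release, and this only logs per-response data rather than assembling a full HAR):

    import java.util.Optional;

    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.devtools.DevTools;
    import org.openqa.selenium.devtools.v85.network.Network;

    public class DevToolsSketch {
        public static void main(String[] args) {
            ChromeDriver driver = new ChromeDriver();
            DevTools devTools = driver.getDevTools();
            devTools.createSession();

            // Turn on network tracking over the debugging protocol.
            devTools.send(Network.enable(
                    Optional.empty(), Optional.empty(), Optional.empty()));

            // Log every response the browser receives; a real exporter would
            // collect these events plus their timing data and build a HAR.
            devTools.addListener(Network.responseReceived(), event ->
                    System.out.println(event.getResponse().getStatus() + " "
                            + event.getResponse().getUrl()));

            driver.get("http://example.com/");
            driver.quit();
        }
    }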