How do I parse HTML in order to minify it in PHP?

I want to write an algorithm for compressing the HTML output of a CMS that I am writing in PHP with the CodeIgniter framework.

I was thinking of removing the whitespace between any angle brackets, except inside <script>, <pre>, and <style> elements, which I would simply skip for simplicity. To clarify, I mean the whitespace between consecutive tags, with no text between them.

How do I go about parsing the HTML to find the whitespace I want to remove?

Edit: To get started, I want to remove all tab characters that are not inside <pre> tags. I'm sure this can be done with a regex, but what are the alternatives?

+5
2 answers

Is there something wrong with the existing solutions for minifying HTML?

Minify handles HTML (as well as CSS and JS).

(The second link points to the source code, which comments on each step it takes. It should be a good starting point if you want to write your own, and it is BSD-licensed.)
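If you do roll your own, the idea from the question can be sketched roughly as follows. This is not Minify's actual code, just a minimal illustration under a few assumptions: the function name `minify_html_sketch` is hypothetical, the input is reasonably well-formed, and <pre>, <script>, and <style> blocks are not nested inside each other. Keep in mind that whitespace between inline elements can be significant for rendering, so this may change how a page looks.

```php
<?php
/**
 * Sketch: remove whitespace between consecutive tags while leaving
 * <pre>, <script> and <style> blocks untouched.
 * (Hypothetical helper, not part of any library.)
 */
function minify_html_sketch(string $html): string
{
    // One capture group, so with PREG_SPLIT_DELIM_CAPTURE the protected
    // blocks always land at odd indexes and everything else at even ones.
    $protected = '#(<pre\b.*?</pre\s*>|<script\b.*?</script\s*>|<style\b.*?</style\s*>)#is';
    $parts = preg_split($protected, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Outside protected blocks: drop whitespace that sits
            // directly between a closing ">" and the next "<".
            $parts[$i] = preg_replace('/>\s+</', '><', $part);
        }
    }

    // Whitespace right next to a protected block is left alone,
    // because the pattern above never matches across two parts.
    return implode('', $parts);
}

$in  = "<div>\n\t<p>a b</p>\n\t<pre>\n\tx <b>y</b>\n</pre>\n</div>";
$out = minify_html_sketch($in);
echo $out, "\n";
```

The same loop could also strip tab characters from the even-indexed parts with str_replace("\t", '', $part), which covers the "tabs outside <pre>" case from the edit to the question.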

Also, as Pete says, you will gain much more by using gzip compression for your HTML (and CSS, JS, etc.), and you will avoid problems like the one Gordon mentioned in his comment.

+4

Don't do it. The space saved is negligible. It is better to use output compression, with zlib for example.
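To get a feel for the difference, here is a rough sketch that compares the two approaches on a made-up page. The sample markup is hypothetical and very repetitive, so real pages will give different numbers; it also assumes PHP's zlib extension is available.

```php
<?php
// Hypothetical sample page: repetitive, indented markup like a CMS might emit.
$html = "<ul>\n"
      . str_repeat("\t<li>\n\t\t<a href=\"/item\">Item</a>\n\t</li>\n", 200)
      . "</ul>\n";

// Whitespace-only saving: drop whitespace between consecutive tags.
$stripped = preg_replace('/>\s+</', '><', $html);

// gzip saving: roughly what the client receives with output compression on.
$gzipped = gzencode($html, 6);

printf("original:          %d bytes\n", strlen($html));
printf("whitespace removed: %d bytes\n", strlen($stripped));
printf("gzip-compressed:    %d bytes\n", strlen($gzipped));

// In a real page, compression is usually switched on once before any
// output is sent, for example:
//   ob_start('ob_gzhandler');
// or with "zlib.output_compression = On" in php.ini.
```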

+7
