I want to write an algorithm for compressing the HTML output of a CMS that I am writing in PHP, using the CodeIgniter framework.
I was thinking of removing the whitespace between angle brackets everywhere except inside <script>, <pre>, and <style> elements, and simply skipping those elements for simplicity. To clarify, I mean the whitespace between consecutive tags, with no text between them.
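To make the idea concrete, here is a rough regex-based sketch of what I have in mind. It is not a real parser: it assumes the three protected elements are not nested inside each other and that every one of them has a closing tag. The function name collapse_between_tags is hypothetical and not part of CodeIgniter.

```php
<?php
// Hypothetical helper, for illustration only (not a CodeIgniter API).
function collapse_between_tags(string $html): string
{
    // Split the document so that <script>, <pre> and <style> blocks are
    // captured as delimiters. With a single capture group, the protected
    // blocks land at the odd indexes of $parts.
    $parts = preg_split(
        '#(<script\b.*?</script\s*>|<pre\b.*?</pre\s*>|<style\b.*?</style\s*>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    if ($parts === false) {
        // Regex failure (e.g. backtrack limit): return the input unchanged.
        return $html;
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Ordinary markup: drop whitespace that sits directly between
            // a closing ">" and the next opening "<".
            $parts[$i] = preg_replace('/>\s+</', '><', $part);
        }
        // Odd indexes are protected blocks and are left untouched.
    }

    return implode('', $parts);
}

echo collapse_between_tags("<div>  <p>Hi</p>\n</div><pre>  <b>x</b>  </pre>");
// prints: <div><p>Hi</p></div><pre>  <b>x</b>  </pre>
```

Note that whitespace directly adjacent to a protected block is left alone, because the split puts the ">" and the "<" into different pieces. That errs on the safe side.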
How do I go about parsing the HTML to find the whitespace I want to remove?
Edit: To get started, I want to remove all tab characters that are not inside <pre> elements. I'm sure this can be done with a regex, but what are the alternatives?
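One non-regex alternative I can think of is a plain string scan using only stripos, substr and str_replace. The sketch below assumes <pre> elements are not nested and treats an unclosed <pre> as running to the end of the document. The function name strip_tabs_outside_pre is hypothetical.

```php
<?php
// Hypothetical helper, for illustration only.
function strip_tabs_outside_pre(string $html): string
{
    $out = '';
    $pos = 0;
    $len = strlen($html);

    while ($pos < $len) {
        // Naive match: this would also hit any tag whose name merely
        // starts with "pre", which standard HTML does not define.
        $open = stripos($html, '<pre', $pos);

        if ($open === false) {
            // No more <pre> blocks: strip tabs from the remainder.
            $out .= str_replace("\t", '', substr($html, $pos));
            break;
        }

        // Strip tabs from the markup before the <pre> block.
        $out .= str_replace("\t", '', substr($html, $pos, $open - $pos));

        // Copy the <pre> block verbatim. If it is never closed, treat
        // the rest of the document as preformatted.
        $close = stripos($html, '</pre>', $open);
        $end = ($close === false) ? $len : $close + strlen('</pre>');
        $out .= substr($html, $open, $end - $open);

        $pos = $end;
    }

    return $out;
}

echo strip_tabs_outside_pre("<div>\t<p>a</p></div><pre>\tkeep\t</pre>");
// prints <div><p>a</p></div><pre> followed by a tab, "keep", a tab, and </pre>
```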