Regex to GENERATE thumbnails!?!?! (but this is crazy!)

Here is my situation and the solution I came up with. I built an application that embeds TinyMCE so that users can create HTML content for publication. A user can include images in the layout and drag or resize them, which changes the final width / height attributes on the IMG tag. That part works well: users can add images and resize or move them until the page looks the way they want. The big problem is that I may now be sending the client a much larger image than needed, only for the browser to scale it down to the requested width / height. That wastes bandwidth and load time.

My solution is to pre-process my users' markup: scan all the IMG tags and parse their height / width / src attributes, then rewrite each img src to a phpThumb request with the parsed height / width passed in the thumbnail URL. phpThumb will then produce an image of the reduced size (trading some CPU for bandwidth, with caching to soften the cost). What do you think of this approach? I have seen other posts where people used mod_rewrite to do something similar, but I want to change the content when the page is served rather than intercept image requests as they come in. Any thoughts on this design?
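The URL-rewriting step described above could look roughly like the sketch below. phpThumb takes the source image in `src` and the target size in `w` and `h`; the helper name `buildThumbUrl` and the install path `/phpThumb/phpThumb.php` are assumptions made for illustration, not part of the original setup.

```php
<?php
// Hypothetical helper: build a phpThumb request URL for one image.
// $thumbScript is an assumed install path; adjust it to your own setup.
function buildThumbUrl(string $src, int $width, int $height,
                       string $thumbScript = '/phpThumb/phpThumb.php'): string
{
    // http_build_query() URL-encodes the values, so paths with spaces,
    // slashes, or "&" in them stay intact inside the query string.
    $query = http_build_query(
        ['src' => $src, 'w' => $width, 'h' => $height],
        '',
        '&'
    );

    return $thumbScript . '?' . $query;
}

echo buildThumbUrl('/uploads/photo.jpg', 200, 150), "\n";
// prints: /phpThumb/phpThumb.php?src=%2Fuploads%2Fphoto.jpg&w=200&h=150
```

When such a URL is written back into an HTML attribute, the `&` separators should be escaped as `&amp;` (for example with `htmlspecialchars`).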

I need help with the fine details: my regex skills need work, I am very short on time, and I promise to pay off this technical debt soon. To make the regex easier, I can guarantee a few things. Only img tags that need this processing will have existing width="" and height="" attributes (double-quoted and lowercase, though I think case-insensitive matching would be safer in case TinyMCE's output changes).

So: one regex that matches only the relevant img tags, and maybe three more to extract src, width, and height?

Thanks to everyone.

regex tinymce thumbnails phpthumb
3 answers

I think using regexes for this is a bad idea; you would be better off parsing the markup with something like PHP Simple HTML DOM Parser. Then you can do something like:

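    // Assumes simple_html_dom.php is on the include path
    include 'simple_html_dom.php';

    // Load HTML from a string
    $html = new simple_html_dom();
    $html->load($your_posted_content);

    // Find all images and print each src
    foreach ($html->find('img') as $element) {
        echo $element->src . '<br>';
    }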
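
For comparison, here is a sketch of the full rewrite using PHP's bundled DOM extension (`DOMDocument`) rather than Simple HTML DOM, so it is a swapped-in parser and not the library named in this answer. The function name `rewriteImages` and the `/phpThumb/phpThumb.php` path are hypothetical. It assumes the input is an HTML fragment and does no special encoding handling, so non-ASCII content would need extra care.

```php
<?php
// Hypothetical helper: point every sized <img> at a phpThumb URL.
function rewriteImages(string $html, string $thumbScript = '/phpThumb/phpThumb.php'): string
{
    // Silence libxml warnings about imperfect user markup, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so its contents can be extracted again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src    = $img->getAttribute('src');
        $width  = $img->getAttribute('width');
        $height = $img->getAttribute('height');

        // Only touch images that carry a src and numeric width and height.
        if ($src === '' || !preg_match('/^[0-9]+$/', $width) || !preg_match('/^[0-9]+$/', $height)) {
            continue;
        }

        $query = http_build_query(['src' => $src, 'w' => $width, 'h' => $height], '', '&');
        $img->setAttribute('src', $thumbScript . '?' . $query);
    }

    // Serialize only the children of the wrapper <div>, not the whole document.
    $out = '';
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo rewriteImages('<p><img src="/a.jpg" width="100" height="50"></p>'), "\n";
```

Images without both dimensions are left untouched, which matches the guarantee in the question that only images needing processing carry width and height.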

Try the following:

(?i)<img(?>\s+(?>src="([^"]*)"|width="([^"]*)"|height="([^"]*)"|\w+="[^"]*"))+

This will match any image tag, and if the src, width and height attributes are present, their values will be stored in groups 1, 2, and 3, respectively. But it does not require any of these attributes, so you need to check that all three groups contain values before processing.
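
A minimal PHP sketch of that check, using the pattern above with `preg_match_all`. The helper name `findSizedImages` is hypothetical. In PHP, groups that did not take part in a match come back as empty strings or are missing from the end of the array, so each one is defaulted before testing.

```php
<?php
// The pattern from this answer; "~" is used as the delimiter.
$pattern = '~(?i)<img(?>\s+(?>src="([^"]*)"|width="([^"]*)"|height="([^"]*)"|\w+="[^"]*"))+~';

// Hypothetical helper: return only the images that carry src, width and height.
function findSizedImages(string $html, string $pattern): array
{
    $found = [];
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Default any group that did not participate in the match.
        $src    = $m[1] ?? '';
        $width  = $m[2] ?? '';
        $height = $m[3] ?? '';

        if ($src !== '' && $width !== '' && $height !== '') {
            $found[] = ['src' => $src, 'width' => $width, 'height' => $height];
        }
    }

    return $found;
}

$html = '<img src="a.jpg" width="10" height="20" alt="x"> <img src="b.jpg" alt="y">';
print_r(findSizedImages($html, $pattern));
```

Note that the pattern stops at the first attribute it cannot match, such as an unquoted value or a hyphenated name, so attributes after that point are not captured.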


Generally speaking, regex is not suitable for parsing HTML. But in your case you can get away with it if your scope is very narrow (that is, you only search for the attributes width=".." and height="..", or something like that).

The best solution would be to send the content from TinyMCE asynchronously, process it on the server side with a proper HTML / XML parser, and then update the contents of the editor afterwards.

