Truncating HTML text while accounting for "full stops" (in CakePHP TextHelper->truncate)
Edit:
I ended up using CakePHP's truncate(). It is much faster and supports Unicode :D
But the question remains:
How can I make the function automatically detect full stops (.) and cut the text right after one? In effect, $length would only be approximate: if the truncated text ends in an incomplete sentence, more words are added until the sentence ends (or words are removed, depending on whether the cut point is closer to the next or the previous sentence end).
Edit 2: I figured out how to detect full stops. I replaced:
    if (!$exact) { $spacepos = mb_strrpos($truncate, ' '); ...
with:
    if (!$exact) { $spacepos = mb_strrpos($truncate, '.'); ...
That change introduced a new problem:
When I have img tags that have dots inside their attributes, the text is truncated inside the tag:
    $text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p><p>abc def abc def abc def abc def.</p>';
    echo htmlentities(truncate($text));
How can I fix this? I am offering a bounty because the original question has already been answered ...
This snippet does what you are looking for and lists its own shortcomings (a full stop does not always mark the end of a sentence, and other punctuation can end a sentence too).
It scans characters up to $maxLen and then effectively drops the partial sentence after the last full stop it finds.
In your case, call this function immediately before returning $new_text.
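The snippet referred to above is not reproduced in this thread. As a rough illustration of the approach it describes (look at the first $maxLen characters, then drop whatever follows the last full stop), a minimal sketch could look like this. The function name cutAtLastFullStop and its signature are assumptions made for the example, and it works on plain text only, so it does not solve the "full stop inside a tag" problem.

```php
<?php
// Hypothetical helper: the name and signature are illustrative,
// not taken from the original snippet or from CakePHP.
function cutAtLastFullStop(string $text, int $maxLen): string
{
    // Look only at the first $maxLen characters (multibyte-safe).
    $head = mb_substr($text, 0, $maxLen);

    // Position of the last full stop inside that window.
    $pos = mb_strrpos($head, '.');
    if ($pos === false) {
        // No full stop found: fall back to the plain cut.
        return $head;
    }

    // Keep everything up to and including the full stop,
    // which drops the trailing partial sentence.
    return mb_substr($head, 0, $pos + 1);
}

echo cutAtLastFullStop('One two. Three four. Five six', 24), "\n";
```

With this input the 24-character window ends in the middle of "Five", so the result is cut back to the full stop after "four".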
To fix the "full stop in tag" problem, you can use something like the following to determine whether the stop is inside a tag:
    $str_len = strlen($summary);
    $pos_stop = strrpos($summary, '.');
    // Nearest '<' before the full stop (the negative offset makes strrpos search backwards).
    $pos_tag_open = strrpos($summary, '<', -($str_len - $pos_stop));
    // First '>' after that '<'.
    $pos_tag_close = strpos($summary, '>', $pos_tag_open);
    if (($pos_tag_open < $pos_stop) && ($pos_stop < $pos_tag_close)) {
        // Inside tag! Search for the next nearest prior full-stop.
        $pos_stop = strrpos($summary, '.', -($str_len - $pos_tag_open));
    }
    echo htmlentities(substr($summary, 0, $pos_stop + 1));
Obviously, this code can be optimized (and pulled into its own function), but you get the idea. I suspect there is a regular expression that can handle this a little more efficiently.
Edit:
Indeed, there is a regular expression that can do this using a negative lookahead:
    $text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" />abc</p>';
    // Match every '.' that is not followed by a '>' before the next '<', i.e. not inside a tag.
    $count = preg_match_all("/\.(?!([^<]+)?>)/", $text, $arr, PREG_OFFSET_CAPTURE);
    if ($count) {
        // Offset of the last full stop outside a tag.
        $offset = $arr[0][$count-1][1];
        echo substr($text, 0, $offset + 1)."\n";
    }
This should be relatively efficient, at least compared to truncate(), which also uses preg_match internally.
The regular expression in the answer above may work.
But for efficiency, in this case we can first trim the string to max_length and then run the preg match on the truncated string only. And yes, other punctuation characters need to be taken into account as well.
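As a sketch of that order of operations (cut first, then search only the shortened string), something like the following could be used. The function name truncateAtSentence is an assumption for illustration. One detail the sketch has to handle: a plain cut can land inside a tag, and then the lookahead can no longer see the closing '>', so the unfinished tag is dropped before matching. The sketch uses byte offsets, so it can split a multibyte character at the cut point, and it does not re-close tags that were left open.

```php
<?php
// Hypothetical helper: name and signature are illustrative only.
function truncateAtSentence(string $html, int $maxLength): string
{
    // Step 1: plain cut to the maximum length (byte-based).
    $head = substr($html, 0, $maxLength);

    // Step 2: if the cut landed inside a tag, drop the unfinished tag,
    // otherwise dots in its attributes would look like sentence ends.
    $lastOpen = strrpos($head, '<');
    $lastClose = strrpos($head, '>');
    if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen)) {
        $head = substr($head, 0, $lastOpen);
    }

    // Step 3: find full stops outside tags in the shortened string only.
    $count = preg_match_all('/\.(?!([^<]+)?>)/', $head, $matches, PREG_OFFSET_CAPTURE);
    if (!$count) {
        // No usable full stop: return the cut string unchanged.
        return $head;
    }

    // Step 4: keep everything up to and including the last such full stop.
    $offset = $matches[0][$count - 1][1];
    return substr($head, 0, $offset + 1);
}

$text = '<p>Abc def abc def abc def abc def. Abc def <img src="test.jpg" /></p>'
      . '<p>abc def abc def abc def abc def.</p>';
echo htmlentities(truncateAtSentence($text, 60)), "\n";
```

With a limit of 60 the cut falls inside the img tag; after dropping the partial tag, the last full stop outside a tag is the one that ends the first sentence.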
A few additional rules give more reliable logic for detecting the end of a sentence:
- A space or end of line follows the punctuation character.
- The first word after the punctuation character starts with an uppercase letter.
- Several line breaks (end of paragraph) follow the punctuation character, etc.
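These rules can be folded into a single lookahead. The sketch below is only one possible reading of them: the function name findSentenceEnds, the choice of ., ! and ? as sentence-ending punctuation, and the exact way the rules are combined are all assumptions. It is meant for plain text; for HTML it would have to be combined with the tag check discussed earlier.

```php
<?php
// Hypothetical helper: returns the byte offsets of punctuation
// characters that look like sentence ends.
function findSentenceEnds(string $text): array
{
    // A '.', '!' or '?' counts as a sentence end when it is followed by:
    //   - optional whitespace and the end of the text, or
    //   - two line breaks (end of paragraph), or
    //   - whitespace and then an uppercase letter.
    $pattern = '/[.!?](?=\s*\z|\s*\n\s*\n|\s+\p{Lu})/u';

    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);

    // Each match is [matched string, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

print_r(findSentenceEnds("See v1.2 now. Then stop."));
```

In the sample string the dot in "v1.2" is skipped because no whitespace follows it, while the dots after "now" and "stop" are reported. An abbreviation followed by a capitalized word (for example "Dr. Smith") would still be misdetected, so this remains a heuristic.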