Get content in html tag with php and replace it after processing

I have html (sample.html) like this:

    <html>
    <head>
    </head>
    <body>
    <div id="content">
    <!--content-->
    <p>some content</p>
    <!--content-->
    </div>
    </body>
    </html>

How can I get the content between the two HTML comments '<!--content-->' using PHP? I want to extract it, do some processing, and put it back in the same place. Is that possible?

5 answers

esafwan, you can use a regular expression to extract the content of a div with a specific id.

I have already done this for image tags, and the same rules apply. I'll look up the code and update this answer shortly.

[update] try the following:

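    <?php
    // Return the inner HTML of the first <div> whose $attr equals $value,
    // or null if nothing matches.
    function get_tag($attr, $value, $xml) {
        // Quote the delimiter too, so a "/" in the input cannot break the pattern.
        $attr = preg_quote($attr, '/');
        $value = preg_quote($value, '/');
        $tag_regex = '/<div[^>]*' . $attr . '="' . $value . '">(.*?)<\\/div>/si';
        if (preg_match($tag_regex, $xml, $matches)) {
            return $matches[1];
        }
        return null;
    }

    $yourentirehtml = file_get_contents("test.html");
    $extract = get_tag('id', 'content', $yourentirehtml);
    echo $extract;
    ?>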

or more simply:

    preg_match("/<div[^>]*id=\"content\">(.*?)<\\/div>/si", $text, $match);
    $content = $match[1];

Jim


If this is a simple replacement that does not involve parsing the actual HTML document, you can use a regular expression or even just str_replace. In general, though, using regex on HTML is not recommended: HTML is not a regular language, and coming up with reliable patterns can quickly become a nightmare.
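For the simple case in the question (two identical comment markers, no nesting), a minimal sketch with preg_replace_callback might look like this. The helper name process_between_markers and the use of strtoupper as the "processing" step are illustrative assumptions, not something from the question:

```php
<?php
// Hypothetical helper: runs $process on whatever sits between two identical markers
// and returns the document with the processed text put back in place.
function process_between_markers(string $html, string $marker, callable $process): string
{
    $m = preg_quote($marker, '/');
    // (.*?) is non-greedy, so it stops at the first closing marker;
    // the "s" flag lets "." match newlines as well.
    return preg_replace_callback(
        '/(' . $m . ')(.*?)(' . $m . ')/s',
        function (array $match) use ($process) {
            // Keep both markers, replace only the text between them.
            return $match[1] . $process($match[2]) . $match[3];
        },
        $html
    );
}

// Inline sample standing in for file_get_contents("sample.html").
$html = '<div id="content"> <!--content--> <p>some content</p> <!--content--> </div>';
$result = process_between_markers($html, '<!--content-->', 'strtoupper');
echo $result;
```

Because the markers are kept, the same function can be run again later on the saved file.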

The right way to parse HTML in PHP is to use a parsing library that actually understands HTML documents. Your best native bet is DOM, but PHP has a number of other native XML extensions you can use, and there are also third-party libraries such as phpQuery, Zend_Dom, QueryPath, and FluentDom.
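As a rough sketch of the DOM route for this particular question, one option is to locate the two marker comments with XPath and work on the sibling nodes between them. The inline sample and the upper-casing step are assumptions for illustration only:

```php
<?php
// Inline sample standing in for file_get_contents("sample.html").
$html = '<html><head></head><body><div id="content"><!--content--><p>some content</p><!--content--></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Find the comment nodes whose text is exactly "content".
$markers = $xpath->query('//comment()[. = "content"]');

if ($markers->length >= 2) {
    $start = $markers->item(0);
    $end   = $markers->item(1);

    // Collect the sibling nodes that sit between the two markers.
    $between = [];
    for ($node = $start->nextSibling; $node !== null && !$node->isSameNode($end); $node = $node->nextSibling) {
        $between[] = $node;
    }

    // Example processing: upper-case the text of each element in that range.
    foreach ($between as $node) {
        if ($node instanceof DOMElement) {
            $node->nodeValue = strtoupper($node->textContent);
        }
    }
}

// The nodes were modified in place, so serializing the document "puts them back".
$output = $doc->saveHTML();
echo $output;
```

Note that saveHTML() serializes the whole document, so the output may gain a doctype that the input did not have.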

If you use the search function, you will see that this topic has been covered extensively, and you should have no trouble finding examples that show how to solve your problem.

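    <?php
    $content = file_get_contents("sample.html");
    // Split at the first marker, then split the remainder at the second marker.
    $comment = explode("<!--content-->", $content);
    $comment = explode("<!--content-->", $comment[1]);
    // Note: strip_tags() drops the <p> tags and leaves only the text.
    var_dump(strip_tags($comment[0]));
    ?>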

Try this; it should work for you.
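The snippet above only reads the content. To also put it back, one possible extension of the same explode idea is sketched below; the inline sample and the str_replace processing step are assumptions for illustration:

```php
<?php
// Inline sample standing in for file_get_contents("sample.html").
$content = '<div id="content"> <!--content--> <p>some content</p> <!--content--> </div>';
$marker  = '<!--content-->';

// A limit of 3 yields: text before the first marker, text between the markers, the rest.
$parts = explode($marker, $content, 3);

if (count($parts) === 3) {
    // Example processing on the middle part only.
    $parts[1] = str_replace('some', 'processed', $parts[1]);
    // Re-join with the marker so both comments end up back in place.
    $content = implode($marker, $parts);
}

echo $content;
```

If fewer than two markers are present, the document is left unchanged.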


Check out the sample code here, which shows how to load an HTML document into SimpleXML: http://blog.charlvn.com/2009/03/html-in-php-simplexml.html

After that, you can treat it as a regular SimpleXML object.

EDIT: This only works if you want the content of an element (e.g. between <div> and </div>).
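One common way to get HTML into SimpleXML is to parse it with DOMDocument first and then convert it with simplexml_import_dom. This is a sketch of that general technique and may differ from what the linked post does; the inline sample is an assumption:

```php
<?php
// Inline sample standing in for the real HTML file.
$html = '<html><body><div id="content"><p>some content</p></div></body></html>';

// DOMDocument tolerates HTML that is not well-formed XML.
$doc = new DOMDocument();
$doc->loadHTML($html);

// Convert the parsed document into a SimpleXMLElement (the <html> root element).
$xml = simplexml_import_dom($doc);

// Look up the div by id; xpath() returns an array of matching elements.
$divs = $xml->xpath('//div[@id="content"]');
$text = isset($divs[0]) ? (string) $divs[0]->p : null;

echo $text;
```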


The simple regex approach has a problem with nested divs. I found this solution:

    <?php // File: MatchAllDivMain.php
    // Read html file to be processed into $data variable
    $data = file_get_contents('test.html');

    // Commented regex to extract contents from <div class="main">contents</div>
    // where "contents" may contain nested <div>s.
    // Regex uses PCRE recursive (?1) sub expression syntax to recurse group 1
    $pattern_long = '{           # recursive regex to capture contents of "main" DIV
    <div\s+class="main"\s*>      # match the "main" class DIV opening tag
      (                          # capture "main" DIV contents into $1
        (?:                      # non-cap group for nesting * quantifier
          (?: (?!<div[^>]*>|</div>). )++  # possessively match all non-DIV tag chars
        |                        # or
          <div[^>]*>(?1)</div>   # recursively match nested <div>xyz</div>
        )*                       # loop however deep as necessary
      )                          # end group 1 capture
    </div>                       # match the "main" class DIV closing tag
    }six';  // single-line (dot matches all), ignore case and free spacing modes ON

    // short version of same regex
    $pattern_short = '{<div\s+class="main"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';

    $matchcount = preg_match_all($pattern_long, $data, $matches);
    // $matchcount = preg_match_all($pattern_short, $data, $matches);
    echo("<pre>\n");
    if ($matchcount > 0) {
        echo("$matchcount matches found.\n");
        // print_r($matches);
        for ($i = 0; $i < $matchcount; $i++) {
            echo("\nMatch #" . ($i + 1) . ":\n");
            echo($matches[1][$i]); // print 1st capture group for match number i
        }
    } else {
        echo('No matches');
    }
    echo("\n</pre>");
    ?>