<tbody> fails in PHP Simple HTML DOM Parser

I'm using PHP Simple HTML DOM Parser to scrape some data from an online store (running XAMPP 1.7.2 with PHP 5.3.0), and I'm having trouble with <tbody>. The table structure is basically this (the details don't matter much):

 <table>
   <thead> <!--text here--> </thead>
   <tbody> <!--text here--> </tbody>
 </table>

I'm trying to get to the <tbody> section with this code:

 $element = $html->find('tbody',0)->innertext; 

It doesn't throw any errors; it just prints nothing when I try to echo it. I've tested the same code on other elements (<thead>, <table>, even something like <span class="price">) and they all work fine (of course, removing the ", 0" breaks the code). They all return their correct sections, and outertext works as well. But all of it fails on <tbody>.

I've looked through the parser source, but I can't figure it out. I noticed that <thead> isn't even mentioned there, yet it works fine. *shrug*

I figured I could try navigating with children instead, but that seems to fail as well. I just tried running:

 $el = $html->find('table',0);
 $el2 = $el->children(2);
 echo $el2->outertext;

and no dice. I tried replacing children with first_child, and 2 with 1, and still no dice. Funnily enough, if I use ->find instead of children, it works fine.

I'm pretty sure I can find a workaround, but this behavior seems strange enough to post about here. My curious mind would appreciate any help it can get.

php simple-html-dom
4 answers

In simple_html_dom.php, comment out or delete line 396:

 // if ($m[1]==='tbody') continue; 

There is a bug report for this here: http://sourceforge.net/p/simplehtmldom/bugs/79/

It is still open at the time of writing. If you'd rather not modify the library source, there is a workaround. For example, when looping to find <tr> elements, this does not work:

 <?php
 // The *BROKEN* way to find the <tr>
 // below the <tbody> below the <table id="foo">
 foreach ($dom->find('table#foo tbody tr') as $tr) {
     /* you will get nothing */
 }

Instead, you can iterate over all <tr> elements and check the tag name of each one's parent, like this:

 <?php
 // A workaround to find the <tr>
 // below the <tbody> below the <table id="foo">
 foreach ($dom->find('table#foo tr') as $tr) { // note the lack of a tbody selector
     /* you will get all trs, but only work with the ones whose parent is a tbody! */
     if ($tr->parent->tag == 'tbody') { // our workaround
         /* this part works the way you would expect the broken code above to work */
     }
 }
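If switching parsers is an option, a different approach (not a fix for Simple HTML DOM itself) is PHP's built-in DOM extension, where the tbody path can be queried directly with XPath. A minimal sketch, assuming PHP 7+ with the dom extension enabled; the function name tbodyRowTexts is hypothetical, and the id "foo" is the example id from above:

```php
<?php
// Alternative approach using PHP's DOM extension instead of Simple HTML DOM.
// Returns the trimmed text of each <tr> that is a direct child of a <tbody>
// inside the table with the given id. Assumes $tableId contains no double
// quotes, since it is interpolated into the XPath expression.
function tbodyRowTexts(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Direct-child version of the "table#foo tbody tr" selector.
    $query = sprintf('//table[@id="%s"]/tbody/tr', $tableId);

    $rows = [];
    foreach ($xpath->query($query) as $tr) {
        $rows[] = trim($tr->textContent);
    }
    return $rows;
}

// Usage (hypothetical file name):
// $rows = tbodyRowTexts(file_get_contents('page.html'), 'foo');
```

Rows inside <thead> are skipped because their parent is not a <tbody>, which is the same filtering the parent-tag check above performs by hand.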

Also note a slightly unrelated issue I came across: the Chrome and Firefox inspectors will "fix" your tag soup regarding <tbody> and <thead>. Be careful: look only at the actual page source, and stay away from DOM inspectors if you run into unexplained problems.


Check whether your tbody is generated by JavaScript. I ran into the same problem with a span tag, and later discovered that if any HTML is added to the page through jQuery or any other JavaScript, simple_html_dom simply can't find it, since it only parses the HTML the server sends and does not run scripts.


Make sure the tbody is really there. Many browsers add tbody to tables in their inspector panel, even when it isn't present in the server's response.
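One way to check this without trusting a browser inspector is to search the raw HTML your script actually receives. A minimal sketch; the function name rawHtmlHasTbody and the URL in the usage comment are hypothetical, and a plain regex check like this only tells you whether the tag appears in the source text, not where:

```php
<?php
// Returns true if the raw HTML string contains an opening <tbody> tag
// (case-insensitive, with or without attributes).
function rawHtmlHasTbody(string $html): bool
{
    // "<tbody" followed by whitespace or ">" avoids matching longer tag names.
    return preg_match('/<tbody[\s>]/i', $html) === 1;
}

// Usage (hypothetical URL):
// $raw = file_get_contents('http://example.com/product-page');
// var_dump(rawHtmlHasTbody($raw));
```

If this returns false for the page you are scraping, the <tbody> you saw in the inspector was added by the browser (or by JavaScript), and Simple HTML DOM will not find it.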

