How to navigate up a node in an HTML tree and retrieve a link?

I know that the title of the question is not so descriptive, but let me explain here.

I am trying to parse a given HTML document using HTML::TreeBuilder. In this document, the values 5, 1, ABC and DEF have to be checked against the values provided by the user, and if the check succeeds, I need to extract the href link.

So my code is:

    my @tag = $tree->look_down( _tag => 'tr', class => qr{\bepeven\scompleted\b} );
    for (@tag) {
        query_element($_);
    }

    sub query_element {
        my @td_tag = $_[0]->look_down( _tag => 'td' );
        my $num1 = shift @td_tag;    # Get the first td tag
        my $num2 = shift @td_tag;    # Get the second td tag

        # Make sure the first and second td tags hold a numeric value
        $num1 = $1 if $num1->as_text =~ m!(\d+)! or die "no match found";
        $num2 = $1 if $num2->as_text =~ m!(\d+)! or die "no match found";

        # Validate that the values above match the user-provided values 5 and 1
        if ( $num1 eq '5' && $num2 eq '1' ) {
            say "hurray..!!";

            # Iterate over the rest of the td tags to make sure we get the right link
            for (@td_tag) {

                # Check whether it contains ABC, then proceed to fetch the download href link
                if ( $_->look_down( _tag => 'td', class => qr{[c]}, sub { $_[0]->as_text eq 'ABC'; } ) ) {
                    my $text = $_->as_text;
                    say "Current node text is: ", $text;    # outputs ABC

                    # Now from here, how do I get the link I want to extract?
                }
            }
        }
    }

My approach first extracts the values from the td tags and compares them with the values specified by the user. If that succeeds, it searches for another user-specified value, ABC or DEF (ABC in my case). Only if that also matches should the link be extracted.

The tag containing ABC or DEF has no fixed position, but it always comes after the tags containing the values 5 and 1. So I used $_[0]->as_text eq 'ABC'; to verify that the tag contains ABC. At this point I am on the node whose text is ABC. From here, how can I navigate through the tree of objects and extract the href, i.e. the link?

PS: I would have tried XPath here, but the positions of the HTML elements are not clearly defined or structured.

EDIT:

So, I tried $_->tag() and it returned td. But if I am on a td tag, then why does the following code not work?

    my $link_obj = $_->look_down( _tag => 'a' );    # It should look for the `a` tag.
    say $link_obj->as_text;

But this leads to the following error:

 Can't call method "as_text" on an undefined value. 
3 answers

I hope the following (using my own Marpa::R2::HTML) is helpful. Note that the HTML::TreeBuilder answer finds only one link. The code below finds two, which I think was the intent.

    #!perl
    use Marpa::R2::HTML qw(html);
    use 5.010;
    use strict;
    use warnings;

    my $answer = html(
        ( \join q{}, <DATA> ),
        {
            td => sub { return Marpa::R2::HTML::contents() },
            a  => sub {
                my $href = Marpa::R2::HTML::attributes()->{href};
                return undef if not defined $href;
                return [ link => $href ];
            },
            'td.c' => sub {
                my @values = @{ Marpa::R2::HTML::values() };
                if ( ref $values[0] eq 'ARRAY' ) { return $values[0] }
                return [ test => 'OK' ] if Marpa::R2::HTML::contents eq 'ABC';
                return [ test => 'OK' ] if Marpa::R2::HTML::contents eq 'DEF';
                return [ test => '' ];
            },
            tr => sub {
                my @cells = @{ Marpa::R2::HTML::values() };
                return undef if shift @cells != 5;
                return undef if shift @cells != 1;
                my $ok = 0;
                my $link;
                for my $cell (@cells) {
                    my ( $type, $value ) = @{$cell};
                    $ok   = 1      if $type eq 'test' and $value eq 'OK';
                    $link = $value if $type eq 'link';
                }
                return $link if $ok;
                return undef;
            },
            ':TOP' => sub { return Marpa::R2::HTML::values(); }
        }
    );
    die "No parse" if not defined $answer;
    say join "\n", @{$answer};

    __DATA__
    <table>
    <tbody>
    <tr class="epeven completed">
    <td>5</td>
    <td>1</td>
    <td class="c">ABC</td>
    <td class="c">satus</td>
    <td class="c"><a href="/path/link">Download</a></td>
    </tr>
    <tr class="epeven completed">
    <td>5</td>
    <td>1</td>
    <td class="c">status</td>
    <td class="c">DEF</td>
    <td class="c"><a href="/path2/link">Download</a></td>
    </tr>
    </table>

I'm not sure I understand what you want to do, but is it something like this? Use look_down to describe what you want; there is no need to navigate the tree by hand, which would be fragile.

    use strict;
    use warnings;
    use HTML::TreeBuilder 5 -weak;
    use 5.014;

    my $tree = HTML::TreeBuilder->new_from_content(<DATA>);

    for my $e (
        $tree->look_down(
            _tag => 'a',
            sub {
                my $e  = $_[0];
                my $tr = $e->parent->parent;

                ### Could also use ->lineage to search up through the
                ### containing elements
                return
                    unless $tr->attr('_tag') eq 'tr'
                    and $tr->attr('class') eq 'epeven completed';
                return (
                    $tr->look_down( _tag => 'td', sub { $_[0]->as_text eq '1'; } )
                        and $tr->look_down( _tag => 'td', sub { $_[0]->as_text eq '5'; } )
                        and $tr->look_down( _tag => 'td', class => 'c', sub { $_[0]->as_text eq 'ABC'; } )
                );
            }
        )
        )
    {
        say $e->attr('href');
    }

    __DATA__
    <table>
    <tbody>
    <tr class="epeven completed">
    <td>5</td>
    <td>1</td>
    <td class="c">ABC</td>
    <td class="c">satus</td>
    <td class="c"><a href="/path/link">Download</a></td>
    </tr>
    <tr class="epeven completed">
    <td>5</td>
    <td>1</td>
    <td class="c">status</td>
    <td class="c">DEF</td>
    <td class="c"><a href="/path2/link">Download</a></td>
    </tr>
    </table>

Output:

 /path/link 
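
For completeness, here is a minimal sketch of the "search up" route, starting from the ABC cell as in the question. Keep in mind that look_down only searches an element and its descendants, so calling it on the ABC cell cannot find a link that sits in a sibling td; that is why it returned undef there. The sketch climbs to the enclosing tr with look_up and searches down from the row instead. The helper name row_link and the one-row sample markup are assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;

# Sample markup based on the data used above (one row only).
my $html = <<'HTML';
<table>
<tr class="epeven completed">
<td>5</td>
<td>1</td>
<td class="c">ABC</td>
<td class="c">status</td>
<td class="c"><a href="/path/link">Download</a></td>
</tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Hypothetical helper: return the href of the first <a> in the row that
# contains $cell, or nothing if there is no enclosing <tr> or no link.
sub row_link {
    my ($cell) = @_;
    my $tr     = $cell->look_up( _tag => 'tr' )  or return;   # climb to the row
    my $anchor = $tr->look_down( _tag => 'a' )   or return;   # search the whole row
    return $anchor->attr('href');
}

# Locate the cell whose text is ABC, as the question's code does.
my $cell = $tree->look_down(
    _tag  => 'td',
    class => 'c',
    sub { $_[0]->as_text eq 'ABC' },
);

my $href = defined $cell ? row_link($cell) : undef;
say $href // 'no link found';

$tree->delete;
```

Checking each lookup before calling a method on its result avoids the "Can't call method ... on an undefined value" error when a row has no link.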

If you can do without HTML::TreeBuilder, you can parse the content with regular expressions, as follows:

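    # $content holds the raw HTML; each $r is the body of one matching row
    for my $r ($content =~ m{<tr class="epeven completed">(.*?)</tr>}gs) {
        # The inner matches need no /g; in scalar context /g would leave pos($r)
        # set and make the later link match start after the ABC cell
        my ($n1, $n2) = $r =~ m{<td>(\d+)</td>\s*<td>(\d+)</td>};
        next if $n1 != 5 || $n2 != 1;
        next if $r !~ m{<td class="c">ABC</td>};
        my ($link) = $r =~ m{<a href="(.*?)">Download</a>};
        say $link;
    }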