Is there a way to find all links in a specific div using Mechanize?
I tried find_all_links but could not find a way to restrict it to a single div. For example:

    <div class="sometag">
      <ul class="tags">
        <li><a href="/a.html">A</a></li>
        <li><a href="/b.html">B</a></li>
      </ul>
    </div>
A handy tool for extracting information from HTML is HTML::Grabber. It uses jQuery-style selectors to reference elements in the HTML, so you can do something like this:
    use HTML::Grabber;

    # Your mechanize stuff here ...

    my $dom = HTML::Grabber->new( html => $mech->content );

    # Collect the href of every link inside div.sometag.
    my @links;
    $dom->find('div.sometag a')->each(sub {
        push @links, $_->attr('href');
    });
Web::Scraper is also useful for scraping.
    use strict;
    use warnings;
    use URI;
    use WWW::Mechanize;
    use Web::Scraper;

    my $mech = WWW::Mechanize->new;
    $mech->env_proxy;
    # If you want to login, do it with mechanize.

    # In the example markup the "tags" class is on the <ul>, not the <li>.
    my $scraper = scrape {
        process 'div.sometag ul.tags a', 'links[]' => '@href';
    };

    # Pass mechanize to the scraper as its user agent.
    $scraper->user_agent($mech);

    my $res = $scraper->scrape( URI->new("http://example.com/") );
    for my $link (@{ $res->{links} }) {
        warn $link;
    }
Sorry, I have not tested this code.
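If you would rather not add a selector module, a minimal sketch with HTML::TreeBuilder (from the HTML-Tree distribution) might look like the following. It runs against the snippet from the question instead of a live page, so it can be checked offline. The helper name `links_in_div` and the extra `div.other` block are made up for illustration, and the class match is an exact string comparison, so a div with several classes would need a regex instead.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Markup from the question, plus a hypothetical second div that must be ignored.
my $html = <<'HTML';
<div class="sometag">
  <ul class="tags">
    <li><a href="/a.html">A</a></li>
    <li><a href="/b.html">B</a></li>
  </ul>
</div>
<div class="other"><a href="/c.html">C</a></div>
HTML

# Return the href of every <a> nested inside a <div> whose class attribute
# is exactly $class. (Hypothetical helper, not part of any module.)
sub links_in_div {
    my ($content, $class) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my @hrefs;
    for my $div ($tree->look_down(_tag => 'div', class => $class)) {
        # Only anchors that actually carry a non-empty href attribute.
        push @hrefs, map { $_->attr('href') }
            $div->look_down(_tag => 'a', href => qr/./);
    }
    $tree->delete;    # free the parse tree
    return @hrefs;
}

my @links = links_in_div($html, 'sometag');
print "$_\n" for @links;
```

With Mechanize you would pass `$mech->content` instead of `$html`. The hrefs come back exactly as written in the page, so relative ones still need resolving, for example with `URI->new_abs($href, $mech->base)`.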