Why can Perl access http sites through Tor but not https sites?

I can use Perl to visit a website through Tor when it is an http site, but not when it is an https site.

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;
    use LWP::Protocol::socks;
    use LWP::Protocol::https;
    use utf8;

    my $mech = WWW::Mechanize->new(timeout => 60*5);
    $mech->proxy(['http', 'https'], 'socks://localhost:9150');
    $mech->get("https://www.google.com");

I get the error message "Error GETing https://www.google.com: Status read failed: Bad file descriptor at line 10", where line 10 is the last line of the program.

In the Tor Browser, I can successfully view "https://www.google.com" through port 9150. I am using ActivePerl 5.16.2, Vidalia 0.2.21 and Tor 0.2.3.25. I am on a Windows machine and my main internet browser is Mozilla.

I have tried installing the relevant packages with the following commands:

    cpan LWP::UserAgent
    ppm install LWP::Protocol::https
    cpan LWP::Protocol::https
    ppm install LWP::Protocol::socks
    cpan LWP::Protocol::socks
    ppm install Mozilla::CA
    ppm install IO::Socket::SSL
    ppm install Crypt::SSLeay
    cpan Crypt::SSLeay
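Because the commands above mix `cpan` and `ppm`, it can be unclear which modules actually ended up loadable. The following is a minimal sketch for checking that; the helper name `module_version` is made up for illustration, and `Net::SSL` is listed because it is the module that Crypt::SSLeay provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return defined $version ? $version : 'unknown';
}

for my $module (qw(
    LWP::UserAgent LWP::Protocol::https LWP::Protocol::socks
    IO::Socket::SSL Net::SSL Mozilla::CA
)) {
    my $version = module_version($module);
    printf "%-22s %s\n", $module,
        defined $version ? $version : 'NOT INSTALLED';
}
```

A module reported as NOT INSTALLED either is missing or failed to load, for example because one of its own dependencies is absent.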

Thanks for any help! Please let me know if there is any additional information that I can provide.

+6
3 answers

A long time ago I found a way to reach https sites through Tor by fetching them with WWW::Curl::Easy, because I ran into the same problems with LWP. I then save the HTML to files and parse it with WWW::Mechanize or HTML::TreeBuilder.

If you need more interaction with a site, such as submitting forms, this solution can be more tedious because you have to drive curl yourself.

    package Curl;
    use strict;
    use warnings;
    use WWW::Curl::Easy;
    use WWW::UserAgent::Random;

    my $curl           = WWW::Curl::Easy->new;
    my $useragent      = rand_ua("browsers");
    my $host           = 'localhost';
    my $port           = '9070';
    my $timeout        = '20';
    my $connectTimeOut = '20';

    &init;

    # Fetch a URL; returns a reference to the response body, or 0 on failure.
    sub get {
        my $url = shift;

        $curl->setopt(CURLOPT_URL, $url);
        my $response_body;
        $curl->setopt(CURLOPT_WRITEDATA, \$response_body);

        my $retcode = $curl->perform;
        if ($retcode == 0) {
            my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
            print("Transfer went ok, HTTP code = $response_code\n");
            # judge result and next action based on $response_code
            return \$response_body;
        } else {
            # Error code, type of error, error message
            print("An error happened: $retcode ".$curl->strerror($retcode)." ".$curl->errbuf."\n");
            return 0;
        }
    }

    sub init {
        # set the proxy
        $curl->setopt(CURLOPT_PROXY, "$host:".$port);
        $curl->setopt(CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);
        # set the remaining options
        $curl->setopt(CURLOPT_USERAGENT, $useragent);
        $curl->setopt(CURLOPT_CONNECTTIMEOUT, $connectTimeOut);
        $curl->setopt(CURLOPT_TIMEOUT, $timeout);
        $curl->setopt(CURLOPT_SSL_VERIFYPEER, 0);
        $curl->setopt(CURLOPT_HEADER, 0);
    }

    1;    # a module file must end with a true value
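The parsing step mentioned above could look roughly like this. It is only a sketch: the helper `extract_links` and the sample HTML string are invented for illustration, and the string stands in for the body that `get` returns by reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the page title followed by all link targets.
sub extract_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down returns the first matching element.
    my $title_el = $tree->look_down(_tag => 'title');
    my $title    = $title_el ? $title_el->as_text : '';

    # In list context look_down returns every matching element.
    my @links = map { $_->attr('href') }
        $tree->look_down(_tag => 'a', sub { defined $_[0]->attr('href') });

    $tree->delete;    # release the parse tree explicitly
    return ($title, @links);
}

# Stand-in for HTML fetched through curl (an assumption for this sketch).
my $html = '<html><head><title>Example</title></head>'
         . '<body><a href="https://example.org/a">A</a> <a href="/b">B</a></body></html>';

my ($title, @links) = extract_links($html);
print "Title: $title\n";
print "Link:  $_\n" for @links;
```

With the package above, the input would come from something like `my $body = Curl::get($url);` followed by `extract_links($$body)` when `$body` is true.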

Hope this helps you!

+2

The proxy server you are using may already be an HTTPS proxy (i.e. a CONNECT proxy). In that case, this should work (untested):

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;
    use LWP::Protocol::socks;
    use LWP::Protocol::https;
    use utf8;

    my $mech = WWW::Mechanize->new(timeout => 60*5);
    $mech->proxy(['http'], 'socks://localhost:9150');
    $mech->proxy(['https'], 'https://localhost:9150'); ### <-- make https go over https-connect proxy
    $mech->get("https://www.google.com");

+1

I can't find the source any more, but I fought with this a long time ago. The problem mainly came down to which SSL implementation LWP::UserAgent uses for https requests.

Perhaps this question can help you: How to force LWP to use Crypt::SSLeay for HTTPS requests?
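As a rough sketch of what forcing the backend can look like: Net::HTTPS, which LWP uses for https, reads the `PERL_NET_HTTPS_SSL_SOCKET_CLASS` environment variable when it is loaded, and `Net::SSL` is the class provided by Crypt::SSLeay. Whether switching the backend cures the Tor problem in the question is not confirmed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Must be set before Net::HTTPS is loaded for the first time.
$ENV{PERL_NET_HTTPS_SSL_SOCKET_CLASS} = 'Net::SSL';

# Load at run time so a missing Crypt::SSLeay is reported instead of fatal.
if (eval { require Net::HTTPS; 1 }) {
    no warnings 'once';    # the variable is declared inside Net::HTTPS
    print "HTTPS socket class: $Net::HTTPS::SSL_SOCKET_CLASS\n";
}
else {
    warn "Could not load Net::HTTPS with Net::SSL: $@";
}
```

In a real script that loads LWP with `use`, the assignment would need to sit in a `BEGIN` block above `use LWP::UserAgent;` so that it runs before the modules are compiled.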

+1
