PHP cURL: get the redirect target without following it

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason, it does not include the bit of information that I want at the moment, which is the destination URL if the request returns an HTTP redirect code.

I do not use CURLOPT_FOLLOWLOCATION because I want to treat specific redirect codes as special cases.

If cURL can follow redirects, why can't it tell me where a request would be redirected to when it does not follow them?

Of course, I can set the CURLOPT_HEADER flag and select the Location header. But is there a more efficient way?

php curl
5 answers

This can be done in 4 easy steps:

Step 1. Initialise cURL

    $ch = curl_init(); // initialise the cURL handle
    // CURLOPT_COOKIESESSION is optional: it marks this as a new cookie session,
    // so session cookies loaded from a previous session are ignored
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);

Step 2. Get the headers for $url

    curl_setopt($ch, CURLOPT_URL, $url);             // specify your URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the returned data
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // don't follow redirects
    $http_data = curl_exec($ch);                     // hit the $url
    $curl_info = curl_getinfo($ch);
    $headers = substr($http_data, 0, $curl_info['header_size']); // split out the headers

Step 3. Check if you have the correct response code

    if (!($curl_info['http_code'] > 299 && $curl_info['http_code'] < 309)) {
        // return, echo, die, whatever you like
        return 'Error - http code ' . $curl_info['http_code'] . ' received.';
    }

Step 4. Parse the headers to get the new URL

    // header names are case-insensitive, hence the "i" modifier
    if (preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!i", $headers, $matches)) {
        $url = $matches[1];
    }

Once you have a new URL, you can repeat steps 2-4 as often as you like.
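
Step 4 can also be wrapped in a small helper so it is easy to reuse when repeating steps 2-4. The following is only a sketch, assuming the raw header block from step 2 is available as a string; the function name extract_location is hypothetical, and it only looks at the Location header.

```php
<?php
/**
 * Return the value of the first Location header found in a raw header
 * block, or null if there is none. Hypothetical helper, not part of cURL.
 */
function extract_location(string $headers): ?string
{
    // "i": header names are case-insensitive; "m": ^ and $ match per line.
    if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $headers, $matches)) {
        return $matches[1];
    }
    return null;
}

// Sample header block, as it might be cut out of the response in step 2.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "location: http://www.example.com/new\r\n"
        . "Content-Length: 0\r\n\r\n";

echo extract_location($sample), PHP_EOL;
```

Returning null instead of reading $matches[1] blindly makes it easy to stop the loop once a response no longer carries a Location header.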


curl doesn't seem to have a function or option to get the redirect target; it can be extracted using various methods:

From the response body:

Apache can respond with an HTML page in the case of a 301 redirect (this does not seem to be the case for a 302).

If the response has a format similar to:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <h1>Moved Permanently</h1>
    <p>The document has moved <a href="http://www.xxx.yyy/zzz">here</a>.</p>
    <hr>
    <address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
    </body></html>

You can extract the redirect URL using DOMXPath:

    $i = 0;
    foreach ($urls as $url) {
        if (substr($url, 0, 4) == "http") {
            $c = curl_init($url);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            $result = @curl_exec($c);
            $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
            curl_close($c);
            $results[$i]['code'] = $status;
            $results[$i]['url'] = $url;
            if ($status === 301) {
                $xml = new DOMDocument();
                $xml->loadHTML($result);
                $xpath = new DOMXPath($xml);
                $href = $xpath->query("//*[@href]")->item(0);
                $results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
            }
            $i++;
        }
    }

Using CURLOPT_NOBODY

However, there is a faster way, as @gAMBOOKa points out: using CURLOPT_NOBODY. This approach sends a HEAD request instead of a GET, so the actual content is never downloaded, which should be faster and more efficient. Combined with CURLOPT_HEADER, the response headers are returned as the result.

Using a regular expression, the destination URL can be extracted from the header:

    $i = 0;
    foreach ($urls as $url) {
        if (substr($url, 0, 4) == "http") {
            $c = curl_init($url);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($c, CURLOPT_NOBODY, true);  // HEAD request, no body
            curl_setopt($c, CURLOPT_HEADER, true);  // return the headers
            $result = @curl_exec($c);
            $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
            curl_close($c);
            $results[$i]['code'] = $status;
            $results[$i]['url'] = $url;
            if ($status === 301 || $status === 302) {
                // takes the first URL that appears anywhere in the headers
                if (preg_match("@https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?@", $result, $m)) {
                    $results[$i]['target'] = $m[0];
                }
            }
            $i++;
        }
    }

You can simply use CURLINFO_REDIRECT_URL:

    $info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    echo $info; // the redirect URL, without following it

As you mentioned, disable CURLOPT_FOLLOWLOCATION before executing the request, and place this code after the curl_exec call.

CURLINFO_REDIRECT_URL - with the option CURLOPT_FOLLOWLOCATION disabled: the redirect URL found in the last transaction, which must then be manually requested. With the option CURLOPT_FOLLOWLOCATION enabled: this is empty. The redirect URL in this case is available at CURLINFO_EFFECTIVE_URL
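
Put together, the whole request might look like the sketch below. It is a minimal example, not a definitive implementation: the function name get_redirect_target is hypothetical, it needs the curl extension and network access, and sending a HEAD request via CURLOPT_NOBODY is an assumption that the server answers HEAD and GET with the same redirect.

```php
<?php
/**
 * Request $url without following redirects and return the redirect
 * target reported by cURL, or null when none is reported.
 * Hypothetical helper; needs the curl extension and network access.
 */
function get_redirect_target(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // leave the redirect unfollowed
    curl_setopt($ch, CURLOPT_NOBODY, true);          // assumption: a HEAD request is enough
    curl_exec($ch);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);

    // An empty string or false means no redirect was reported.
    return $target ? $target : null;
}

// Usage (performs a real request, so it is left commented out here):
//   $target = get_redirect_target('http://www.example.com/old');
//   if ($target !== null) { echo "Redirects to: $target\n"; }
```

Because the redirect is not followed, the status code from curl_getinfo($ch, CURLINFO_HTTP_CODE) is still available for treating specific redirect codes as special cases.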



There is no more efficient way.
You can use CURLOPT_WRITEHEADER together with a variable stream (VariableStream),
so that the headers are written to a variable, which you can then parse.
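
As an illustration of capturing headers in a variable, the sketch below swaps in a related but different technique: CURLOPT_HEADERFUNCTION, which hands each header line to a callback instead of writing to a stream. The names $collected and $collect_header are hypothetical, and the callback is exercised by hand here so the example runs without network access.

```php
<?php
// Parsed headers end up here, keyed by lower-cased name (hypothetical name).
$collected = [];

/**
 * Callback for CURLOPT_HEADERFUNCTION. cURL calls it once per header
 * line and expects the number of bytes handled as the return value.
 */
$collect_header = function ($ch, string $line) use (&$collected): int {
    $parts = explode(':', $line, 2);
    if (count($parts) === 2) {
        // Lower-case the name so lookups are case-insensitive.
        $collected[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    return strlen($line);
};

// Wiring it up (needs the curl extension and network access):
//   $ch = curl_init($url);
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   curl_setopt($ch, CURLOPT_HEADERFUNCTION, $collect_header);
//   curl_exec($ch);
//   $target = $collected['location'] ?? null;

// Exercising the callback by hand with sample header lines:
$collect_header(null, "HTTP/1.1 302 Found\r\n");
$collect_header(null, "Location: http://www.example.com/next\r\n");
echo $collected['location'], PHP_EOL;
```

With this approach the body and the headers never share one string, so no header_size arithmetic is needed to split them.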


I had the same problem, and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of no help.

So I decided not to use cURL, but file_get_contents instead:

    $data = file_get_contents($url);
    $data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;", "<meta", $data);

The last line let me block the redirect, although the result is no longer clean HTML.

I then parsed the data and extracted the redirect URL I wanted.
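
The parsing step is not shown in this answer. One possible way to do it is sketched below, working on the HTML as fetched, before the str_replace call. The function name extract_meta_refresh is hypothetical, and the pattern assumes that http-equiv comes directly before content, as in the tag targeted by the replacement.

```php
<?php
/**
 * Return the URL of the first <meta http-equiv="refresh"> tag in $html,
 * or null if none is found. Hypothetical helper; assumes the http-equiv
 * attribute directly precedes the content attribute.
 */
function extract_meta_refresh(string $html): ?string
{
    $pattern = '/<meta\s+http-equiv=["\']?refresh["\']?\s+'
             . 'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)/i';
    if (preg_match($pattern, $html, $matches)) {
        // Attribute values may contain entities such as &amp;
        return html_entity_decode($matches[1]);
    }
    return null;
}

$html = '<html><head>'
      . '<meta http-equiv="Refresh" content="0; URL=http://www.example.com/next">'
      . '</head><body></body></html>';

echo extract_meta_refresh($html), PHP_EOL;
```

A regular expression keeps the sketch free of extensions; for markup that varies more, a DOM parser would be the more robust choice.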
