cURL - How to fetch a page only if it has changed since the last fetch?

I have a script that fetches pages every day, and I want to download a page only if its content has changed, so that the script runs faster and uses less traffic.

My idea is to request the headers first and compare the content length, retrieving the entire document only if the length differs. This is not very accurate, though, because the website may have dynamic parts that make the content length different on every request.

Is there any other way, for example, to use some kind of DNS or something else?

+7
php caching curl web-scraping
3 answers

I searched for an answer for more than two days, and no one could give me a universal answer.

So I implemented the ETag and If-Modified-Since headers (as Matt Raines and sowa suggest in their posts here), and to reduce traffic further I enabled compression such as gzip.
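A minimal sketch of that combination, assuming the server actually sends `ETag` and/or `Last-Modified` and that the script stores those values between runs. The function names `buildConditionalHeaders` and `fetchIfChanged` are made up for illustration and are not part of any library:

```php
<?php
// Build the request headers for a conditional GET from validators
// saved on a previous run (either one may be missing).
function buildConditionalHeaders(?string $etag, ?string $lastModified): array
{
    $headers = [];
    if ($etag !== null) {
        $headers[] = 'If-None-Match: ' . $etag;
    }
    if ($lastModified !== null) {
        $headers[] = 'If-Modified-Since: ' . $lastModified;
    }
    return $headers;
}

// Fetch $url only if it changed. Returns the status code, the body
// (empty on 304) and the validators to store for the next run.
function fetchIfChanged(string $url, ?string $etag = null, ?string $lastModified = null): array
{
    $responseHeaders = [];
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        // Empty string: advertise every encoding this cURL build supports
        // (gzip, deflate, ...) and decode the response automatically.
        CURLOPT_ENCODING       => '',
        CURLOPT_HTTPHEADER     => buildConditionalHeaders($etag, $lastModified),
        // Collect response headers, keyed by lower-cased name.
        CURLOPT_HEADERFUNCTION => function ($ch, string $line) use (&$responseHeaders): int {
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $responseHeaders[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
            return strlen($line);
        },
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'changed'      => $status === 200,   // 304 means "not modified"
        'status'       => $status,
        'body'         => $body,
        'etag'         => $responseHeaders['etag'] ?? $etag,
        'lastModified' => $responseHeaders['last-modified'] ?? $lastModified,
    ];
}
```

On the first run you would call `fetchIfChanged($url)` with no validators, save the returned `etag` and `lastModified`, and pass them back in on the next run. A server that supports neither header will simply answer 200 every time.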

There is also the Range request header, which would let me fetch only part of the page, as someone told me, but I think it is used only for files, not for web pages.

Thank you all for your time.

+2

Does curl_setopt($curl, CURLOPT_HTTPHEADER, ["If-Modified-Since: 2016-04-30 21:00:00"]); work? I get a 304 Not Modified response on a resource that was last modified earlier this month.
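One caveat about the date value: the HTTP specification defines a fixed date format for this header (for example `Sat, 30 Apr 2016 21:00:00 GMT`), and a server is allowed to ignore a value it cannot parse, so a looser format may work on some servers and not on others. A small sketch of producing the standard format from a Unix timestamp; `httpDate` is a hypothetical helper name:

```php
<?php
// Format a Unix timestamp as an HTTP date (always expressed in GMT).
function httpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Build the header line to pass to CURLOPT_HTTPHEADER.
function ifModifiedSinceHeader(int $timestamp): string
{
    return 'If-Modified-Since: ' . httpDate($timestamp);
}
```

It could then be used as `curl_setopt($curl, CURLOPT_HTTPHEADER, [ifModifiedSinceHeader($lastFetchTime)]);`, where `$lastFetchTime` is whatever timestamp your script recorded on its previous run.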

0

Update a local file from a remote one, if and only if the remote is newer

A copy-and-paste answer for those who want to check whether the remote file is newer than the local one, and update the local file if it is:

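    // $remotePath = 'http://blahblah.com/file.ext';
    // $localPath = '/usr/whatever/app/file.ext';

    // Read the remote response headers as an associative array
    $headers = get_headers( $remotePath , 1 );
    $last_modified = $headers['Last-Modified'] ?? null;
    if ( is_array( $last_modified ) ) {
        // After redirects there is one value per hop; use the final one
        $last_modified = end( $last_modified );
    }
    $remote_mod_date = $last_modified ? strtotime( $last_modified ) : false;
    $local_mod_date = file_exists( $localPath ) ? filemtime( $localPath ) : 0;

    if ( $remote_mod_date !== false && $local_mod_date >= $remote_mod_date ) {
        // Local version up to date
    } else {
        // Remote file is newer (or its date is unknown, so download to be safe)
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $remotePath);
        // other options here, eg:
        curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $result = curl_exec($ch);
        if (curl_errno($ch)) {
            // handle error : curl_error($ch)
        }
        curl_close($ch);
        if ( $result ) {
            // Update local file with remote file contents
            file_put_contents( $localPath, $result );
        }
    }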

Thanks to the OP's question here, as well as this answer.
Written to handle the automatic renewal of an OIDC CA certificate (this, and this).

0
