How to use cURL to extract specific data from a website and save it in my database using PHP

Can someone tell me how to use cURL or file_get_contents to download specific data from a website and then save that data in my MySQL database? I want to get the latest movie uploads from http://www.traileraddict.com/ and save them in my database daily; this text and HTML link will be shown on my site. I just need the text and the HTML link (highlighted in the figure).

[Figure: screenshot of traileraddict.com with the wanted trailer titles and links highlighted]

I searched everywhere, but I did not find a useful tutorial. I have two main questions:

1) How to get specific data using cURL or file_get_contents.

2) How to save that content in my MySQL database table (text in one column and the link in another).

2 answers

Using cURL:

```php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.something.com');
// return the page as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
```
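For a job that runs unattended every day, it can help to check that the request actually succeeded before parsing anything. Below is a minimal sketch, assuming the cURL extension is enabled; `fetch_page()` and the user-agent string are hypothetical names, and the timeout values are arbitrary examples. It returns `null` on a transport error or an HTTP status of 400 or above.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects (e.g. http -> https)
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds to wait for a connection (example value)
        CURLOPT_TIMEOUT        => 30,   // seconds for the whole transfer (example value)
        CURLOPT_USERAGENT      => 'MyTrailerFetcher/1.0', // hypothetical user agent
    ]);
    $body = curl_exec($ch);                               // string on success, false on error
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);  // 0 for non-HTTP transfers
    // The handle is released automatically when $ch goes out of scope.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

You would then parse only when you got something back, e.g. `$content = fetch_page('http://www.traileraddict.com/'); if ($content !== null) { /* parse */ }`.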

Then you can load the content into a DOMDocument object and search the DOM for the specific data. You could also try to extract the data with plain string searches, but using regular expressions on HTML is widely discouraged.

```php
$dom = new DOMDocument();
$dom->loadHTML($content);
// Parse the DOM for your desired content
```
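As one concrete way to "parse the DOM", here is a sketch using `DOMXPath`. It assumes the layout that the second answer below relies on (each trailer inside a `<div class="info">` with an `<h2>` title and an `<a href="/trailer/...">` link); the function name `extract_trailers()` is hypothetical, and the live site's markup may well differ.

```php
<?php
// Hypothetical helper: return a list of ['title' => ..., 'url' => ...] rows.
// ASSUMPTION: trailers are in <div class="info"> blocks with an <h2> and an <a href="/trailer/...">.
function extract_trailers(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Matches class="info" even when the div has additional classes.
    $divs = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' info ')]");
    $results = [];
    foreach ($divs as $div) {
        $h2 = $xpath->query('.//h2', $div)->item(0);
        $a  = $xpath->query(".//a[starts-with(@href, '/trailer/')]", $div)->item(0);
        if ($h2 === null || $a === null) {
            continue; // skip blocks that don't match the expected layout
        }
        $results[] = [
            'title' => trim($h2->textContent),
            'url'   => rtrim($baseUrl, '/') . $a->getAttribute('href'),
        ];
    }
    return $results;
}
```

Because the queries only look for the elements they need, this tends to tolerate small markup changes better than splitting the raw HTML on string fragments, though any scraper breaks once the structure changes enough.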

This should work, but it is messy, and it will probably break if the site you are scraping changes its layout, since the code depends on the exact HTML:

```php
<?php
$sites[0] = 'http://www.traileraddict.com/';
// use this if you want to retrieve more than one page:
// $sites[1] = 'http://www.traileraddict.com/trailers/2';

// Connect once, outside the loops; replace the placeholders with your own details.
// (The old mysql_* functions were removed in PHP 7, so this uses PDO.)
$db = new PDO('mysql:host=localhost;dbname=your_database;charset=utf8mb4', 'your_user', 'your_password');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $db->prepare('INSERT INTO trailers (url, title) VALUES (?, ?)');

foreach ($sites as $site) {
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    if ($html === false) {
        continue; // request failed, try the next page
    }
    // ok, you have the whole page in the $html variable
    // now you need to find the common div that contains all the review info
    // and that appears to be <div class="info"> (I think you could use abstract as well)
    $title_start = '<div class="info">';
    $parts = explode($title_start, $html);
    // now you have an array of the info divs on the page;
    // $parts[0] is everything before the first one, so skip it
    foreach (array_slice($parts, 1) as $part) {
        // so now you just need to get your title and link from each part
        $link = explode('<a href="/trailer/', $part);
        if (!isset($link[1])) {
            continue; // no trailer link in this block
        }
        // this means you now have part of the trailer url, you just need to cut off the end which you don't need:
        $link = explode('">', $link[1]);
        // this should give something of the form:
        // overnight-2012/trailer
        // so just make an absolute url out of it:
        $url = 'http://www.traileraddict.com/trailer/' . $link[0];
        // now for the title we need to follow a similar process:
        $title = explode('<h2>', $part);
        if (!isset($title[1])) {
            continue; // no title in this block
        }
        $title = explode('</h2>', $title[1]);
        $title = strip_tags($title[0]);
        // a prepared statement keeps quotes in titles from breaking (or injecting into) the SQL
        $stmt->execute([$url, $title]);
    }
}
```

That should do it: you now have variables for the link and the title, which you can insert into your database.
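For the second part of the question (title in one column, link in another), here is a sketch of the insert step on its own, using PDO prepared statements. The table layout in the comment, the `save_trailers()` helper, and the rule of skipping rows whose URL is already stored are assumptions for illustration; the duplicate check keeps a daily run from saving the same trailer twice.

```php
<?php
// ASSUMED table, e.g. in MySQL:
//   CREATE TABLE trailers (
//     id INT AUTO_INCREMENT PRIMARY KEY,
//     title VARCHAR(255) NOT NULL,
//     url VARCHAR(255) NOT NULL
//   );
// Hypothetical helper: insert rows of ['title' => ..., 'url' => ...], skipping known URLs.
// Returns the number of rows actually inserted.
function save_trailers(PDO $pdo, array $rows): int
{
    $exists = $pdo->prepare('SELECT COUNT(*) FROM trailers WHERE url = ?');
    $insert = $pdo->prepare('INSERT INTO trailers (title, url) VALUES (?, ?)');
    $inserted = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as $row) {
            $exists->execute([$row['url']]);
            $count = (int) $exists->fetchColumn();
            $exists->closeCursor(); // free the result before re-executing
            if ($count > 0) {
                continue; // already saved on an earlier run
            }
            // placeholders handle quoting, so titles like "Ender's Game" are safe
            $insert->execute([$row['title'], $row['url']]);
            $inserted++;
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // keep the table unchanged if any insert fails
        throw $e;
    }
    return $inserted;
}
```

With MySQL you would pass something like `new PDO('mysql:host=localhost;dbname=your_database;charset=utf8mb4', 'your_user', 'your_password')` (placeholder credentials). A `UNIQUE` index on `url` would let the database enforce the same rule.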

DISCLAIMER

I wrote this off the top of my head, so I apologize if it does not work right off the bat; let me know if it doesn't, and I will try to help further.

ALSO, I know this could be done more cleverly and in fewer steps, but that would take more thought on my part. The OP can do that once they understand the code I wrote; I would guess it is much more important for them to understand what I did and be able to edit it themselves.

In addition, I would advise scraping the site at night so as not to burden it with extra traffic, and I would suggest asking the site for permission, since if they catch you, they can put an end to your scraping :(

To answer your last question, about running this daily: schedule the script as a cron job.
