This should work, but it is messy, and it may break if the site you are scraping changes its layout:
```php
$sites[0] = 'http://www.traileraddict.com/';
// use this if you want to retrieve more than one page:
// $sites[1] = 'http://www.traileraddict.com/trailers/2';

foreach ($sites as $site) {
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    curl_close($ch);

    // OK, you have the whole page in the $html variable.
    // Now you need to find the common div that contains all the review info,
    // and that appears to be <div class="info"> (I think "abstract" would work as well).
    $parts = explode('<div class="info">', $html);

    // Now you have an array of the info divs on the page.
    foreach ($parts as $part) {
        // Skip chunks with no trailer link in them (e.g. everything before the first div).
        if (strpos($part, '<a href="/trailer/') === false) {
            continue;
        }

        // Now you just need to get your title and link from each part.
        $link = explode('<a href="/trailer/', $part);
        // This gives you part of the trailer URL; cut off the end you don't need:
        $link = explode('">', $link[1]);
        // That should leave something of the form:
        //   overnight-2012/trailer
        // so just make an absolute URL out of it:
        $url = 'http://www.traileraddict.com/trailer/' . $link[0];

        // For the title, follow a similar process:
        $title = explode('<h2>', $part);
        $title = explode('</h2>', $title[1]);
        $title = strip_tags($title[0]);

        // INSERT DB CODE HERE, e.g. (the old mysql_* functions;
        // use mysqli or PDO on modern PHP):
        $db_conn = mysql_connect($host, $user, $password) or die('error');
        mysql_select_db($database, $db_conn) or die(mysql_error());
        $sql = "INSERT INTO trailers (url, title) VALUES ('"
             . mysql_real_escape_string($url) . "', '"
             . mysql_real_escape_string($title) . "')";
        mysql_query($sql) or die(mysql_error());
    }
}
```
That should do it: you now have variables for the URL and the title that you can insert into your database.
DISCLAIMER
I wrote this off the top of my head, so I apologize if it does not work right off the bat, but let me know if it doesn't and I will try to help further.
ALSO, I know this can be done more cleverly and in fewer steps, but that would require more thought on my part, and the OP can do it themselves once they understand this code. I figured it was much more important for them to understand what I did and be able to edit it themselves.
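For what it's worth, a tidier version might parse the HTML properly instead of splitting strings with `explode()`. This is only a sketch using PHP's built-in DOM extension, run here against a small hypothetical sample fragment; the `<div class="info">` class name and the markup layout are assumed to match the page above:

```php
<?php
// Hypothetical sample of the markup the explode() code targets:
$html = '<html><body><div class="info">'
      . '<a href="/trailer/overnight-2012/trailer">link</a>'
      . '<h2>Overnight</h2>'
      . '</div></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings on real-world, messy HTML
$xpath = new DOMXPath($doc);

// Find every info div, then pull the link and title out of each one.
foreach ($xpath->query('//div[@class="info"]') as $div) {
    $a  = $xpath->query('.//a[starts-with(@href, "/trailer/")]', $div)->item(0);
    $h2 = $xpath->query('.//h2', $div)->item(0);
    if ($a && $h2) {
        $url   = 'http://www.traileraddict.com' . $a->getAttribute('href');
        $title = trim($h2->textContent);
        // ...insert $url and $title into the database here, as above
    }
}
```

The upside is that XPath keeps working even if whitespace or attribute order changes, which is exactly what breaks the `explode()` approach.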
In addition, I would advise scraping the site at night so as not to burden it with extra traffic, and I would suggest asking the site for permission, since if they catch you they can put an end to your scraping :(
To answer your last question: to run this on a schedule, you should use a cron job.
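For example, assuming the script above is saved as `/path/to/scraper.php` and PHP lives at `/usr/bin/php` (both paths are hypothetical and depend on your setup), a crontab entry like this would run it every night at 3 a.m.:

```
# m h dom mon dow  command
0 3 * * * /usr/bin/php /path/to/scraper.php
```

Add it with `crontab -e`; the five fields are minute, hour, day of month, month, and day of week.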