Extracting data from a website through PHP

I am trying to create a simple alert app for some friends.

Basically, I want to be able to retrieve the "price" and "stock availability" data on a web page, for example, as follows:

I have already built the email and SMS notification parts, but now I want to get the quantity and price from web pages (those two or any others) so that I can compare the prices and the quantity available and alert us to place an order when the product falls within certain thresholds.

I tried some regex (found in some tutorials, but I am too much of a n00b for this) and couldn't get it to work. Any good tips or examples?

+7
php regex curl html-parsing
6 answers
    $content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');

    preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);
    $price = $match[1];

    preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match);
    $in_stock = $match[1];

    echo "Price: $price - Availability: $in_stock\n";
+29

It's called screen scraping, in case you need to google for it.

I would suggest that you use a DOM parser and XPath expressions instead. Feed the HTML through HtmlTidy first to make sure that it is valid markup.

For example:

    $html = file_get_contents("http://www.example.com");
    $html = tidy_repair_string($html);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Now query the document. A DOMNode cannot be echoed directly,
    // so print its text content instead.
    foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
        echo $node->nodeValue, "\n";
    }
+7

Whatever you do: do not use regular expressions to parse HTML, or bad things will happen. Use a parser instead.
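
To illustrate the point, here is a minimal sketch comparing the two approaches. The two markup snippets and both function names are made up for the example; only the `quantity_on_hand` field name is borrowed from the regex answer above.

```php
<?php
// Hypothetical markup: the same hidden field, with attributes in a different order.
$a = '<input type="hidden" name="quantity_on_hand" value="12">';
$b = '<input value="12" name="quantity_on_hand" type="hidden">';

// Regex approach: depends on the exact attribute order and spacing.
function stockByRegex(string $html): ?string
{
    $pattern = '#<input type="hidden" name="quantity_on_hand" value="(.*?)">#';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Parser approach: finds the element by its name attribute, whatever the order.
function stockByDom(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $value = $xpath->evaluate('string(//input[@name="quantity_on_hand"]/@value)');
    return $value !== '' ? $value : null;
}

var_dump(stockByRegex($a), stockByRegex($b)); // the second call finds nothing
var_dump(stockByDom($a), stockByDom($b));     // both calls find the value
```

The regex only matches the first snippet, while the XPath lookup returns the value for both, which is the kind of silent breakage a small markup change can cause.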

+5

You are probably best off loading the HTML into a DOM parser, for example this one, and looking for the "pricing" table. However, any scraping you do can break when they change the page layout, and is probably illegal without their consent.

The best way would be to talk to the people who run the site and see if they have alternative, more reliable forms of data delivery (web services, RSS, or database export).

+2

First, when asking this question, you need to give more details. Second, retrieving data from a website may not be practical. However, I have some tips:

  • Use Firebug or the Chrome / Safari Inspector to examine the HTML content and find the pattern around the interesting information.

  • Test your regexes to see whether they match. You may need to do this many times (multi-pass parsing / extraction).

  • Fetch the page through cURL or, even easier, use file_get_contents (note that some hosts disable fetching URLs with file_get_contents).
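
The last tip could be sketched roughly as follows. `fetchPage`, its timeout parameter, and the user-agent string are hypothetical names chosen for the example; the fallback assumes that `allow_url_fopen` is the setting your host may have switched off.

```php
<?php
// Fetch a page with cURL when the extension is loaded, otherwise fall back to
// file_get_contents(). Returns the response body, or null on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,               // return the body instead of printing it
            CURLOPT_FOLLOWLOCATION => true,               // follow redirects
            CURLOPT_TIMEOUT        => $timeoutSeconds,    // give up after this many seconds
            CURLOPT_USERAGENT      => 'price-alert/0.1',  // hypothetical identifier
        ]);
        $body = curl_exec($ch);
        return $body === false ? null : $body;
    }

    // file_get_contents() can only open URLs when allow_url_fopen is enabled.
    if (ini_get('allow_url_fopen')) {
        $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
        $body = @file_get_contents($url, false, $context);
        return $body === false ? null : $body;
    }

    return null;
}
```

A call such as `$html = fetchPage('http://www.example.com');` then gives you either the page source or `null`, so the parsing step can bail out cleanly when the download fails.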

For me, it is better to use Tidy to convert the page to valid XHTML, and then use XPath to extract the data instead of regex. Why? Because XHTML is not a regular language, and XPath is very flexible. You can also learn XSLT for transformations.
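
As a rough sketch of that Tidy-then-XPath flow: the markup, the price tiers, and the `extractPrices` helper below are invented for illustration, and the Tidy step is only applied when the extension happens to be installed.

```php
<?php
// Hypothetical, slightly broken markup (the <td> elements are never closed).
$html = <<<'HTML'
<table class="pricing">
  <tr><th>1-9</th><td>$19.95</tr>
  <tr><th>10-99</th><td>$17.96</tr>
</table>
HTML;

// Returns an array mapping each quantity tier label to its price as a float.
function extractPrices(string $html): array
{
    // Tidy is an optional extension; use it to repair the markup when available.
    if (function_exists('tidy_repair_string')) {
        $repaired = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');
        if (is_string($repaired) && $repaired !== '') {
            $html = $repaired;
        }
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $prices = [];
    foreach ($xpath->query('//table[@class="pricing"]//tr') as $row) {
        // Evaluate relative to the current row: first <th> and first <td>.
        $label = trim($xpath->evaluate('string(th)', $row));
        $price = trim($xpath->evaluate('string(td)', $row));
        if ($label !== '' && $price !== '') {
            $prices[$label] = (float) ltrim($price, '$');
        }
    }
    return $prices;
}

print_r(extractPrices($html));
```

Once the prices are plain floats keyed by tier, comparing them against your alert thresholds is ordinary arithmetic rather than string matching.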

Good luck

+2

This is the easiest way to retrieve data from a website. I found that all the data I needed was inside <h3> tags, so I prepared this file.

    <?php
    include('simple_html_dom.php');

    // Create a DOM from a URL; put your target web URL in $page
    $page = 'http://facebook4free.com/category/facebookstatus/amazing-facebook-status/';
    $html = new simple_html_dom();

    // The web page is loaded into $html for further operations
    $html->load_file($page);

    // Collect the matching elements
    $links = array();

    // find('h3') fetches the content of <h3> tags only. Change as per your requirement.
    foreach ($html->find('h3') as $element) {
        $links[] = $element;
    }
    reset($links);

    // $out holds each HTML element you searched for within that web page
    foreach ($links as $out) {
        echo $out;
    }
    ?>
0