How to scrape specific data with Simple HTML DOM Parser

Question

I am trying to scrape data from a web page, and I need to get all of the data at this link.

    include 'simple_html_dom.php';
    $html1 = file_get_html('http://www.aktive-buergerschaft.de/buergerstiftungen/unsere_leistungen/buergerstiftungsfinder');
    $info1 = $html1->find('b[class=???]', 0); // what do I enter in the selector here?

I need to get all of the data on this page (it is in German), which looks like this:

    Bürgerstiftung Lebensraum Aachen
    rechtsfähige Stiftung des bürgerlichen Rechts
    Ansprechpartner: Hubert Schramm
    Alexanderstr. 69/ 71
    52062 Aachen
    Telefon: 0241 - 4500130
    Telefax: 0241 - 4500131
    Email: info@buergerstiftung-aachen.de
    www.buergerstiftung-aachen.de
    >> Weitere Details zu dieser Stiftung

    Bürgerstiftung Achim
    rechtsfähige Stiftung des bürgerlichen Rechts
    Ansprechpartner: Helga Kühn
    Rotkehlchenstr. 72
    28832 Achim
    Telefon: 04202-84981
    Telefax: 04202-955210
    Email: info@buergerstiftung-achim.de
    www.buergerstiftung-achim.de
    >> Weitere Details zu dieser Stiftung

I also need the data that is "behind" the ">> Weitere Details" links. Is there a way to do this with a simple parser that a beginner can understand and write?
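One way to reach the data "behind" the links is to first collect the URL of every ">> Weitere Details" link and then load those pages one at a time. Below is a minimal sketch using PHP's built-in DOM extension. The sample HTML and the `/stiftung/1` style paths are hypothetical, the real page's markup may differ, and the relative-URL handling only covers links that start with `/`.

```php
<?php
// Sketch: collect the URLs of the ">> Weitere Details zu dieser Stiftung" links
// so that each detail page can be fetched and parsed afterwards.
// ASSUMPTION: the sample markup and the /stiftung/N paths are hypothetical.

function detailLinks(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // buffer parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $urls = [];
    // Every <a> whose text contains "Weitere Details"
    foreach ($xpath->query("//a[contains(., 'Weitere Details')]") as $a) {
        $href = $a->getAttribute('href');
        // Simplistic resolution of site-relative links ("/...") against the base URL
        if (strpos($href, '/') === 0) {
            $href = rtrim($baseUrl, '/') . $href;
        }
        $urls[] = $href;
    }
    return $urls;
}

$sample = '<html><body><div id="content">'
    . '<a href="/stiftung/1">&gt;&gt; Weitere Details zu dieser Stiftung</a>'
    . '<a href="http://www.buergerstiftung-achim.de">www.buergerstiftung-achim.de</a>'
    . '<a href="/stiftung/2">&gt;&gt; Weitere Details zu dieser Stiftung</a>'
    . '</div></body></html>';

$links = detailLinks($sample, 'http://www.aktive-buergerschaft.de');
print_r($links);
```

Each collected URL could then be loaded with `$doc->loadHTMLFile($url)` and parsed the same way as the list page.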

variables html php parsing

zero May 24, '11 at 17:26

6 answers

The link you provided doesn't work for me. I suggest using PHP's built-in DOM extension instead of Simple HTML DOM; it will be much faster and easier ;) I viewed the page through Google's cache, and you can use something like:

    $doc = new DOMDocument;
    @$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
    $contents = $doc->getElementById('content')->nodeValue; // Text contents of #content
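To try this approach without network access, here is a minimal sketch that runs the same calls against an inline HTML string. The markup is a hypothetical stand-in for the real page. It also shows `libxml_use_internal_errors()`, a common alternative to the `@` operator for silencing parse warnings.

```php
<?php
// Sketch of the DOMDocument approach run against an inline string instead of a URL.
// ASSUMPTION: the sample markup below is hypothetical.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
    . '<body><div id="nav">Menu</div>'
    . '<div id="content"><dl><dt>Bürgerstiftung Achim</dt>'
    . '<dd>Ansprechpartner: Helga Kühn</dd></dl></div></body></html>';

$doc = new DOMDocument();
// Alternative to @: buffer libxml's parse warnings, then discard them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$content = $doc->getElementById('content');
// nodeValue is all text inside #content with the tags stripped
$contents = $content !== null ? $content->nodeValue : '';
echo $contents, "\n";
```

Note that `nodeValue` joins the text of all child elements with no separator, so the name and contact details run together; walking the individual `<dt>`/`<dd>` elements keeps them apart.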

Salman abbas May 28 '11 at 6:30

From a quick look, you need to loop over the <dl> elements in #content, and then over their <dt> and <dd> children.

    foreach ($html->find('#content dl') as $item) {
        $info = $item->find('dd');
        foreach ($info as $info_item) {
            // process each <dd> here, e.g. print its text
            echo $info_item->plaintext, "\n";
        }
    }
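The same two-level loop can be written with PHP's built-in DOM, so no extra library is required. This is a minimal sketch assuming each foundation is one `<dl>` with a `<dt>` name and `<dd>` detail lines; the sample markup is hypothetical and the real page may be structured differently.

```php
<?php
// dl -> dt/dd walk with the built-in DOM extension.
// ASSUMPTION: one <dl> per foundation; the sample markup is hypothetical.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
    . '<div id="content">'
    . '<dl><dt>Bürgerstiftung Lebensraum Aachen</dt>'
    . '<dd>Ansprechpartner: Hubert Schramm</dd><dd>52062 Aachen</dd></dl>'
    . '<dl><dt>Bürgerstiftung Achim</dt>'
    . '<dd>Ansprechpartner: Helga Kühn</dd><dd>28832 Achim</dd></dl>'
    . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // hide warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$foundations = [];
// Outer loop: every <dl> inside #content (like find('#content dl'))
foreach ($xpath->query("//*[@id='content']//dl") as $dl) {
    // Text of the first <dt> child of this <dl>
    $name = trim($xpath->evaluate('string(dt)', $dl));
    $details = [];
    // Inner loop: every <dd> child of this <dl> (like $item->find('dd'))
    foreach ($xpath->query('dd', $dl) as $dd) {
        $details[] = trim($dd->textContent);
    }
    $foundations[] = ['name' => $name, 'details' => $details];
}
print_r($foundations);
```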

This uses the simple_html_dom library.

Mick hansen May 26, '11 at 20:32

XPath makes scraping ridiculously easy, and a well-chosen query keeps working even when unrelated parts of the HTML document change. For example, to select the names, you could use a query like this:

    //div[@id='content']/dl/dt
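In PHP, such a query can be run with the built-in `DOMXPath` class. A minimal sketch, assuming (hypothetically) that each `<dl>` is a direct child of `div#content`; the sample markup is illustrative only.

```php
<?php
// Running the XPath query with DOMXPath.
// ASSUMPTION: hypothetical markup in which each <dl> is a direct child of div#content.
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
    . '<div id="content">'
    . '<dl><dt>Bürgerstiftung Lebensraum Aachen</dt><dd>52062 Aachen</dd></dl>'
    . '<dl><dt>Bürgerstiftung Achim</dt><dd>28832 Achim</dd></dl>'
    . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // hide warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
foreach ($xpath->query("//div[@id='content']/dl/dt") as $dt) {
    $names[] = trim($dt->textContent);
}
print_r($names);
```

If the `<dl>` elements turn out to be nested deeper, `//div[@id='content']//dl/dt` (descendant instead of child axis) is the more forgiving variant.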

A quick Google search will turn up plenty of XPath tutorials.

Nicolas Jun 2 '11 at 16:14

@zero: there is a good site for trying out scraping in both PHP and Python; I've found it pretty useful: http://scraperwiki.com/

ag112 Jun 2 '11 at 17:49

I would use the Perl module WWW::Mechanize:

http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm

Eamorr Jun 2 '11 at 1:26

Felix kling · Accepted Answer · 2011-05-24T17:29:32+0000

It seems to be covered in the documentation:

 $html1->find('b[class=info]',0)->innertext;
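For comparison, a rough built-in-DOM equivalent of `find('b[class=info]', 0)->innertext` is sketched below: the XPath `//b[@class='info']` matches the exact class value, `item(0)` takes the first match, and serializing the element's children approximates simple_html_dom's `innertext`. The sample markup is hypothetical.

```php
<?php
// Approximate DOM equivalent of $html1->find('b[class=info]', 0)->innertext.
// ASSUMPTION: hypothetical sample markup.
$html = '<html><body><p><b class="info">Stiftung <i>Achim</i></b>'
    . '<b class="other">ignored</b></p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // hide warnings about imperfect HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$b = $xpath->query("//b[@class='info']")->item(0); // first match, like find(..., 0)

$inner = '';
if ($b !== null) {
    // Serialize each child node to rebuild the element's inner HTML
    foreach ($b->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner, "\n";
```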
