I need to do a fairly extensive project using web scraping, and I'm considering Hpricot or Beautiful Soup (i.e. Ruby or Python). Has anyone come across a tutorial they found particularly good for this, one that would help me start the project off on the right foot?
There's a great Railscasts episode on ScrAPI.
For Python, also look at Scrapy and Mechanize.
There's also the book Webbots, Spiders, and Screen Scrapers. Its examples are written in PHP.
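Consider lxml as an alternative to BeautifulSoup. It handles HTML as well as XML and is generally much faster. If you run into really "broken" HTML that lxml's own parser can't cope with, lxml can use BeautifulSoup as its parser (the `lxml.html.soupparser` module). Compare BeautifulSoup's API with lxml's and pick the one you find more comfortable.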
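To make the comparison concrete, here is a sketch that extracts the same `href` values three ways, assuming both `lxml` and `beautifulsoup4` (imported as `bs4`) are installed: lxml's own HTML parser with XPath, BeautifulSoup with Python's built-in `html.parser` backend, and `lxml.html.soupparser`, which parses with BeautifulSoup and hands back lxml elements. The function names are hypothetical, and the sample input deliberately leaves its `<li>` tags unclosed.

```python
import lxml.html
from lxml.html import soupparser
from bs4 import BeautifulSoup


def hrefs_with_lxml(html):
    # Hypothetical helper: lxml's libxml2-based HTML parser plus an XPath query.
    doc = lxml.html.fromstring(html)
    # xpath() returns string subclasses; str() turns them into plain str
    return [str(h) for h in doc.xpath("//a/@href")]


def hrefs_with_bs4(html):
    # Hypothetical helper: BeautifulSoup using Python's built-in parser backend.
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually have an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


def hrefs_with_soupparser(html):
    # Hypothetical helper: BeautifulSoup does the parsing, lxml provides the tree/API.
    root = soupparser.fromstring(html)
    return [str(h) for h in root.xpath("//a/@href")]


if __name__ == "__main__":
    page = '<ul><li><a href="/a">A</a><li><a href="/b">B</a></ul>'
    for fn in (hrefs_with_lxml, hrefs_with_bs4, hrefs_with_soupparser):
        print(fn.__name__, fn(page))
```

On that sample all three return `['/a', '/b']`. The choice is mostly about speed and taste: lxml gives you XPath (and CSS selectors through the separate `cssselect` package), while BeautifulSoup's `find_all` style needs no query language.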
Ian Bicking has written about using lxml for web scraping.
BeautifulSoup also runs on Google App Engine, since it is written in pure Python.
For Ruby, also consider Scrubyt, a web-scraping framework.