Getting company data from Yelp
In this case, we want to get addresses for companies in San Francisco from www.yelp.com.
Site analysis
We can get a list of companies starting with the letter "A" on this page:
http:
This catalog page tells us that for "A" there are 42 pages of results with up to 80 results per page.
That's good news: the page URLs follow a predictable pattern, so we can list all of them up front.
Create API
Now I'm going to create an API that retrieves data from the first page, and then use Bulk Extract to run it against a list of URLs for all 42 pages.
Using Magic, I can create an API in just a few clicks:
- Go to magic.import.io
- Paste in the Yelp page URL (link above)
- Click Extract Data
- Click Get API
- Click "Copy this to My Details"
Now we have an API!
(Note that if you need more control over what the API includes or excludes, you can use an Extractor instead.)
Create URL List
To create a list of URLs that lets us retrieve data from pages 1 to 42, I'm going to use an external service at:
http://texttool.blogspot.co.uk/
Find the "generate list of numbers" tool and use it to create the list of URLs, one per line:
http://www.yelp.com/sm/san-francisco-ca-us/a/1
http://www.yelp.com/sm/san-francisco-ca-us/a/2
http://www.yelp.com/sm/san-francisco-ca-us/a/3
http://www.yelp.com/sm/san-francisco-ca-us/a/4
http://www.yelp.com/sm/san-francisco-ca-us/a/5
http://www.yelp.com/sm/san-francisco-ca-us/a/6
http://www.yelp.com/sm/san-francisco-ca-us/a/7
http://www.yelp.com/sm/san-francisco-ca-us/a/8
http://www.yelp.com/sm/san-francisco-ca-us/a/9
http://www.yelp.com/sm/san-francisco-ca-us/a/10
http://www.yelp.com/sm/san-francisco-ca-us/a/11
http://www.yelp.com/sm/san-francisco-ca-us/a/12
http://www.yelp.com/sm/san-francisco-ca-us/a/13
http://www.yelp.com/sm/san-francisco-ca-us/a/14
http://www.yelp.com/sm/san-francisco-ca-us/a/15
http://www.yelp.com/sm/san-francisco-ca-us/a/16
http://www.yelp.com/sm/san-francisco-ca-us/a/17
http://www.yelp.com/sm/san-francisco-ca-us/a/18
http://www.yelp.com/sm/san-francisco-ca-us/a/19
http://www.yelp.com/sm/san-francisco-ca-us/a/20
http://www.yelp.com/sm/san-francisco-ca-us/a/21
http://www.yelp.com/sm/san-francisco-ca-us/a/22
http://www.yelp.com/sm/san-francisco-ca-us/a/23
http://www.yelp.com/sm/san-francisco-ca-us/a/24
http://www.yelp.com/sm/san-francisco-ca-us/a/25
http://www.yelp.com/sm/san-francisco-ca-us/a/26
http://www.yelp.com/sm/san-francisco-ca-us/a/27
http://www.yelp.com/sm/san-francisco-ca-us/a/28
http://www.yelp.com/sm/san-francisco-ca-us/a/29
http://www.yelp.com/sm/san-francisco-ca-us/a/30
http://www.yelp.com/sm/san-francisco-ca-us/a/31
http://www.yelp.com/sm/san-francisco-ca-us/a/32
http://www.yelp.com/sm/san-francisco-ca-us/a/33
http://www.yelp.com/sm/san-francisco-ca-us/a/34
http://www.yelp.com/sm/san-francisco-ca-us/a/35
http://www.yelp.com/sm/san-francisco-ca-us/a/36
http://www.yelp.com/sm/san-francisco-ca-us/a/37
http://www.yelp.com/sm/san-francisco-ca-us/a/38
http://www.yelp.com/sm/san-francisco-ca-us/a/39
http://www.yelp.com/sm/san-francisco-ca-us/a/40
http://www.yelp.com/sm/san-francisco-ca-us/a/41
http://www.yelp.com/sm/san-francisco-ca-us/a/42
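If you'd rather not depend on an external text tool, the same list can be produced with a few lines of Python. This is only a minimal sketch: the base URL is taken from the list above, and the function name `build_page_urls` is my own, not part of import.io.

```python
# Base of the paginated catalog URLs, taken from the list above.
BASE_URL = "http://www.yelp.com/sm/san-francisco-ca-us/a"


def build_page_urls(base_url, last_page):
    """Return one URL per catalog page, numbered 1..last_page inclusive."""
    return [f"{base_url}/{page}" for page in range(1, last_page + 1)]


# 42 is the page count reported by the catalog page for the letter "A".
urls = build_page_urls(BASE_URL, 42)

if __name__ == "__main__":
    # One URL per line, ready to paste into a bulk-extraction input box.
    print("\n".join(urls))
```

For another letter, change the base URL and the page count to whatever that letter's catalog page reports.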
Bulk extraction
Now you can use Bulk Extract to retrieve data from all of these URLs in one run.
To do this:
- Go to the Configuration tab of your Yelp API.
- Select Bulk Extract from the drop-down list.
- Paste in the list of 42 URLs.
- Click Run Queries
Note: some requests may fail. Click the "X URLs failed" icon to retry the failed requests.
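If you drive the requests from your own script instead of the Bulk Extract screen, the "retry only what failed" idea could look roughly like this. It is a sketch under assumptions: `fetch` is a hypothetical callable you supply (whatever performs one request and raises on failure), and `retry_failed` is an illustrative name, not an import.io function.

```python
import time


def retry_failed(urls, fetch, max_attempts=3, delay_seconds=0.0):
    """Call fetch(url) for every URL, retrying only the ones that raised.

    Returns (results, failed): results maps each successful URL to the
    value fetch returned; failed lists URLs that never succeeded.
    """
    results = {}
    pending = list(urls)
    for attempt in range(max_attempts):
        still_failing = []
        for url in pending:
            try:
                results[url] = fetch(url)
            except Exception:
                # Keep the URL for the next round instead of giving up.
                still_failing.append(url)
        pending = still_failing
        if not pending:
            break
        if attempt < max_attempts - 1:
            # Optional pause between rounds to avoid hammering the site.
            time.sleep(delay_seconds)
    return results, pending
```

Passing the request function in as a parameter keeps the retry logic independent of how the pages are actually fetched.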
Export
Now you can export this data as a spreadsheet, HTML, or JSON.
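If you post-process the results in your own script, converting rows to spreadsheet-friendly CSV or to JSON takes only the standard library. A minimal sketch, assuming each result is a dictionary; the field names and the sample row are placeholders I made up, not real Yelp data or the actual column names of the API.

```python
import csv
import io
import json


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to indented JSON text."""
    return json.dumps(rows, indent=2)


# Placeholder row for illustration only (hypothetical field names and values).
sample_rows = [
    {"name": "Example Company", "address": "123 Example St, San Francisco, CA"},
]

if __name__ == "__main__":
    print(to_csv(sample_rows))
    print(to_json(sample_rows))
```

The `csv` module quotes the address automatically because it contains commas, so the file opens cleanly in a spreadsheet.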
Further reading
http://support.import.io/knowledgebase/articles/669784-getting-company-data-from-yelp
Nick Scott