Is it possible to scrape a React site (Instagram) with Cheerio?

I am trying to scrape Instagram (built with React) using Node.js / Cheerio. Debugging the document shows that an object is returned, but it does not look like a typical response.

I assume this is related to React. Is there a way around this so that I can pull out the rendered DOM for parsing with Cheerio? Or am I missing something?

Thanks in advance.

reactjs web-scraping cheerio
1 answer

In the general case, if the site is optimized for SEO, you can do this by sending a web crawler's user agent string with your request. The site then returns a rendered DOM that can be parsed by Cheerio.
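As a rough sketch of the crawler approach, the snippet below sends Googlebot's publicly documented user agent string with the request. The helper names `buildCrawlerOptions` and `fetchAsCrawler` are hypothetical, and the code assumes Node.js 18+ for the built-in `fetch`. Whether a particular site actually serves pre-rendered markup to crawlers is not guaranteed.

```javascript
// User agent string that Googlebot identifies itself with.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Hypothetical helper: builds request options carrying a crawler user agent.
function buildCrawlerOptions(userAgent = GOOGLEBOT_UA) {
  return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical helper: fetches a page as a crawler and resolves with the raw HTML.
// Assumes Node.js 18+ where fetch() is available globally.
async function fetchAsCrawler(url) {
  const response = await fetch(url, buildCrawlerOptions());
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text();
}
```

The HTML string that `fetchAsCrawler` resolves with can then be passed to `cheerio.load(html)` in the usual way.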

In this specific case, Instagram returns a rendered DOM on its mobile website. Send a mobile phone user agent string with the request and you can parse the returned data.

    var request = require('request');
    var cheerio = require('cheerio');

    // `user` is assumed to be defined elsewhere and to hold the profile URL.
    var options = {
      url: user.instagram_url,
      headers: {
        // Mobile Safari user agent, so that the mobile site is served.
        'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4'
      }
    };

    request(options, function (error, response, html) {
      if (!error) {
        console.log('Scraper running on Instagram user page.');
        // Use Cheerio to load the page.
        var $ = cheerio.load(html);
        // Code to parse the DOM here
      }
    });