Scraper using Inspect

Question

I am trying to get some information from Instagram by scraping it. I tried this code on Twitter and it worked fine, but it shows no result on Instagram. Both versions of the code are below.

Twitter Code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
theurl = "https://twitter.com/realmadrid"
thepage = urlopen(theurl)
soup = BeautifulSoup(thepage,"html.parser")
print(soup.find('div',{"class":"ProfileHeaderCard"}))

Result: Great.

Instagram Code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
theurl = "https://www.instagram.com/barackobama/"
thepage = urlopen(theurl)
soup = BeautifulSoup(thepage,"html.parser")
print(soup.find('div',{"class":"_bugdy"}))

Result: None

python twitter screen-scraping beautifulsoup instagram

Ravi Jun 16 '16 at 9:03

2 answers

Padraic Cunningham · Answer 1 · 2016-06-16T11:17:58+0000

If you look at the page source, you will see that the content is loaded dynamically, so there is no div._bugdy in what your request returns. Depending on what you want, you can extract it from the JSON in the script tag:

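from bs4 import BeautifulSoup
import requests
import re
import json
from pprint import pprint as pp

r = requests.get("https://www.instagram.com/barackobama/")
soup = BeautifulSoup(r.content, "html.parser")
# Find the <script> whose text assigns window._sharedData
js = soup.find("script", text=re.compile("window._sharedData")).text
# Parse everything from the first "{" to the last "}" as JSON
_json = json.loads(js[js.find("{"):js.rfind("}") + 1])

pp(_json)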

This gives you everything that you see in the <script type="text/javascript">window._sharedData = ..... element of the returned source.
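The slicing between the first "{" and the last "}" can also be done without BeautifulSoup. Below is a minimal Python 3 sketch using only the standard library. SAMPLE_HTML and the keys inside it (entry_data, ProfilePage, user, followed_by) are made up for illustration and are not claimed to match what Instagram actually returns; extract_shared_data is a hypothetical helper name.

```python
import json
import re

# Hypothetical stand-in for the HTML a real request would return.
# The JSON keys below are invented for this example.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">window._sharedData = {"entry_data": {"ProfilePage": [{"user": {"username": "example", "followed_by": {"count": 42}}}]}};</script>
</body></html>
"""


def extract_shared_data(html):
    """Return the object assigned to window._sharedData, or None if absent."""
    match = re.search(r"window\._sharedData\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # stops there, so the trailing ";</script>" is simply not consumed.
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


data = extract_shared_data(SAMPLE_HTML)
user = data["entry_data"]["ProfilePage"][0]["user"]
print(user["username"], user["followed_by"]["count"])
```

Using raw_decode rather than rfind("}") avoids picking up a stray closing brace that might appear later in the same script, at the cost of assuming the assigned value is strict JSON.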

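Alternatively, you can use selenium, which drives a real browser so that the dynamic content is actually loaded. For example, to log in and list the followers of a profile: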

from selenium import webdriver
import time
login = "https://www.instagram.com"
dr = webdriver.Chrome()

dr.get(login)

dr.find_element_by_xpath("//a[@class='_k6cv7']").click()
dr.find_element_by_xpath("//input[@name='username']").send_keys("youruname")
dr.find_element_by_xpath("//input[@name='password']").send_keys("yourpass")
dr.find_element_by_css_selector("button._aj7mu._taytv._ki5uo._o0442").click()
time.sleep(5)
dr.get("https://www.instagram.com/barackobama")

dr.find_element_by_css_selector('a[href="/barackobama/followers/"]').click()
time.sleep(3)
for li in dr.find_element_by_css_selector("div._n3cp9._qjr85").find_elements_by_xpath(".//ul/li"):
    print(li.text)

Each li is one entry in the followers list, and li.text is its visible text.

I.B. · Answer 2 · 2016-06-16T11:14:43+0000

The code from the question:

from bs4 import BeautifulSoup
from urllib2 import urlopen
theurl = "https://www.instagram.com/barackobama/"
thepage = urlopen(theurl)
soup = BeautifulSoup(thepage,"html.parser")
print(soup.find('div',{"class":"_bugdy"}))

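The page content is generated by JavaScript, and a plain Python request does not execute it, so the div is never present in the HTML you download. Instead, use a browser-driven tool such as Selenium webdriver (http://www.seleniumhq.org/projects/webdriver/) or PhantomJS (http://phantomjs.org/), which runs the JavaScript and gives you the rendered page.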
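Before reaching for a browser-driven tool, it can help to confirm that the element really is missing from the raw response. The following is a minimal Python 3 sketch using only the standard library. STATIC_PAGE and DYNAMIC_PAGE are invented snippets standing in for the two downloads, and ClassFinder and has_element are hypothetical names.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether a given tag with a given CSS class appears in the HTML."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may be missing or hold several names.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_element(html, tag, class_name):
    finder = ClassFinder(tag, class_name)
    finder.feed(html)
    return finder.found


# Invented examples: one page ships the div in its HTML, the other only
# ships a placeholder and a script that would build the div in a browser.
STATIC_PAGE = '<div class="ProfileHeaderCard"><h1>Real Madrid</h1></div>'
DYNAMIC_PAGE = '<div id="react-root"></div><script>/* builds div._bugdy at runtime */</script>'

print(has_element(STATIC_PAGE, "div", "ProfileHeaderCard"))  # True
print(has_element(DYNAMIC_PAGE, "div", "_bugdy"))            # False
```

If the check returns False for HTML fetched with urlopen or requests but the element is visible in the browser inspector, the element is being created client-side.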
