How is pure HTML different from source code?

Question

I am scraping a list of restaurants from a site (with permission) and I have a problem. The HTML that Python fetches from the site differs from the HTML in the page source: fewer than half of the restaurants on the site appear in the HTML that Python receives. This is what my code looks like:

import requests
from bs4 import BeautifulSoup
from tempfile import TemporaryFile
import xlwt

url = 'https://www.example.com'

r = requests.get(url)
data = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly
soup = data.find_all('span', {'class': 'restaurant_name'})
print(soup)

I know this is not ideal, but I cannot show the HTML because the company will not allow it. I am just wondering whether you know how the HTML loaded with Python can differ from the HTML in the page source, and what I can do about it.

Thanks in advance!

python html web-scraping

titusflex May 13, '16 at 6:50

2 answers

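Most likely, the missing content is loaded by JavaScript. The HTML that Python receives is the page as the server sent it, before any JavaScript has run, whereas the HTML you see in the browser is the page after JavaScript has modified it.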
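
A quick way to check whether JavaScript is responsible is to count the target elements in the raw response and compare that with what the browser shows. The following is only a minimal sketch using the standard library; the helper names (`SpanCounter`, `count_in_html`) and the two sample HTML strings are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class SpanCounter(HTMLParser):
    """Counts <span> tags carrying a given class, and <script> tags."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.matches = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        elif tag == "span":
            # The class attribute may hold several space-separated names.
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.matches += 1


def count_in_html(html, css_class="restaurant_name"):
    """Return (matching spans, script tags) found in an HTML string."""
    counter = SpanCounter(css_class)
    counter.feed(html)
    return counter.matches, counter.scripts


# Hypothetical stand-ins: what the server sends vs. what the browser ends up with.
raw_html = (
    '<html><body><span class="restaurant_name">A</span>'
    '<script src="more.js"></script></body></html>'
)
rendered_html = (
    '<html><body><span class="restaurant_name">A</span>'
    '<span class="restaurant_name">B</span>'
    '<span class="restaurant_name">C</span>'
    '<script src="more.js"></script></body></html>'
)

print(count_in_html(raw_html))
print(count_in_html(rendered_html))
```

If the count for `r.text` is lower than the number of restaurants visible in the browser and the page contains script tags, the rest of the list is probably being added after the page loads.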

Simon May 13, '16 at 6:55

Hassan Mehmood · Accepted Answer · 2016-05-13T07:04:54+0000

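Try Selenium. It drives a real browser, so the page's JavaScript runs before you read the HTML. Selenium works with Firefox, Chrome, and PhantomJS.

Websites nowadays rely heavily on JavaScript. Crawlers/scrapers built on Selenium can handle such sites.

See http://selenium-python.readthedocs.io/ for the Selenium documentation. There is also a tutorial at http://blog.hassanmehmood.com/creating-your-first-crawler-in-python/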

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

profile_link = 'http://hassanmehmood.com'


class TitleScrapper(object):

    def __init__(self):
        # Selenium 4 passes Firefox preferences through Options; the
        # firefox_profile= keyword used in older releases is deprecated.
        options = Options()
        options.set_preference("browser.startup.homepage_override.mstone", "ignore")  # Avoid startup screen
        options.set_preference("startup.homepage_welcome_url.additional", "about:blank")

        self.driver = webdriver.Firefox(options=options)
        self.driver.set_window_size(1120, 550)

    def scrape_profile(self):
        # Load the page in a real browser so that its JavaScript runs
        self.driver.get(profile_link)
        print(self.driver.title)
        self.driver.quit()  # End the session and close the browser

    def scrape(self):
        self.scrape_profile()


if __name__ == '__main__':
    scraper = TitleScrapper()
    scraper.scrape()
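
The same approach can be applied to the restaurant list from the question: let the browser render the page, then pass `driver.page_source` to BeautifulSoup. This is only a rough sketch. It assumes the restaurants appear as `span.restaurant_name` elements (as in the question's code), that Firefox and its driver are installed, and that a 10-second wait is enough. The function names `parse_names` and `rendered_restaurant_names` are made up for illustration.

```python
from bs4 import BeautifulSoup


def parse_names(html):
    """Extract the text of every span.restaurant_name in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", {"class": "restaurant_name"})
    return [span.get_text(strip=True) for span in spans]


def rendered_restaurant_names(url, timeout=10):
    """Load url in Firefox, wait for the list to appear, return the names."""
    # Imported here so parse_names can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until at least one restaurant element is present in the DOM.
        # Pages that load items in batches may need a stricter condition.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "span.restaurant_name")
            )
        )
        # page_source reflects the DOM after JavaScript has run.
        return parse_names(driver.page_source)
    finally:
        driver.quit()
```

Calling `rendered_restaurant_names('https://www.example.com')` would then return the names as a list of strings. If the site loads more entries on scroll or on a button click, extra interaction would be needed before reading `page_source`.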
