Take a full-page screenshot with Selenium (Python) and ChromeDriver

After trying out various approaches, I came across this page on taking a full-page screenshot with ChromeDriver, Selenium and Python.

The original code is here (I have also copied it into this post below).

It uses PIL and works great! However, there is one problem: it captures fixed headers and repeats them down the entire page, and it also skips some parts of the page when moving from one viewport to the next. Example URL to take a screenshot of:

http://www.w3schools.com/js/default.asp

How can I avoid the duplicate headers with this code? Or is there a better option that uses only Python? (I don't know Java and don't want to use Java.)

See a screenshot of the current result and code example below.

full page screenshot with repeated headers

test.py

    """
    This script uses a simplified version of the one here:
    https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

    It contains the *crucial* correction added in the comments by Jason Coutu.
    """

    import sys

    from selenium import webdriver
    import unittest

    import util


    class Test(unittest.TestCase):
        """ Demonstration: Get Chrome to generate fullscreen screenshot """

        def setUp(self):
            self.driver = webdriver.Chrome()

        def tearDown(self):
            self.driver.quit()

        def test_fullpage_screenshot(self):
            ''' Generate document-height screenshot '''
            #url = "http://effbot.org/imagingbook/introduction.htm"
            url = "http://www.w3schools.com/js/default.asp"
            self.driver.get(url)
            util.fullpage_screenshot(self.driver, "test.png")


    if __name__ == "__main__":
        unittest.main(argv=[sys.argv[0]])

util.py

    import os
    import time

    from PIL import Image


    def fullpage_screenshot(driver, file):
        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if not previous is None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True
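
The tiling arithmetic in util.py can be looked at separately from the browser calls, which may make it easier to see where each capture ends up in the stitched image. The following is only a sketch: `tile_rectangles` and `paste_offset` are hypothetical names that do not appear in the original script, the page dimensions in the demo are made up, and the `max(..., 0)` guard for pages shorter than the viewport is an addition that the original does not have.

```python
def tile_rectangles(total_width, total_height, viewport_width, viewport_height):
    """Split the page into viewport-sized tiles as (left, top, right, bottom)."""
    rectangles = []
    top = 0
    while top < total_height:
        bottom = min(top + viewport_height, total_height)
        left = 0
        while left < total_width:
            right = min(left + viewport_width, total_width)
            rectangles.append((left, top, right, bottom))
            left += viewport_width
        top += viewport_height
    return rectangles


def paste_offset(rectangle, total_height, viewport_height):
    """Return where the capture taken for `rectangle` is pasted.

    The browser cannot scroll past the end of the page, so a capture for the
    last row shows the bottom `viewport_height` pixels and is pasted there.
    """
    left, top, _, _ = rectangle
    if top + viewport_height > total_height:
        # Addition not in the original: never paste at a negative offset.
        return (left, max(total_height - viewport_height, 0))
    return (left, top)


if __name__ == "__main__":
    # Hypothetical page: 1000 x 2500 px seen through a 1000 x 1000 px viewport.
    for rect in tile_rectangles(1000, 2500, 1000, 1000):
        print(rect, "->", paste_offset(rect, 2500, 1000))
```

Note that the last capture overlaps the previous one, which is harmless for static content but is exactly where a fixed header gets pasted again.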
+25
python selenium selenium-chromedriver webpage-screenshot
17 answers

You can achieve this by changing the header's CSS before taking the screenshot:

    topnav = driver.find_element_by_id("topnav")
    driver.execute_script("arguments[0].setAttribute('style', 'position: absolute; top: 0px;')", topnav)

EDIT: Place this line after scrolling the window:

 driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');") 

So, in util.py it will be:

    driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
    driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")

If the site uses the header tag, you can do this with find_element_by_tag_name("header")
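
When the id or tag of the header is not known in advance, one possible variation is to let the page itself look for elements whose computed position is fixed or sticky and change them in one pass. This is a sketch under assumptions, not something from the answer above: `unfix_fixed_elements` and `UNFIX_SCRIPT` are hypothetical names, mapping sticky elements to `relative` is my own choice, and scanning every element may be slow or may disturb the layout on some pages.

```python
# JavaScript run inside the page; returns the number of elements it changed.
UNFIX_SCRIPT = """
var changed = 0;
var nodes = document.querySelectorAll('*');
for (var i = 0; i < nodes.length; i++) {
    var position = window.getComputedStyle(nodes[i]).position;
    if (position === 'fixed') {
        nodes[i].style.setProperty('position', 'absolute', 'important');
        changed++;
    } else if (position === 'sticky') {
        nodes[i].style.setProperty('position', 'relative', 'important');
        changed++;
    }
}
return changed;
"""


def unfix_fixed_elements(driver):
    """Make fixed/sticky elements scroll with the page; return how many changed."""
    return driver.execute_script(UNFIX_SCRIPT)
```

Like the `topnav` line, this would be called after `driver.get(url)` and before the captures are taken.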

+6
    element = driver.find_element_by_tag_name('body')
    element_png = element.screenshot_as_png
    with open("test2.png", "wb") as file:
        file.write(element_png)

This works for me. Saves the entire page as a screenshot. For more information, you can read the API documentation: http://selenium-python.readthedocs.io/api.html

+15

Screenshots are limited to the viewport, but you can get around this by capturing the body element, since the web driver will capture the entire element even if it is larger than the viewport. This eliminates the need for scrolling and stitching images. However, you may see problems with the position of the footer (as in the screenshot below).

Tested on Windows 8 and Mac High Sierra with the Chrome driver.

    from selenium import webdriver

    url = 'https://stackoverflow.com/'
    path = '/path/to/save/in/scrape.png'

    driver = webdriver.Chrome()
    driver.get(url)
    el = driver.find_element_by_tag_name('body')
    el.screenshot(path)
    driver.quit()

Returns: (full size: https://i.stack.imgur.com/ppDiI.png )

SO_scrape

+12

This answer improves on the previous answers by am05mhz and Javed Karim.

It assumes headless mode, and that the window size option was not set initially. Before calling this function, make sure that the page has loaded fully or sufficiently.

It tries to set the width and height to what is needed. A screenshot of the entire page can sometimes include an unnecessary vertical scrollbar. One way to avoid the scrollbar altogether is to take a screenshot of the body element. After saving the screenshot, it restores the window to its original size; otherwise the size for the next screenshot may not be set correctly.

Ultimately, this method may still not work perfectly for some pages.

    from selenium import webdriver  # needed for the type annotation below

    def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
        # Ref: https://stackoverflow.com/a/52572919/
        original_size = driver.get_window_size()
        required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
        required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
        driver.set_window_size(required_width, required_height)
        # driver.save_screenshot(path)  # has scrollbar
        driver.find_element_by_tag_name('body').screenshot(path)  # avoids scrollbar
        driver.set_window_size(original_size['width'], original_size['height'])

If you are using Python 2, remove the type annotations from the function definition; function annotations are supported in every Python 3 release.
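
If the element screenshot raises an exception, the window stays at the enlarged size. A possible variation, offered only as a sketch, restores the size in a `finally` block. `screenshot_full_body` is a hypothetical name, and the locator is passed as the plain string "tag name" (the value of selenium's `By.TAG_NAME`) so that the snippet needs no imports; like the function above, it assumes headless mode.

```python
def screenshot_full_body(driver, path):
    """Resize the window to the document size, capture <body>, then restore."""
    original = driver.get_window_size()
    width = driver.execute_script("return document.body.parentNode.scrollWidth")
    height = driver.execute_script("return document.body.parentNode.scrollHeight")
    try:
        driver.set_window_size(width, height)
        # "tag name" is the string value of selenium's By.TAG_NAME locator.
        return driver.find_element("tag name", "body").screenshot(path)
    finally:
        # Runs even if the capture fails, so later screenshots start from
        # the original window size.
        driver.set_window_size(original["width"], original["height"])
```

The return value is whatever the element's `screenshot` method returns for the given path.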

+11

Building on @Moshisho's approach.

My full standalone working script is below (a 0.2-second sleep is added after each scroll and after repositioning the header):

    import sys
    import os
    import time

    from selenium import webdriver
    from PIL import Image


    def fullpage_screenshot(driver, file):
        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if not previous is None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)
                driver.execute_script("document.getElementById('topnav').setAttribute('style', 'position: absolute; top: 0px;');")
                time.sleep(0.2)
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True


    driver = webdriver.Chrome()

    ''' Generate document-height screenshot '''
    url = "http://effbot.org/imagingbook/introduction.htm"
    url = "http://www.w3schools.com/js/default.asp"
    driver.get(url)
    fullpage_screenshot(driver, "test1236.png")
+7

Not sure if people still have this problem. I made a small hack that works pretty well and copes with dynamic zones. Hope it helps.

    # 1. get dimensions
    browser = webdriver.Chrome(chrome_options=options)
    browser.set_window_size(default_width, default_height)
    browser.get(url)
    time.sleep(sometime)
    total_height = browser.execute_script("return document.body.parentNode.scrollHeight")
    browser.quit()

    # 2. get screenshot
    browser = webdriver.Chrome(chrome_options=options)
    browser.set_window_size(default_width, total_height)
    browser.get(url)
    browser.save_screenshot(screenshot_path)
+7

I changed the code for Python 3.6, maybe it will be useful for someone:

    from selenium import webdriver
    from sys import stdout
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    import unittest
    #from Login_Page import Login_Page
    from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
    from io import BytesIO
    from PIL import Image

    def testdenovoUIavailable(self):
        binary = FirefoxBinary("C:\\Mozilla Firefox\\firefox.exe")
        self.driver = webdriver.Firefox(firefox_binary=binary)
        verbose = 0

        #open page
        self.driver.get("http://yandex.ru")

        #hide fixed header
        #js_hide_header=' var x = document.getElementsByClassName("topnavbar-wrapper ng-scope")[0];x[\'style\'] = \'display:none\';'
        #self.driver.execute_script(js_hide_header)

        #get total height of page
        js = 'return Math.max( document.body.scrollHeight, document.body.offsetHeight, document.documentElement.clientHeight, document.documentElement.scrollHeight, document.documentElement.offsetHeight);'

        scrollheight = self.driver.execute_script(js)
        if verbose > 0:
            print(scrollheight)

        slices = []
        offset = 0
        offset_arr=[]

        #separate full screen in parts and make printscreens
        while offset < scrollheight:
            if verbose > 0:
                print(offset)

            #scroll to size of page
            if (scrollheight-offset)<offset:
                #if part of screen is the last one, we need to scroll just on rest of page
                self.driver.execute_script("window.scrollTo(0, %s);" % (scrollheight-offset))
                offset_arr.append(scrollheight-offset)
            else:
                self.driver.execute_script("window.scrollTo(0, %s);" % offset)
                offset_arr.append(offset)

            #create image (in Python 3.6 use BytesIO)
            img = Image.open(BytesIO(self.driver.get_screenshot_as_png()))

            offset += img.size[1]
            #append new printscreen to array
            slices.append(img)

            if verbose > 0:
                self.driver.get_screenshot_as_file('screen_%s.jpg' % (offset))
                print(scrollheight)

        #create image with
        screenshot = Image.new('RGB', (slices[0].size[0], scrollheight))
        offset = 0
        offset2= 0
        #now glue all images together
        for img in slices:
            screenshot.paste(img, (0, offset_arr[offset2]))
            offset += img.size[1]
            offset2+= 1

        screenshot.save('test.png')
+6

Why not just get the width and height of the page and then resize the driver window? It would be something like this:

    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.scrollHeight")
    driver.set_window_size(total_width, total_height)
    driver.save_screenshot("SomeName.png")

This will take a screenshot of your entire page without having to combine the different parts.

+4

The key is to enable headless mode! No stitching is required, and there is no need to load the page twice.

Full working code:

    from selenium import webdriver

    URL = 'http://www.w3schools.com/js/default.asp'

    options = webdriver.ChromeOptions()
    options.headless = True

    driver = webdriver.Chrome(options=options)
    driver.get(URL)

    S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
    driver.set_window_size(S('Width'),S('Height')) # May need manual adjustment
    driver.find_element_by_tag_name('body').screenshot('web_screenshot.png')

    driver.quit()

This is pretty much the same code that @Acumenus posted with minor improvements.

Summary of my findings

I decided to post this anyway because I could not find an explanation of what happens when screenshots are taken with headless mode off (the browser is displayed). As I tested (with Chrome WebDriver), if headless mode is enabled, the screenshot is saved as desired. However, if headless mode is disabled, the saved screenshot has approximately the correct width and height, but the result varies from case to case. Usually the top of the page that is visible on the screen is saved, but the rest of the image is just white. There was also a case, when trying to save this very thread on Qaru using the above link, where even the upper part was not saved; interestingly, it was transparent, while the rest was still white. The last case, which I noticed only once, was with the W3Schools link given above: there were no white parts, but the top of the page was repeated all the way to the end, including the header.

I hope that this will help those who, for some reason, do not get the expected result, since I did not see anyone explicitly explain that such a simple approach requires headless mode. Only after I had found the solution to this problem myself did I find a post by @vc2279 mentioning that the window of a headless browser can be set to any size (which seems to be true for the opposite case as well). The solution in my post improves on that one in that it does not require reopening the browser/driver or reloading the page.

Further suggestions

If this doesn't work for some pages, I suggest adding time.sleep(seconds) before getting the page size. Another case is a page that requires scrolling to the end in order to load additional content, which can be solved using the scheight method from this post:

    scheight = .1
    while scheight < 9.9:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
        scheight += .01
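
An alternative to a fixed number of scroll steps is to keep scrolling to the bottom until the document height stops growing. This is a sketch rather than the method from the linked post: `scroll_until_stable`, `pause` and `max_rounds` are hypothetical names, and the default values are guesses that would need tuning per page.

```python
import time


def scroll_until_stable(driver, pause=0.5, max_rounds=20):
    """Scroll to the bottom repeatedly until scrollHeight stops changing.

    Returns the final document height in CSS pixels.
    """
    get_height = "return document.documentElement.scrollHeight"
    height = driver.execute_script(get_height)
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, arguments[0]);", height)
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == height:
            break  # nothing new was loaded by the last scroll
        height = new_height
    driver.execute_script("window.scrollTo(0, 0);")  # back to the top
    return height
```

The returned height could then be passed to `driver.set_window_size` in place of `S('Height')`. Pages with truly infinite scrolling never stabilise, which is why the loop is capped by `max_rounds`.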

Also note that on some pages the content may not sit in any of the top-level HTML tags such as <html> or <body>; for example, YouTube uses the <ytd-app> tag. As a last note, I found one page that still "returned" a screenshot with a horizontal scrollbar; there the window size required manual adjustment, i.e. the image width needed to be increased by 18 pixels, like so: S('Width')+18.

+2
    element=driver.find_element_by_tag_name('body')
    element_png = element.screenshot_as_png
    with open("test2.png", "wb") as file:
        file.write(element_png)

There was an error in line 2 of the code suggested earlier; the above is the corrected version. Being new here, I cannot yet edit my own post.

Sometimes the above does not give the best results. In that case, you can use another method: get the heights of all elements and sum them to set the capture height, as shown below:

    element=driver.find_elements_by_xpath("/html/child::*/child::*")
    eheight=set()
    for e in element:
        eheight.add(round(e.size["height"]))
    print (eheight)
    total_height = sum(eheight)
    driver.execute_script("document.getElementsByTagName('html')[0].setAttribute('style', 'height:"+str(total_height)+"px')")
    element=driver.find_element_by_tag_name('body')
    element_png = element.screenshot_as_png
    with open(fname, "wb") as file:
        file.write(element_png)

By the way, it works in Firefox.

0

I modified the code from @ihightower and @A.Minachev a bit to make it work on a Mac Retina display:

    import time
    from PIL import Image
    from io import BytesIO

    def fullpage_screenshot(driver, file, scroll_delay=0.3):
        device_pixel_ratio = driver.execute_script('return window.devicePixelRatio')

        total_height = driver.execute_script('return document.body.parentNode.scrollHeight')
        viewport_height = driver.execute_script('return window.innerHeight')
        total_width = driver.execute_script('return document.body.offsetWidth')
        viewport_width = driver.execute_script("return document.body.clientWidth")

        # this implementation assume (viewport_width == total_width)
        assert(viewport_width == total_width)

        # scroll the page, take screenshots and save screenshots to slices
        offset = 0  # height
        slices = {}
        while offset < total_height:
            if offset + viewport_height > total_height:
                offset = total_height - viewport_height

            driver.execute_script('window.scrollTo({0}, {1})'.format(0, offset))
            time.sleep(scroll_delay)

            img = Image.open(BytesIO(driver.get_screenshot_as_png()))
            slices[offset] = img

            offset = offset + viewport_height

        # combine image slices
        stitched_image = Image.new('RGB', (total_width * device_pixel_ratio, total_height * device_pixel_ratio))
        for offset, image in slices.items():
            stitched_image.paste(image, (0, offset * device_pixel_ratio))
        stitched_image.save(file)

    fullpage_screenshot(driver, 'test.png')
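
The point of the Retina change is that scroll positions are measured in CSS pixels while the captured images are in device pixels, so paste positions have to be scaled by `window.devicePixelRatio`. A small sketch of just that arithmetic follows; `slice_plan` is a hypothetical helper, the numbers in the demo are made up, and the rounding to whole pixels is an assumption for fractional ratios such as 1.5.

```python
def slice_plan(total_height, viewport_height, device_pixel_ratio):
    """Return (scroll_y, paste_y) pairs.

    scroll_y is in CSS pixels (what window.scrollTo expects);
    paste_y is in device pixels (where the capture goes in the big image).
    """
    plan = []
    offset = 0
    while offset < total_height:
        # Clamp the last slice so it ends exactly at the bottom of the page,
        # and never scroll to a negative position on short pages.
        scroll_y = max(min(offset, total_height - viewport_height), 0)
        plan.append((scroll_y, int(round(scroll_y * device_pixel_ratio))))
        offset += viewport_height
    return plan


if __name__ == "__main__":
    # Hypothetical Retina case: 2500 px page, 1000 px viewport, ratio 2.
    print(slice_plan(2500, 1000, 2))
```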
0

I modified jeremie's answer so that it only gets the URL once.

    browser = webdriver.Chrome(chrome_options=options)
    browser.set_window_size(default_width, default_height)
    browser.get(url)
    height = browser.execute_script("return document.body.parentNode.scrollHeight")

    # 2. get screenshot
    browser.set_window_size(default_width, height)
    browser.save_screenshot(screenshot_path)
    browser.quit()
0

You can use Splinter.
Splinter is an abstraction layer on top of existing browser automation tools such as Selenium.
Version 0.10.0 added a new browser.screenshot(..., full=True) option.
The full=True option takes a full-page screenshot for you.

0

Got it! This works like a charm.

For NodeJS, but the concept is the same:

    await driver.executeScript(`
        document.documentElement.style.display = "table";
        document.documentElement.style.width = "100%";
        document.body.style.display = "table-row";
    `);

    await driver.findElement(By.css('body')).takeScreenshot();
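
Since the question asks for Python, the same CSS change can presumably be applied through `execute_script` before capturing the body element. This is only a sketch of a translation: `screenshot_body_as_table` and `TABLE_LAYOUT_SCRIPT` are hypothetical names, the locator is passed as the plain string "tag name" (the value of selenium's `By.TAG_NAME`), and whether the table layout helps depends on the page.

```python
# Same three style changes as in the Node.js snippet above.
TABLE_LAYOUT_SCRIPT = """
document.documentElement.style.display = "table";
document.documentElement.style.width = "100%";
document.body.style.display = "table-row";
"""


def screenshot_body_as_table(driver, path):
    """Apply the table layout, then save a screenshot of <body> to `path`."""
    driver.execute_script(TABLE_LAYOUT_SCRIPT)
    # "tag name" is the string value of selenium's By.TAG_NAME locator.
    return driver.find_element("tag name", "body").screenshot(path)
```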
0

Easy Python, but slow:

    import os
    from selenium import webdriver
    from PIL import Image

    def full_screenshot(driver: webdriver):
        driver.execute_script(f"window.scrollTo({0}, {0})")
        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        rectangles = []
        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height
            if top_height > total_height:
                top_height = total_height
            while ii < total_width:
                top_width = ii + viewport_width
                if top_width > total_width:
                    top_width = total_width
                rectangles.append((ii, i, top_width, top_height))
                ii = ii + viewport_width
            i = i + viewport_height
        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if not previous is None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            file_name = "part_{0}.png".format(part)
            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])
            stitched_image.paste(screenshot, offset)
            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle
        return stitched_image
0

I changed the answer given by @ihightower: instead of saving a screenshot, the function returns the total height and total width of the web page, and the window size is then set to those values.

    import time  # needed for time.sleep in scroll_down
    from PIL import Image
    from io import BytesIO

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def open_url(url):
        options = Options()
        options.headless = True
        driver = webdriver.Chrome(chrome_options=options)
        driver.maximize_window()
        driver.get(url)
        save_screenshot(driver, 'screen.png')

    def save_screenshot(driver, file_name):
        height, width = scroll_down(driver)
        driver.set_window_size(width, height)
        img_binary = driver.get_screenshot_as_png()
        img = Image.open(BytesIO(img_binary))
        img.save(file_name)
        # print(file_name)
        print(" screenshot saved ")

    def scroll_down(driver):
        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")

        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                rectangles.append((ii, i, top_width, top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        previous = None
        part = 0

        for rectangle in rectangles:
            if not previous is None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.5)
            # time.sleep(0.2)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            previous = rectangle

        return (total_height, total_width)

    open_url("https://www.medium.com")
0

How it works: set the browser window height to the maximum needed.

    #coding=utf-8
    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    # no `self` parameter: this is called as a plain function below
    def test_fullpage_screenshot():
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--start-maximized')
        driver = webdriver.Chrome(chrome_options=chrome_options)
        driver.get("yoururlxxx")
        time.sleep(2)

        #the element with longest height on page
        ele=driver.find_element("xpath", '//div[@class="react-grid-layout layout"]')
        total_height = ele.size["height"]+1000

        driver.set_window_size(1920, total_height)  #the trick
        time.sleep(2)
        driver.save_screenshot("screenshot1.png")
        driver.quit()

    if __name__ == "__main__":
        test_fullpage_screenshot()
0
