Can I use Headless Chrome / Chromium in Google Cloud Functions?

Is there a way to run Headless Chrome / Chromium in Google Cloud Functions? I understand that I can include and run statically compiled binaries in GCF. Can I get a statically compiled version of Chrome that will work for this?

+13
google-chrome google-cloud-platform google-cloud-functions
4 answers

The Node.js 8 runtime for Google Cloud Functions now includes all the necessary OS packages to run Chrome Headless.

Here is sample code for an HTTP function that returns screenshots:

Main index.js file:

    const puppeteer = require('puppeteer');

    exports.screenshot = async (req, res) => {
      const url = req.query.url;

      if (!url) {
        return res.send('Please provide URL as GET parameter, for example: <a href="?url=https://example.com">?url=https://example.com</a>');
      }

      const browser = await puppeteer.launch({
        args: ['--no-sandbox']
      });
      const page = await browser.newPage();
      await page.goto(url);
      const imageBuffer = await page.screenshot();
      await browser.close();

      res.set('Content-Type', 'image/png');
      res.send(imageBuffer);
    }

and package.json

    {
      "name": "screenshot",
      "version": "0.0.1",
      "dependencies": {
        "puppeteer": "^1.6.2"
      }
    }
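
The sample function passes whatever arrives in the `url` query parameter straight to `page.goto`. As a minimal sketch, assuming you want to accept only absolute http(s) URLs, a check like the one below could run first. The helper name `parseTargetUrl` is hypothetical and not part of the answer above.

```javascript
const { URL } = require('url');

// Hypothetical helper (an assumption, not from the original answer):
// returns the normalized URL string for absolute http(s) URLs, otherwise null.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string' || raw.length === 0) {
    return null;
  }
  let parsed;
  try {
    parsed = new URL(raw); // throws a TypeError on malformed input
  } catch (err) {
    return null;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null;
  }
  return parsed.href;
}
```

In the function above, `const url = parseTargetUrl(req.query.url);` would then replace the plain `req.query.url` lookup, and the existing `if (!url)` branch would also cover rejected values.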
+15

I just deployed a GCF function with headless Chrome. A few tips:

  1. you need to statically compile Chromium and NSS on Debian 8
  2. you need to fix environment variables to point to NSS before starting Chromium
  3. performance is much worse than what you get on AWS Lambda (3+ seconds)

For 1, you can find many instructions on the Internet.

For 2, the code I'm using is the following:

    // Static method of a helper class; assumes `fs` and `path` are required at the top of the file.
    // Returns a string or a Promise, so callers should await the result.
    static executablePath() {
      const bin = path.join(__dirname, '..', 'bin', 'chromium');
      const nss = path.join(__dirname, '..', 'bin', 'nss', 'Linux3.16_x86_64_cc_glibc_PTH_64_OPT.OBJ');

      // Put the bundled NSS binaries first on PATH.
      if (process.env.PATH === undefined) {
        process.env.PATH = path.join(nss, 'bin');
      } else if (process.env.PATH.indexOf(nss) === -1) {
        process.env.PATH = [path.join(nss, 'bin'), process.env.PATH].join(':');
      }

      // Put the bundled NSS libraries first on LD_LIBRARY_PATH.
      if (process.env.LD_LIBRARY_PATH === undefined) {
        process.env.LD_LIBRARY_PATH = path.join(nss, 'lib');
      } else if (process.env.LD_LIBRARY_PATH.indexOf(nss) === -1) {
        process.env.LD_LIBRARY_PATH = [path.join(nss, 'lib'), process.env.LD_LIBRARY_PATH].join(':');
      }

      // Reuse the symlink if an earlier invocation already created it.
      if (fs.existsSync('/tmp/chromium') === true) {
        return '/tmp/chromium';
      }

      return new Promise((resolve, reject) => {
        // As in the original, a chmod failure is ignored; symlink errors reject the promise.
        fs.chmod(bin, '0755', () => {
          try {
            fs.symlinkSync(bin, '/tmp/chromium');
            return resolve('/tmp/chromium');
          } catch (error) {
            return reject(error);
          }
        });
      });
    }
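
The PATH and LD_LIBRARY_PATH handling above follows the same pattern twice: use the directory alone if the variable is unset, otherwise prepend it unless it is already there. As a rough sketch, that pattern could be pulled into one helper. The name `prependPathEntry` is hypothetical, and unlike the original substring check it compares whole entries, which is an assumption about what is wanted.

```javascript
// Hypothetical helper (not from the original answer): returns `current` with `dir`
// prepended, unless `dir` is already one of the ':'-separated entries.
function prependPathEntry(current, dir) {
  if (current === undefined || current === '') {
    return dir;
  }
  const entries = current.split(':');
  if (entries.includes(dir)) {
    return current;
  }
  return [dir].concat(entries).join(':');
}
```

Usage would then look like `process.env.PATH = prependPathEntry(process.env.PATH, path.join(nss, 'bin'));`, and the same call with `LD_LIBRARY_PATH` and the `lib` directory.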

When starting Chrome, you also need to use a few necessary arguments, namely:

 --disable-dev-shm-usage --disable-setuid-sandbox --no-first-run --no-sandbox --no-zygote --single-process 
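
This answer does not say which library starts Chromium. Assuming Puppeteer (as used in the other answer), the flags and the path returned by `executablePath()` could be combined into launch options roughly like this. `buildLaunchOptions` is a hypothetical name used only for illustration.

```javascript
// The flags listed above, unchanged.
const CHROMIUM_ARGS = [
  '--disable-dev-shm-usage',
  '--disable-setuid-sandbox',
  '--no-first-run',
  '--no-sandbox',
  '--no-zygote',
  '--single-process'
];

// Hypothetical helper: builds an options object in the shape Puppeteer's
// launch() accepts (`executablePath` and `args`).
function buildLaunchOptions(executablePath) {
  return {
    executablePath: executablePath,
    args: CHROMIUM_ARGS.slice() // copy so callers cannot mutate the shared list
  };
}
```

With Puppeteer installed, the call would be along the lines of `const browser = await puppeteer.launch(buildLaunchOptions('/tmp/chromium'));`.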

Hope this helps.

+4

As mentioned in the comments, work is underway on a possible solution for running a headless browser in a cloud function. A directly relevant discussion, "headless chrome and aws lambda", can be read on Google Groups.

+1

The question was whether you can run headless Chrome or Chromium in Firebase Cloud Functions. The answer is no: the Node.js project will not have access to any Chrome / Chromium executables, so it will fail (trust me, I tried).

The best solution is to use the Phantom npm package, which uses PhantomJS under the hood: https://www.npmjs.com/package/phantom

Documentation and further information can be found here:

http://amirraminfar.com/phantomjs-node/#/

or

https://github.com/amir20/phantomjs-node

The site I was trying to crawl had implemented anti-screen-scraping software. The trick is to wait for the page to load by searching for an expected string or a regular expression match (I use a regex match myself). If you need a regular expression of any complexity written for you, contact https://AppLogics.uk/, starting at £5 (GBP).

Here is a TypeScript snippet for calling an http or https page:

    const phantom = require('phantom');
    const instance: any = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no']);
    const page: any = await instance.createPage();
    const status = await page.open('https://somewebsite.co.uk/');
    const content = await page.property('content');

And again in JavaScript:

    // Run inside an async function.
    const phantom = require('phantom');
    const instance = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no']);
    const page = await instance.createPage();
    const status = await page.open('https://somewebsite.co.uk/');
    const content = await page.property('content');

That is the easy bit! If it is a static page you are pretty much done, and you can parse the HTML with something like the cheerio npm package: https://github.com/cheeriojs/cheerio, an implementation of core jQuery for servers.

However, if it is a dynamically loaded page (lazy loading, or even anti-scraping methods), you will need to wait for the page to update by looping, calling page.property('content') and running a text search or regex to find out whether the page has finished loading.

I created a generic asynchronous function that returns the page content (as a string) on success and throws an exception on failure or timeout. Its parameters are the page, text (a search string indicating success), error (a string indicating failure, or null to skip error checking), and timeout (a number, in milliseconds):

TypeScript:

    async function waitForPageToLoadStr(page: any, text: string, error: string | null, timeout: number): Promise<string> {
      // Absolute deadline in ms since the epoch, or null for no timeout.
      const maxTime = timeout ? (new Date()).getTime() + timeout : null;
      let html: string = '';
      html = await page.property('content');

      async function loop(): Promise<string> {
        async function checkSuccess(): Promise<boolean> {
          html = await page.property('content');
          if (error != null && html.includes(error)) {
            throw new Error(`Error string found: ${error}`);
          }
          if (maxTime && (new Date()).getTime() >= maxTime) {
            throw new Error(`Timed out waiting for string: ${text}`);
          }
          return html.includes(text);
        }
        if (await checkSuccess()) {
          return html;
        } else {
          return loop();
        }
      }
      return await loop();
    }

JavaScript:

    async function waitForPageToLoadStr(page, text, error, timeout) {
      // Absolute deadline in ms since the epoch, or null for no timeout.
      const maxTime = timeout ? (new Date()).getTime() + timeout : null;
      let html = '';
      html = await page.property('content');

      async function loop() {
        async function checkSuccess() {
          html = await page.property('content');
          if (error != null && html.includes(error)) {
            throw new Error(`Error string found: ${error}`);
          }
          if (maxTime && (new Date()).getTime() >= maxTime) {
            throw new Error(`Timed out waiting for string: ${text}`);
          }
          return html.includes(text);
        }
        if (await checkSuccess()) {
          return html;
        } else {
          return loop();
        }
      }
      return await loop();
    }
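
The function above re-reads the page content in a loop with no pause between reads. A possible variant, sketched here under the assumption that a short delay between polls is acceptable, sleeps between attempts and accepts either a string or a regular expression. The name `pollPageContent` and the `intervalMs` parameter are hypothetical additions, not part of the original answer.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical variant of waitForPageToLoadStr: polls page.property('content')
// until `expected` (string or RegExp) is found, pausing `intervalMs` between polls.
async function pollPageContent(page, expected, timeoutMs, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const html = await page.property('content');
    const found = expected instanceof RegExp
      ? html.search(expected) !== -1 // search() is not affected by lastIndex
      : html.includes(expected);
    if (found) {
      return html;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for: ${expected}`);
    }
    await sleep(intervalMs);
  }
}
```

A call such as `await pollPageContent(page, /<div>Welcome/, 5000, 250)` would check about four times per second for up to five seconds.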

I personally used this function as follows:

TypeScript:

    try {
      const phantom = require('phantom');
      const instance: any = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no']);
      const page: any = await instance.createPage();
      const status = await page.open('https://somewebsite.co.uk/');
      await waitForPageToLoadStr(page, '<div>Welcome to somewebsite</div>', '<h1>Website under maintenance, try again later</h1>', 1000);
    } catch (error) {
      console.error(error);
    }

JavaScript:

    // Run inside an async function.
    try {
      const phantom = require('phantom');
      const instance = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no']);
      const page = await instance.createPage();
      await page.open('https://vehicleenquiry.service.gov.uk/');
      await waitForPageToLoadStr(page, '<div>Welcome to somewebsite</div>', '<h1>Website under maintenance, try again later</h1>', 1000);
    } catch (error) {
      console.error(error);
    }
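
The usage snippets above never shut down the PhantomJS process. The phantom package exposes `instance.exit()` for that, so a small wrapper along these lines could make sure it is always called, even when the wait throws. This is only a sketch: `withPhantomPage` and its `createInstance` parameter are hypothetical names.

```javascript
// Hypothetical wrapper: `createInstance` is any function returning a promise for a
// phantom instance, e.g. () => phantom.create(['--load-images=no']).
// `work` receives the created page; its result is returned to the caller.
async function withPhantomPage(createInstance, work) {
  const instance = await createInstance();
  try {
    const page = await instance.createPage();
    return await work(page);
  } finally {
    // Runs on success and on failure, so the PhantomJS process does not linger.
    await instance.exit();
  }
}
```

For example, `await withPhantomPage(() => phantom.create(['--ignore-ssl-errors=yes']), async (page) => { await page.open(url); return page.property('content'); })` would return the page content and then exit PhantomJS.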

Happy crawling!

-1
