Download a public Google Docs spreadsheet as CSV using Python

I can download the CSV file from Google Docs using wget:

 wget --no-check-certificate --output-document=locations.csv 'https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv' 

But I cannot download the same CSV from Python:

 import urllib2
 request = urllib2.Request('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
 request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
 opener = urllib2.build_opener()
 data = opener.open(request).read()
 print(data)

The result is a Google login page. What am I doing wrong?

python google-spreadsheet google-sheets
4 answers

Just use requests; it is much easier than using urllib:

 import requests
 response = requests.get('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
 assert response.status_code == 200, 'Wrong status code'
 print(response.content)

You can install it with

 pip install requests 
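A slightly more defensive variant of the same idea, as a sketch: it uses raise_for_status() instead of a bare assert, decodes the body with response.text, and parses it with the standard csv module. The helper names are hypothetical, and the Content-Type check is an assumption: it presumes Google serves the export as text/csv, so an HTML login page returned with status 200 would be rejected instead of silently parsed.

```python
import csv
import io


def rows_from_csv_text(text):
    """Parse CSV text (e.g. response.text from requests) into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


def download_csv_rows(url, timeout=30):
    """Download a published CSV and return its rows as lists of strings."""
    # Imported here so rows_from_csv_text works even without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    # Assumption: a real CSV export carries a text/csv Content-Type; a login page would not.
    content_type = response.headers.get('Content-Type', '')
    if 'text/csv' not in content_type:
        raise ValueError('Expected CSV, got %r (possibly a login page)' % content_type)
    # .text decodes the body to str; .content would return raw bytes.
    return rows_from_csv_text(response.text)
```

For example, download_csv_rows(url)[0] would give the header row of the sheet.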

You are not storing cookies.

First of all, let me say that I fully endorse the recommendation to use the excellent requests library.

However, if you need to do this in vanilla Python 2, the problem is that Google bounces you through a series of HTTP 302 redirects and expects you to remember the cookies it sets with each response. When it detects that you are not storing cookies, it redirects you to the login page.

By default, urllib2.urlopen (or the opener returned from build_opener) will follow 302 redirects, but it will not store HTTP cookies. You have to tell your opener to do so. For example:

 >>> from cookielib import CookieJar
 >>> from urllib2 import build_opener, HTTPCookieProcessor
 >>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
 >>> resp = opener.open('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
 >>> data = resp.read()
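On Python 3, cookielib was renamed http.cookiejar and the urllib2 opener machinery moved into urllib.request, so the same fix looks roughly like this. It is a sketch; make_cookie_opener and fetch_text are illustrative names, and the UTF-8 default is an assumption about the export's encoding.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return an opener that keeps cookies across requests, plus its cookie jar."""
    jar = CookieJar()
    # HTTPCookieProcessor stores Set-Cookie headers in the jar and sends them back,
    # including on the requests made while following 302 redirects.
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def fetch_text(url, encoding='utf-8'):
    """Fetch url through a cookie-aware opener and decode the body."""
    opener, _ = make_cookie_opener()
    with opener.open(url) as resp:
        return resp.read().decode(encoding)
```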

Again, use requests if at all possible, but if this is not possible, the standard library can do the job.


The requests library is the gold standard for making HTTP requests from Python. However, this style of downloading, while not yet deprecated, is unlikely to last; that applies specifically to the download-link style used in the question. In fact, the downloadUrl field in Google Drive API v2 is already deprecated. The currently accepted way to export Google Sheets as CSV is to use the (current) Google Drive API.

So why the Drive API? Shouldn't this be something for the Sheets API instead? Well, the Sheets API is for spreadsheet-oriented functionality, i.e., data formatting, column resizing, creating charts, cell validation, etc., while the Drive API is for file-oriented functionality, i.e., import/export.

Below is the key snippet from a complete command-line solution. (If you don't use Python, you can treat it as pseudocode and pick any language supported by the Google APIs Client Libraries.) The snippet assumes the most recent Sheet named inventory is the one you want (older files with that name are ignored) and that DRIVE is the API service endpoint:

 import os

 # DRIVE is assumed to be an authorized Drive API v3 service object
 FILENAME = 'inventory'
 SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
 DST_MIMETYPE = 'text/csv'

 # query for latest file named FILENAME
 files = DRIVE.files().list(
     q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
     orderBy='modifiedTime desc,name').execute().get('files', [])

 # if found, export Sheets file as CSV
 if files:
     fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
     print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
     data = DRIVE.files().export(fileId=files[0]['id'],
             mimeType=DST_MIMETYPE).execute()

     # if non-empty file
     if data:
         with open(fn, 'wb') as f:
             f.write(data)
         print('DONE')
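The snippet takes DRIVE as given. One way to create it is sketched below, assuming a service account key file and the google-api-python-client and google-auth packages; the answer does not say which auth flow it used, so the service account approach (and the build_drive_service name) is a swapped-in assumption. A service account only sees files that have been shared with its email address.

```python
# Read-only Drive scope; it is sufficient for files().list() and files().export().
DRIVE_READONLY_SCOPE = 'https://www.googleapis.com/auth/drive.readonly'


def build_drive_service(service_account_file):
    """Return a Drive API v3 service object usable as DRIVE in the snippet above."""
    # Imported here so this module loads without the Google client libraries;
    # install them with: pip install google-api-python-client google-auth
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    # Assumption: authenticate as a service account from a JSON key file.
    creds = service_account.Credentials.from_service_account_file(
        service_account_file, scopes=[DRIVE_READONLY_SCOPE])
    return build('drive', 'v3', credentials=creds)
```

Usage would then be along the lines of DRIVE = build_drive_service('service-account.json'), where the key file name is hypothetical.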

If your Sheet is large, you may have to export it in chunks; see this page on how to do that. If you're new to Google APIs in general, I have a (somewhat dated, but) user-friendly intro video for you. (There are 2 videos after that one which may be useful too.)
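A minimal sketch of a chunked export, assuming google-api-python-client is installed and drive is a Drive API v3 service object like DRIVE above. It uses files().export_media() with googleapiclient.http.MediaIoBaseDownload; the function name and the 1 MiB chunk size are illustrative choices, and whether a given export actually arrives in several chunks is up to the server.

```python
def export_csv_in_chunks(drive, file_id, path, chunksize=1024 * 1024):
    """Export a Google Sheet as CSV to path, downloading it chunk by chunk."""
    # Imported here so this module loads without google-api-python-client.
    from googleapiclient.http import MediaIoBaseDownload

    # export_media() builds the export request without executing it.
    request = drive.files().export_media(fileId=file_id, mimeType='text/csv')
    with open(path, 'wb') as fh:
        downloader = MediaIoBaseDownload(fh, request, chunksize=chunksize)
        done = False
        while not done:
            # next_chunk() writes one chunk to fh and returns (progress, done).
            status, done = downloader.next_chunk()
            if status:
                print('Downloaded %d%%' % int(status.progress() * 100))
```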


I would use requests:

 import requests
 r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0ArM5yzzCw9IZdEdLWlpHT1FCcUpYQ2RjWmZYWmNwbXc&output=csv')
 data = r.content
