Ruby script to download personal Google docs

I would like to write a script in Ruby (using the gdata gem, a REST client gem, or just plain Net::HTTP) that authenticates with my Google documents using my Gmail user ID and password, and then downloads a list of my private documents and the documents themselves.

The GData docs guide makes it clear how to get public documents, but it's not clear how I can authenticate in my script to access private documents. The authentication methods it describes all seem to require human intervention, either through a CAPTCHA or some form of OAuth / OpenID redirect.

Is there any way to access my private documents with just the user ID / password combination? Or perhaps with that along with an API key? If so, can someone show me how to do this?

4 answers

Sometimes giving up, moving on to something else, and coming back with a fresh mind works wonders. I started looking at this again this morning and after a couple of hours I had it working.

I dropped OAuth because the Ruby OAuth gem seems to be centered around web applications. I started poking around in Google Data on Rails and had no problem authenticating with ClientLogin. As far as I can tell, you do not get CAPTCHA requests unless you enter the wrong credentials ... or at least I have not seen one yet.
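For anyone who wants the plain Net::HTTP route mentioned in the question, here is a rough sketch of what a ClientLogin exchange looks like without the gdata gem. It is not taken from the answer: the endpoint URL, the form field names, the `source` value, and all three helper names are my assumptions about the ClientLogin protocol, and the sketch presumes the endpoint is still being served.

```ruby
require 'net/http'
require 'uri'

# Assumed ClientLogin endpoint (not stated in the answer above).
CLIENT_LOGIN_URL = URI('https://www.google.com/accounts/ClientLogin')

# Turn a ClientLogin response body ("SID=...\nLSID=...\nAuth=...")
# into a hash of field name => value.
def parse_client_login(body)
  body.each_line.each_with_object({}) do |line, fields|
    key, value = line.strip.split('=', 2)
    fields[key] = value if key && value
  end
end

# Build the Authorization header value sent with later requests.
def google_login_header(auth_token)
  "GoogleLogin auth=#{auth_token}"
end

# Hypothetical helper: post the credentials and return the Auth token.
# This makes a network call, so it is only defined here, not invoked.
def fetch_auth_token(email, password, service)
  response = Net::HTTP.post_form(CLIENT_LOGIN_URL,
                                 'accountType' => 'HOSTED_OR_GOOGLE',
                                 'Email' => email,
                                 'Passwd' => password,
                                 'service' => service,
                                 'source' => 'example-ruby-script')
  raise "ClientLogin failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_client_login(response.body).fetch('Auth')
end
```

As far as I know the service name would be 'wise' for spreadsheets and 'writely' for documents, but treat that as an assumption as well. The returned token goes into an `Authorization` header built with `google_login_header` on each later request.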

Here is a simple code snippet for exporting a spreadsheet file:

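    require 'gdata/client'
    require 'gdata/http'
    require 'gdata/auth'

    # Replace with the key of the spreadsheet you want to export.
    resource_id = 'resource_ID'

    client = GData::Client::Spreadsheets.new
    client.clientlogin('username', 'password')

    # Request the spreadsheet in xls format.
    test = client.get("http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=#{resource_id}&fmcmd&exportFormat=xls")

    # Write the response body in binary mode so the xls data is not altered.
    File.open('spreadsheet.xls', 'wb') { |file| file.write(test.body) }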

Today I started this exact same project and ran into the same problem. I managed to get around using OAuth or OpenID, but I am still working on the file download ... it seems like this should be the easy part. Anyway, here is what I did:

I use the Mechanize gem to scrape the docs.google.com page for the username and password form. I pass my credentials through Mechanize and now have access to my Google docs.

At this point, I find that I can use the download URL mentioned in this Google documentation:

http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#DownloadingDocs

The URL is as follows (I work with spreadsheets):

http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=resource_id_goes_here&exportFormat=xls

For tinkering / testing, I just take the resource ID of my spreadsheet from the address bar of my web browser (when I have the spreadsheet open) and plug it into the above URL in another browser tab. This works: when I submit the URL, the spreadsheet downloads as an .xls file. Please note that all of this is done in my web browser.

I have not been able to get the download working from my Ruby script. This URL is not a direct link to the file, so I'm not quite sure how to capture the file data correctly. The script runs without errors, but if I save the output of the Ruby get method (which takes this URL as an argument) in an object, it turns out to be part of a JavaScript redirect. I am probably missing something obvious, but that is where I am. I kick myself for the hours I spent stuck reading about OAuth and OpenID ... it wasn't much fun.
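Copying the resource ID out of the address bar by hand can also be scripted. The sketch below assumes the browser URL carries the ID in a `key` query parameter (for example `.../ccc?key=...&hl=en`); that URL shape and the helper name `resource_id_from_url` are my assumptions, not something stated above.

```ruby
require 'uri'

# Return the value of the "key" query parameter from a spreadsheet URL
# copied out of the browser, or nil when the URL has no such parameter.
def resource_id_from_url(browser_url)
  query = URI.parse(browser_url).query
  return nil if query.nil?

  pair = URI.decode_www_form(query).assoc('key')
  pair && pair.last
end
```

The fragment part of the URL (anything after `#`) is ignored, so a trailing sheet selector should not get in the way.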

Hope this is helpful. Here is another interesting Ruby gem that I stumbled upon during my research on authentication:

OAuth Ruby Gem: http://oauth.rubyforge.org/


Of course, here is the basic version of what I'm doing:

    require 'mechanize'

    agent = Mechanize.new

    # Load the login page and fill in the first form with the credentials.
    page = agent.get "https://docs.google.com"
    form = page.forms.first
    form.Email = "your_username"
    form.Passwd = "your_password"
    page = agent.submit form

    # Request the export URL and print whatever comes back.
    test = agent.get "google_download_url_goes_here"
    puts test.body

If you look at test, you will see JavaScript redirect content instead of the xls file.

I have not worked on it for a couple of days, but I have a feeling that I am getting the redirect because the script is not "correctly" authenticated. Mechanize should handle cookies and redirects, so I think it should just work, but it doesn't.
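One way to tell from inside the script whether the response is the spreadsheet or the redirect page is to look at the first bytes of the body. This is only a sketch: the helper names are made up, and it relies on .xls files using the OLE2 compound file container, which starts with a fixed eight-byte signature.

```ruby
# First eight bytes of an OLE2 compound file, the container used by .xls.
XLS_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# True when the body starts with the OLE2 signature.
def looks_like_xls?(body)
  body.b.start_with?(XLS_MAGIC)
end

# True when the body looks like an HTML page (such as a login or
# JavaScript redirect page) rather than binary spreadsheet data.
def looks_like_html?(body)
  body.b.lstrip.match?(/\A(?:<!doctype|<html|<script)/i)
end
```

With the Mechanize script above, something like `warn 'got a redirect page' if looks_like_html?(test.body)` before writing the file would make the failure visible instead of saving HTML under an .xls name.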

UPDATE:

The export URL is a little further down the same page of the documentation that you referenced in your comment. The URL for exporting a spreadsheet is as follows:

http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=document_resource_id_goes_here&exportFormat=xls

You should be able to paste this into a browser and download the file (if you are logged in, of course). The document resource ID is simply the unique key for whichever document you are working with; it can be inserted into the URL by hand for testing in the browser.

However, I am fairly sure that none of these API URLs will work from a script unless it handles authentication the way Google requires. I'm not quite sure what I'm looking for, but using Wireshark to sniff packets, I see some errors when using the script that I don't get when using my browser. These errors occur when the server and the script exchange certificate information. Anyway, I am looking at the OAuth gem again and think I am beginning to understand it better.
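Building that URL in Ruby rather than by hand avoids stray quotes and spaces. A minimal sketch, assuming only the `key` and `exportFormat` parameters shown above are needed; the constant and method names are made up for illustration.

```ruby
require 'uri'

# Base of the spreadsheet export URL quoted above.
EXPORT_BASE = 'http://spreadsheets.google.com/feeds/download/spreadsheets/Export'

# Build the export URL for a resource ID and format, escaping the
# parameter values so unusual characters do not break the query string.
def export_url(resource_id, format = 'xls')
  query = URI.encode_www_form('key' => resource_id, 'exportFormat' => format)
  "#{EXPORT_BASE}?#{query}"
end
```

For example, `export_url('document_resource_id_goes_here')` gives the xls URL shown above, and passing `'csv'` as the second argument asks for CSV instead.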

If you go here:

http://googlecodesamples.com/oauth_playground/

You can play with the OAuth stuff there; it is crazy how it works. You ask for a request token with a set of parameters that have to be "just right." Google sends back a request token, which is then used to link to the login page where you enter your Google credentials (just as when you work with Google docs by hand). Once your credentials are verified, it asks you to grant access for the request token. The request token is then upgraded to an access token and returned to your script, and you can start working with the rest of the API by supplying this access token ... it seems unnecessary, but I'm not a security expert.

Here is what I hope to do:

  • Find out how to use the OAuth Ruby gem to request and send tokens to Google.

  • Use Mechanize to scrape the Google login page and enter the credentials once I can send it the request token it wants

  • Use Mechanize to click the "Grant Access" button after submitting my credentials

  • Then, hopefully, find that I can actually use the rest of the API to work with files

(Grrr! Learning how to format text correctly on this site is just as difficult !! :))


The code in the first answer didn't quite work for me. Here is what I used.

    require 'gdata/client'
    require 'gdata/http'
    require 'gdata/auth'

    KEY = 'YOUR_DOCUMENT_KEY'
    URL = "https://docs.google.com/feeds/download/spreadsheets"

    client = GData::Client::Spreadsheets.new
    client.clientlogin('REPLACE_WITH_LOGIN', 'REPLACE_WITH_PASSWORD')

    # Change the csv at the end to match your required format
    test = client.get("#{URL}/Export?key=#{KEY}&fmcmd&exportFormat=csv")
    puts test.body
