Download prices using python

Question

I have tried this before. I am completely at a loss for ideas.

On this page there is a dialog box for getting quotes: http://www.schwab.com/public/schwab/non_navigable/marketing/email/get_quote.html

I used SPY, XLV, IBM, MSFT.

The result is displayed above the dialog box in a table.

If you have an account, real-time quotes are available through a cookie.

How can I get that table into Python (using 2.6), with the data as a list or dictionary?

+4

python

Merlin Nov 18 '10 at 20:02

4 answers

Use something like Beautiful Soup to parse the HTML response from the website and load it into a dictionary. Use the ticker symbol as the key and a tuple of whatever data interests you as the value. Loop over all the returned symbols and add one entry per symbol.
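A minimal sketch of that symbol-to-tuple idea, written for Python 3. It swaps in the standard library's `html.parser` in place of Beautiful Soup so that it runs without extra installs; the same loop should work over rows found with Beautiful Soup's `find_all('tr')`. The table markup and the prices in `SAMPLE_HTML` are made up for illustration and are not taken from the Schwab page.

```python
from html.parser import HTMLParser

# Hypothetical markup and values: the real quote page's structure is not known here.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Last</th><th>Volume</th></tr>
  <tr><td>SPY</td><td>120.29</td><td>1000</td></tr>
  <tr><td>IBM</td><td>144.36</td><td>2000</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def quotes_from_html(html):
    """Return {symbol: (last_price, volume)} for each data row."""
    parser = RowCollector()
    parser.feed(html)
    quotes = {}
    for symbol, last, volume in parser.rows:
        quotes[symbol] = (float(last), int(volume))
    return quotes


quotes = quotes_from_html(SAMPLE_HTML)
print(quotes["IBM"])
```

Keeping the value as a tuple matches the suggestion above; a nested dictionary would work just as well if named fields are easier to read.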

You can see examples of how to do this in Toby Segaran's “Programming Collective Intelligence”. All samples are in Python.

+5

duffymo Nov 18 '10 at 20:08

Have you considered using the Yahoo quotes API?
See http://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys#h=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22

You can generate the request URL dynamically, for example:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20%3D%20%22YHOO%22&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

Then just poll it with a standard HTTP GET request. The response is in XML format.
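As a rough sketch of building that request URL and reading the XML reply, written for Python 3 with only the standard library. No request is actually sent here. The `SAMPLE_XML` string is an assumed response shape with a made-up price, and the element name `LastTradePriceOnly` is likewise an assumption, so check both against a real reply before relying on them.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(symbol):
    """Build the GET URL for a single-symbol quote query."""
    query = 'select * from yahoo.finance.quotes where symbol = "%s"' % symbol
    params = [
        ("q", query),
        ("diagnostics", "true"),
        ("env", "store://datatables.org/alltableswithkeys"),
    ]
    # urlencode percent-escapes the query; spaces become '+'
    return BASE_URL + "?" + urlencode(params)


# Assumed response shape and made-up price, for illustration only.
SAMPLE_XML = """
<query>
  <results>
    <quote symbol="YHOO">
      <LastTradePriceOnly>16.50</LastTradePriceOnly>
    </quote>
  </results>
</query>
"""


def parse_quotes(xml_text):
    """Return {symbol: last_price} from a response shaped like SAMPLE_XML."""
    root = ET.fromstring(xml_text.strip())
    prices = {}
    for quote in root.iter("quote"):
        prices[quote.get("symbol")] = float(quote.findtext("LastTradePriceOnly"))
    return prices


url = build_yql_url("YHOO")
prices = parse_quotes(SAMPLE_XML)
print(url)
print(prices)
```

To fetch live data, the URL would be passed to `urllib.request.urlopen` (or `urllib2.urlopen` on Python 2.6) and the response body handed to `parse_quotes`.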

+3

Alexandre Deschamps Nov 18 '10 at 20:18

matplotlib has a module that retrieves historical quotes from Yahoo:

    >>> from matplotlib.finance import quotes_historical_yahoo
    >>> from datetime import date
    >>> from pprint import pprint
    >>> pprint(quotes_historical_yahoo('IBM', date(2010, 11, 12), date(2010, 11, 18)))
    [(734088.0, 144.59, 143.74000000000001, 145.77000000000001, 143.55000000000001, 4731500.0),
     (734091.0, 143.88999999999999, 143.63999999999999, 144.75, 143.27000000000001, 3827700.0),
     (734092.0, 142.93000000000001, 142.24000000000001, 143.38, 141.18000000000001, 6342100.0),
     (734093.0, 142.49000000000001, 141.94999999999999, 142.49000000000001, 141.38999999999999, 4785900.0)]
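Since the question asks for a list or dictionary, here is a small sketch (Python 3, standard library only) that turns tuples like the ones above into a dictionary keyed by date. The first element looks like a proleptic Gregorian ordinal, which `date.fromordinal` can decode. The field order (open, close, high, low, volume) is an assumption inferred from the values, and the two rows are copied from the output above with the prices rounded.

```python
from datetime import date

# First two rows of the output above, prices rounded to two decimals.
rows = [
    (734088.0, 144.59, 143.74, 145.77, 143.55, 4731500.0),
    (734091.0, 143.89, 143.64, 144.75, 143.27, 3827700.0),
]

# Assumed field order, inferred from the values (third price is the largest).
FIELDS = ("open", "close", "high", "low", "volume")


def rows_to_dict(rows):
    """Return {date: {field: value}} for each quote tuple."""
    by_day = {}
    for row in rows:
        day = date.fromordinal(int(row[0]))
        by_day[day] = dict(zip(FIELDS, row[1:]))
    return by_day


by_day = rows_to_dict(rows)
for day in sorted(by_day):
    print(day.isoformat(), by_day[day]["close"])
```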

0

hughdbrown Nov 18 '10 at 22:17

Hugh bothwell · Accepted Answer · 2010-11-20T15:48:54+0000

The first problem: the data is actually in an iframe within a frame; you need to look at https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=APC (where you substitute the appropriate symbol at the end of the URL).

The second problem: extracting the data from the page. I personally like lxml and XPath, but there are many packages that will do the job. I'd expect some code like

    import re
    import urllib2

    import lxml.html

    re_dollars = r'\$?\s*(\d+\.\d{2})'

    def urlExtractData(url, defs):
        """
        Get html from url, parse according to defs, return as dictionary

        defs is a list of tuples ("name", "xpath", "regex", fn)
          name becomes the key in the returned dictionary
          xpath is used to extract a string from the page
          regex further processes the string (skipped if None)
          fn casts the string to the desired type (skipped if None)
        """
        page = urllib2.urlopen(url)   # can modify this to include your cookies
        tree = lxml.html.parse(page)

        res = {}
        for name, path, reg, fn in defs:
            txt = tree.xpath(path)[0]
            if reg is not None:
                match = re.search(reg, txt)
                txt = match.group(1)
            if fn is not None:
                txt = fn(txt)
            res[name] = txt
        return res

    def getStockData(code):
        url = 'https://www.schwab.wallst.com/public/research/stocks/summary.asp?user_id=schwabpublic&symbol=' + code
        defs = [
            ("stock_name", '//span[@class="header1"]/text()', None, str),
            ("stock_symbol", '//span[@class="header2"]/text()', None, str),
            ("last_price", '//span[@class="neu"]/text()', re_dollars, float)
            # etc
        ]
        return urlExtractData(url, defs)

Calling

 print repr(getStockData('MSFT'))

returns

 {'stock_name': 'Microsoft Corp', 'last_price': 25.690000000000001, 'stock_symbol': 'MSFT:NASDAQ'}
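The comment in urlExtractData mentions adding cookies, which is what the real-time quotes for account holders would need. One possible way to do that is sketched below for Python 3's `urllib.request`; on Python 2.6, `urllib2.Request` takes the same `headers` argument. The cookie name and value are hypothetical placeholders, and nothing is sent over the network here.

```python
import urllib.request

URL = ("https://www.schwab.wallst.com/public/research/stocks/summary.asp"
       "?user_id=schwabpublic&symbol=")


def build_request(code, cookie_string):
    """Build (but do not send) a request carrying a Cookie header."""
    return urllib.request.Request(URL + code, headers={"Cookie": cookie_string})


# Hypothetical cookie: copy the real name and value from a logged-in browser session.
req = build_request("MSFT", "session_id=PLACEHOLDER")
print(req.full_url)
print(req.get_header("Cookie"))

# To fetch the page, pass the request object to urlopen in place of the bare URL:
# page = urllib.request.urlopen(req)
```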

The third problem: the markup on this page is presentational, not structural, which tells me that code based on it is likely to be fragile, i.e. any change in page structure (or differences between pages) will require reworking the XPaths.
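One way to soften that fragility, sketched here in Python 3, is to make the per-field extraction step return None when the XPath finds nothing or the regex does not match, instead of raising. The helper name `extract_field` is hypothetical; it takes the list of strings that `tree.xpath(path)` would return, so the sketch itself needs no lxml.

```python
import re

RE_DOLLARS = r'\$?\s*(\d+\.\d{2})'


def extract_field(matches, regex=None, cast=None):
    """Return the processed first match, or None if anything is missing.

    matches: list of strings, e.g. the result of tree.xpath(path)
    regex:   optional pattern whose group 1 is kept
    cast:    optional callable applied to the final string
    """
    if not matches:
        return None  # the XPath found nothing: the page layout probably changed
    text = matches[0]
    if regex is not None:
        found = re.search(regex, text)
        if found is None:
            return None  # text is present but not in the expected format
        text = found.group(1)
    return cast(text) if cast is not None else text


print(extract_field(["$ 25.69"], RE_DOLLARS, float))
print(extract_field([], RE_DOLLARS, float))
print(extract_field(["n/a"], RE_DOLLARS, float))
```

A None value in the resulting dictionary then signals which XPath needs attention, which is easier to diagnose than an IndexError from the middle of the loop.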

Hope this helps!
