urllib2.HTTPError: HTTP Error 403: Forbidden

Question

I am trying to automate downloading historical stock data using Python. The URL I am trying to open responds with a CSV file, but I cannot open it with urllib2. I tried changing the user agent, as suggested in several earlier questions, and I even tried accepting response cookies, with no luck. Can you please help?

Note: the same method works for Yahoo Finance.

The code:

    import urllib2, cookielib

    site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)

Error:

    File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for the help.

+55

python http urllib

kumar Nov 09

3 answers

This will work in Python 3:

    import urllib.request

    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
    url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
    headers = {'User-Agent': user_agent}

    request = urllib.request.Request(url, None, headers)  # The assembled request
    response = urllib.request.urlopen(request)
    data = response.read()  # The data you need

+23

Eish Apr 24 '13 at 9:09

The NSE website has changed, and the older scripts are no longer optimal for the current site. This snippet collects daily data for a security. The details include symbol, type of security, previous close, opening price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantity, and the percentage of delivered to traded quantity. Each row is conveniently presented as a dictionary.

Python 3.x version using requests and BeautifulSoup:

    from requests import get
    from csv import DictReader
    from bs4 import BeautifulSoup as Soup
    from datetime import date
    from io import StringIO

    SECURITY_NAME = "3MINDIA"       # Change this to get quote for another stock
    START_DATE = date(2017, 1, 1)   # Start date of stock quote data DD-MM-YYYY
    END_DATE = date(2017, 9, 14)    # End date of stock quote data DD-MM-YYYY

    BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"


    def getquote(symbol, start, end):
        # Note: the "%-d" / "%-m" (no zero padding) codes are platform-specific;
        # they work on Linux and macOS but not on Windows.
        start = start.strftime("%-d-%-m-%Y")
        end = end.strftime("%-d-%-m-%Y")

        hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'Referer': 'https://cssspritegenerator.com',
               'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
               'Accept-Encoding': 'none',
               'Accept-Language': 'en-US,en;q=0.8',
               'Connection': 'keep-alive'}

        url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
        d = get(url, headers=hdr)
        soup = Soup(d.content, 'html.parser')

        # The CSV text sits in a div; replacing ':' with newlines splits it into rows
        payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
        csv = DictReader(StringIO(payload))
        for row in csv:
            print({k: v.strip() for k, v in row.items()})


    if __name__ == '__main__':
        getquote(SECURITY_NAME, START_DATE, END_DATE)

The snippet is also relatively modular and ready to use.
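The parsing step in the snippet above can be tried without touching the network. The following is a minimal sketch, assuming (as the `.replace(':', '\n')` call implies) that the rows inside `csvContentDiv` are separated by colons. The helper name `parse_payload` and the sample string are made up for illustration and are not real NSE data.

```python
from csv import DictReader
from io import StringIO


def parse_payload(payload):
    """Turn colon-separated CSV text into a list of dictionaries."""
    # Assumption: ':' marks the end of each row, as in the snippet above.
    text = payload.replace(':', '\n')
    reader = DictReader(StringIO(text))
    # Strip stray whitespace from both column names and values.
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


# Hypothetical sample in the assumed layout (header row, then one data row).
sample = '"Symbol","Series","Close Price":"3MINDIA","EQ"," 13000.00":'
rows = parse_payload(sample)
print(rows)
```

Returning a list instead of printing each row makes the result easier to reuse. Note that under this assumption any value that itself contains a colon (a time stamp, for example) would be split across rows.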

+1

djinn Sep 14 '17 at 8:00

andrean · Accepted Answer · 2012-11-09 07:19

By adding a few more headers, I was able to get the data:

    import urllib2, cookielib

    site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    req = urllib2.Request(site, headers=hdr)

    try:
        page = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        # Print the server's error body; page is undefined in this case
        print e.fp.read()
    else:
        content = page.read()
        print content

Actually, this one additional header is enough to make it work:

 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
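For readers on Python 3, the same idea can be sketched with `urllib.request`. This is only an illustration under assumptions: the helper names `build_request` and `fetch` are made up here, the header values are copied from the question and the answer above, and whether a given site still accepts them is not something this sketch can guarantee.

```python
import urllib.error
import urllib.request

# Header values taken from the question and answer above.
HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
}


def build_request(url):
    """Return a Request object that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch(url):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # A 403 lands here; report it instead of crashing.
        print('HTTP error', e.code, e.reason)
        return None
```

Calling `fetch(site)` with the URL from the question would then either return the CSV bytes or print the error status. Keeping `build_request` separate makes it easy to inspect which headers will actually be sent before any network call is made.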
