I am practicing with the requests and BeautifulSoup modules in Python 3.6 and have run into a problem that I cannot find addressed in other questions and answers.
It seems that at some point in the page Beautiful Soup stops recognizing tags and ids. I am trying to pull play-by-play data from the following page:
http://www.pro-football-reference.com/boxscores/201609080den.htm
import requests
import bs4

source_url = 'http://www.pro-football-reference.com/boxscores/201609080den.htm'
res = requests.get(source_url)
# Treat a final URL containing '404' as a missing page
if '404' in res.url:
    raise Exception('No data found for this link: ' + source_url)

soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Look for the play-by-play wrapper div and count the matches
all_pbp = soup.find_all('div', {'id': 'all_pbp'})
print(len(all_pbp))

# Look for the play-by-play table itself and count the matches
table = soup.find_all('table', {'id': 'pbp'})
print(len(table))
Using the inspector in Chrome, I can see that the table definitely exists. I also tried searching for other div and tr elements in the latter half of the HTML, and that does not work either. I tried the standard 'html.parser' as well as lxml and html5lib, but none of them finds the table.
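To narrow this down, here is a minimal sketch of a check that uses only the standard library's html.parser. Chrome's inspector shows the DOM after scripts have run, so it can differ from the raw HTML that requests downloads. The sketch reports whether an id appears as a real element in the raw HTML, only inside an HTML comment, or not at all. The names `_IdFinder` and `locate_id` and the sample string are made up for illustration, and I have not confirmed which case applies to this page.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Hypothetical helper: notes whether an id appears on a real start tag."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found_in_markup = False
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if dict(attrs).get('id') == self.element_id:
            self.found_in_markup = True

    def handle_comment(self, data):
        # Keep the text of every <!-- ... --> block for a second pass
        self.comments.append(data)


def locate_id(html, element_id):
    """Return 'markup', 'comment', or 'absent' for element_id in html."""
    outer = _IdFinder(element_id)
    outer.feed(html)
    outer.close()
    if outer.found_in_markup:
        return 'markup'
    # Re-parse each comment body as HTML to see if the id is hidden there
    for comment in outer.comments:
        inner = _IdFinder(element_id)
        inner.feed(comment)
        inner.close()
        if inner.found_in_markup:
            return 'comment'
    return 'absent'


# Made-up sample, not taken from the real page
sample = '<div id="all_pbp"><!-- <table id="pbp"><tr><td>x</td></tr></table> --></div>'
print(locate_id(sample, 'all_pbp'))
print(locate_id(sample, 'pbp'))
print(locate_id(sample, 'missing'))
```

For the real page I would pass `res.text` as the first argument instead of `sample`.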
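Has anyone run into something like this, where something in the HTML keeps BeautifulSoup from finding tags, on this or similar sites (hockey-reference.com, basketball-reference.com)?
If it is the HTML, is there a better tool or library for this?
Thanks,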
BF