I am using Python 3.5 on Windows 10.
When I crawl pages, I usually change the URL inside a `for` loop and fetch each page with `urlopen`, as shown below:
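```python
from urllib.request import urlopen  # in Python 3, urlopen lives in urllib.request

from bs4 import BeautifulSoup

# Open the output file once, in text mode with an explicit encoding,
# so str can be written directly (no .encode() needed)
with open('Slave.txt', 'w', encoding='utf-8') as f:
    # The chapter pages hjch1.htm ... hjch41.htm differ only by the number in the URL
    for i in range(1, 42):
        html = urlopen('http://xroads.virginia.edu/~hyper/JACOBS/hjch' + str(i) + '.htm')
        soup = BeautifulSoup(html, "lxml")
        f.write(soup.get_text())
```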
The problem is that on some sites the URL does not change at all. When I click through to the following pages, the content (for example, the images) changes, but the URL stays the same, so there is no pattern I can loop over.
For example, on this site there is nothing in the URL that I can use to move between pages:
http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp
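Looking at the page with BeautifulSoup, I found that the pagination links do not point to new URLs; instead they call the JavaScript function `commonPagingPost`: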
```html
<span class="number"><a href="javascript:;" class="on">1</a>
<a href="javascript:commonPagingPost('2','10','Shr01_lis.jsp');">2</a>
<a href="javascript:commonPagingPost('3','10','Shr01_lis.jsp');">3</a>
<a href="javascript:commonPagingPost('4','10','Shr01_lis.jsp');">4</a>
<a href="javascript:commonPagingPost('5','10','Shr01_lis.jsp');">5</a></span>
```
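Since the page numbers only appear as arguments to `commonPagingPost`, a first step is to pull those arguments out of the links. The sketch below uses only the standard library (`html.parser` plus a regular expression) so it runs without extra packages; with BeautifulSoup the same `href` strings could be collected via `soup.find_all('a', href=True)`. Reading the arguments as (page number, rows per page, target JSP) is my assumption from the snippet, not something the site documents.

```python
import re
from html.parser import HTMLParser

# Matches commonPagingPost('2','10','Shr01_lis.jsp') and captures the three arguments.
# Assumption: they are (page number, rows per page, target page).
PAGING_RE = re.compile(r"commonPagingPost\('(\d+)','(\d+)','([^']+)'\)")


class PagingLinkParser(HTMLParser):
    """Collects the arguments of every commonPagingPost(...) call found in an <a href>."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''  # attributes without a value come back as None
        match = PAGING_RE.search(href)
        if match:
            page, page_size, target = match.groups()
            self.calls.append((int(page), int(page_size), target))


def paging_calls(html):
    """Return a list of (page, page_size, target) tuples found in the HTML."""
    parser = PagingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.calls


SNIPPET = '''<span class="number"><a href="javascript:;" class="on">1</a>
<a href="javascript:commonPagingPost('2','10','Shr01_lis.jsp');">2</a>
<a href="javascript:commonPagingPost('3','10','Shr01_lis.jsp');">3</a>
<a href="javascript:commonPagingPost('4','10','Shr01_lis.jsp');">4</a>
<a href="javascript:commonPagingPost('5','10','Shr01_lis.jsp');">5</a></span>'''

if __name__ == '__main__':
    for call in paging_calls(SNIPPET):
        print(call)
```

The current page (`javascript:;`, class `on`) has no call, so it is skipped; the remaining links yield pages 2 to 5.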
How can I crawl all of these pages with BeautifulSoup4, starting from `urlopen` as in my code above?
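A function named `commonPagingPost` suggests (though the snippet does not prove it) that the page number is sent in a POST form body rather than in the URL, which would explain why the address bar never changes. Under that assumption, `urlopen` can still be used by passing it a `urllib.request.Request` with POST data. The form-field names `curPage` and `pageSize` below are HYPOTHETICAL placeholders; the real names would have to be read from the `commonPagingPost` source or from the browser's developer tools (Network tab) when clicking a page link. The target URL assumes `Shr01_lis.jsp` resolves relative to the listing page's directory.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the form posts back to the same directory as the listing page
BASE_URL = 'http://eungdapso.seoul.go.kr/Shr/Shr01/'

# HYPOTHETICAL form-field names: replace with the ones commonPagingPost really submits
PAGE_FIELD = 'curPage'
SIZE_FIELD = 'pageSize'


def build_page_request(page, page_size=10, target='Shr01_lis.jsp'):
    """Build (but do not send) a POST request asking for one result page."""
    body = urlencode({PAGE_FIELD: page, SIZE_FIELD: page_size}).encode('ascii')
    return Request(BASE_URL + target, data=body, method='POST')


def fetch_page(page):
    """Send the request and return the raw HTML bytes (needs network access)."""
    with urlopen(build_page_request(page)) as response:
        return response.read()
```

Each result of `fetch_page(page)` could then go to `BeautifulSoup(html, "lxml")` inside a `for page in range(...)` loop, just like the chapter loop above, with the upper bound taken from the pagination links.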