How to crawl paginated pages? The URL does not change when I click the next page

I am using Python 3.5 on Windows 10.

When I crawl several pages, I usually rely on the URL changing from page to page and iterate over the URLs with urlopen in a for loop, as shown below.

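from urllib.request import urlopen

from bs4 import BeautifulSoup

# Open the file in text mode with an explicit encoding so the text can be written directly.
with open('Slave.txt', 'w', encoding='utf-8') as f:
    # The chapter number is part of the URL, so a simple loop covers chapters 1 to 41.
    for i in range(1, 42):
        html = urlopen('http://xroads.virginia.edu/~hyper/JACOBS/hjch' + str(i) + '.htm')
        soup = BeautifulSoup(html, "lxml")
        text = soup.get_text()
        f.write(text)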

But now I have a problem: on the site below, the URL does not change when I click through to the following pages, even though the page content (for example, the image) changes.

There is no pattern or signal in the URL that I can use to tell the pages apart.

http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp

I inspected the page source with BeautifulSoup and found that the pagination links call a JavaScript function named commonPagingPost.

<span class="number"><a href="javascript:;" 
class="on">1</a>&nbsp;&nbsp;
<a href="javascript:commonPagingPost('2','10','Shr01_lis.jsp');">2</a>&nbsp;&nbsp;
<a href="javascript:commonPagingPost('3','10','Shr01_lis.jsp');">3</a>&nbsp;&nbsp;
<a href="javascript:commonPagingPost('4','10','Shr01_lis.jsp');">4</a>&nbsp;&nbsp;
<a href="javascript:commonPagingPost('5','10','Shr01_lis.jsp');">5</a></span>

How can I crawl these pages with BeautifulSoup4? So far I have only fetched the first page with urlopen.


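You cannot do this with BeautifulSoup alone, because the pages are loaded through JavaScript (ajax). You need something like selenium or ghost.py, which can execute JavaScript.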



commonPagingPost is a JavaScript function defined on the page:

function commonPagingPost (Page, Block, Action) {
                var Frm = document.mainForm;
                Frm.RCEPT_NO.value = "";
                Frm.page.value = Page;
                Frm.action = Action;
                Frm.submit ();
}

So it fills in a few values, sets the action, and submits a form named "mainForm". What does mainForm look like?

<form name="mainForm" method="post" action="">
                <input type="hidden" name="RCEPT_NO" value="">
                <input type="hidden" name="search_flag" value="N">
                <input type="hidden" name="page" value="1">
</form>
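The hidden inputs of this form are the fields that get submitted, so they can also be collected programmatically instead of being typed out by hand. Below is a minimal sketch using only the standard library. FORM_HTML is copied from the snippet above; the class name HiddenFieldParser is hypothetical, and for simplicity it collects every hidden input it sees rather than only those inside mainForm.

```python
from html.parser import HTMLParser

# Copied from the form shown above; on the live site this would come from the page source.
FORM_HTML = """
<form name="mainForm" method="post" action="">
    <input type="hidden" name="RCEPT_NO" value="">
    <input type="hidden" name="search_flag" value="N">
    <input type="hidden" name="page" value="1">
</form>
"""


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            # A missing value attribute is treated as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""


parser = HiddenFieldParser()
parser.feed(FORM_HTML)

# Start from the form defaults and override only the page number, as the JavaScript does.
payload = dict(parser.fields, page="3")
print(payload)
```

The resulting dictionary has the same shape as the data a browser would send when the form is submitted for page 3.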

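So the function writes the requested page number into the form, sets the form's action to "Shr01_lis.jsp", and submits it as a POST request. Can we do the same in Python? Yes!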

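import requests
from bs4 import BeautifulSoup

# Send the same fields the hidden form would submit, asking for page 5.
r = requests.post(
    "http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp",
    data={
        "RCEPT_NO": "",
        "search_flag": "N",
        "page": "5"
    })

soup = BeautifulSoup(r.text, 'lxml')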

I used requests instead of urllib because it makes sending a POST request simpler.
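To fetch several pages, the same POST can be repeated in a loop with a different "page" value each time. The following is only a sketch under some assumptions: the helper names build_payload, fetch_pages and crawl are hypothetical, the one-second delay is an arbitrary choice, and the last page number has to be supplied by the caller because it is not known from the snippets above.

```python
import time

URL = "http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp"


def build_payload(page):
    """Return the form fields that commonPagingPost would submit for `page`."""
    return {"RCEPT_NO": "", "search_flag": "N", "page": str(page)}


def fetch_pages(first, last, post, delay=1.0):
    """Yield (page_number, html) for each page from first to last inclusive.

    `post` is any callable with the same interface as requests.post,
    for example the `post` method of a requests.Session.
    """
    for page in range(first, last + 1):
        if page > first:
            time.sleep(delay)  # pause between requests to go easy on the server
        response = post(URL, data=build_payload(page))
        response.raise_for_status()  # stop on HTTP errors instead of saving error pages
        yield page, response.text


def crawl(first, last, out_path):
    """Save the visible text of pages first..last to out_path (hypothetical helper)."""
    # Imported here so the helpers above can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as session, open(out_path, "w", encoding="utf-8") as f:
        for page, html in fetch_pages(first, last, session.post):
            f.write(BeautifulSoup(html, "lxml").get_text())
```

For example, crawl(1, 5, "pages.txt") would request the five pages visible in the pagination bar from the question and write their text to one file.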
