Get list items inside div tag using xpath

Question

I have the following HTML:

    <div id="all-stories" class="book">
      <ul>
        <li title="Book1" ><a href="book1_url">Book1</a></li>
        <li title="Book2" ><a href="book2_url">Book2</a></li>
      </ul>
    </div>

I want to get the books and their corresponding URLs using XPath, but my approach does not work. For simplicity, I tried to extract all the elements under the "li" tags as follows:

 lis = tree.xpath('//div[@id="all-stories"]/div/text()')

python xpath lxml

Anurag sharma Jun 29 '13 at 13:49

1 answer

unutbu · Accepted Answer · 2013-06-29T13:58:51+0000

    import lxml.html as LH

    content = '''\
    <div id="all-stories" class="book">
      <ul>
        <li title="Book1" ><a href="book1_url">Book1</a></li>
        <li title="Book2" ><a href="book2_url">Book2</a></li>
      </ul>
    </div>
    '''
    root = LH.fromstring(content)
    for atag in root.xpath('//div[@id="all-stories"]//li/a'):
        print(atag.attrib['href'], atag.text_content())

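On Python 2, where print(a, b) prints a tuple, this gives

    ('book1_url', 'Book1')
    ('book2_url', 'Book2')

On Python 3 the same call prints the two values separated by a space, without the parentheses and quotes.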

XPath //div[@id="all-stories"]/div doesn’t match anything because there is no child div inside the outer div tag.

XPath //div[@id="all-stories"]/li will not match either, because the div tag does not have a direct child li tag. However, //div[@id="all-stories"]//li matches the li tags, because // tells XPath to search recursively, as deep as necessary, to find li tags.
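To see the difference between / and // concretely, here is a small sketch that runs the expressions discussed above against the HTML from the question and counts the matches. The variable names and the extra /ul/li expression are not from the original answer; they are only there for illustration.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# '/' selects direct children only; '//' selects descendants at any depth.
child_div = root.xpath('//div[@id="all-stories"]/div')       # the div has no child div
child_li = root.xpath('//div[@id="all-stories"]/li')         # li is a grandchild, not a child
any_li = root.xpath('//div[@id="all-stories"]//li')          # li found at any depth
explicit_li = root.xpath('//div[@id="all-stories"]/ul/li')   # spelling out the ul step

print(len(child_div), len(child_li), len(any_li), len(explicit_li))
```

The first two lists come back empty, while the last two each contain the two li elements. Spelling out /ul/li is stricter than //li, so it is a reasonable choice when you know the exact nesting.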

Now, the content you are looking for is not in the li tag itself. It is inside the a tag. So instead use the XPath '//div[@id="all-stories"]//li/a' to reach the a tags. The value of the href attribute can be accessed using atag.attrib['href'], and the text using atag.text_content().
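As an alternative sketch, XPath itself can select the attribute values and text nodes with @href and text(), which lxml returns as lists of strings. The dictionary names below are hypothetical, and pairing the two lists with zip assumes every a tag has an href and exactly one text node, which holds for the HTML in the question but not necessarily for real pages.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# Select attribute values and text nodes directly in the XPath expression.
hrefs = root.xpath('//div[@id="all-stories"]//li/a/@href')
titles = root.xpath('//div[@id="all-stories"]//li/a/text()')

# Pairing by position: only safe if each <a> yields one href and one text node.
books = dict(zip(titles, hrefs))

# Per-element variant: keeps each title and URL together even if some
# links lack an href (a.get returns None in that case).
books_by_element = {
    a.text_content(): a.get('href')
    for a in root.xpath('//div[@id="all-stories"]//li/a')
}

print(books)
print(books_by_element)
```

For the sample HTML both dictionaries map each book title to its URL. On messier markup the per-element loop is the safer of the two.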
