Get list items inside div tag using xpath
I have the following HTML:

    <div id="all-stories" class="book">
      <ul>
        <li title="Book1" ><a href="book1_url">Book1</a></li>
        <li title="Book2" ><a href="book2_url">Book2</a></li>
      </ul>
    </div>

I want to get the books and their corresponding URLs using XPath, but my approach does not seem to work. For simplicity, I tried to extract all the elements under the "li" tags as follows:
    lis = tree.xpath('//div[@id="all-stories"]/div/text()')

    import lxml.html as LH

    content = '''\
    <div id="all-stories" class="book">
    <ul>
    <li title="Book1" ><a href="book1_url">Book1</a></li>
    <li title="Book2" ><a href="book2_url">Book2</a></li>
    </ul>
    </div>
    '''
    # Parse the HTML fragment; root is the outer div element.
    root = LH.fromstring(content)
    # Select every a tag that is a direct child of an li anywhere inside the div.
    for atag in root.xpath('//div[@id="all-stories"]//li/a'):
        print(atag.attrib['href'], atag.text_content())

Under Python 3 this prints

    book1_url Book1
    book2_url Book2

The XPath //div[@id="all-stories"]/div doesn't match anything because there is no child div inside the outer div tag.
The XPath //div[@id="all-stories"]/li will not match either, because the div tag does not have a direct child li tag (the li tags are children of the ul). However, //div[@id="all-stories"]//li matches the li tags because // tells XPath to search recursively, as deep as necessary, to find li tags.
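The difference between / and // can be checked directly by counting matches. This is only a minimal sketch, assuming lxml is installed and reusing the sample markup from the question; the variable names child_div, child_li and any_li are made up for illustration.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1" ><a href="book1_url">Book1</a></li>
<li title="Book2" ><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# "/" selects direct children only; "//" selects descendants at any depth.
child_div = root.xpath('//div[@id="all-stories"]/div')  # no nested div exists
child_li = root.xpath('//div[@id="all-stories"]/li')    # li is a grandchild, not a child
any_li = root.xpath('//div[@id="all-stories"]//li')     # reached through the ul

print(len(child_div), len(child_li), len(any_li))
```

Only the last expression should return anything, since the li tags sit one level below the ul rather than directly under the div.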
Now the content you are looking for is not in the li tag. It is inside the a tag. So instead use the XPath '//div[@id="all-stories"]//li/a' to reach the a tags. The value of the href attribute can be accessed using atag.attrib['href'], and the text using atag.text_content().
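If you want the results collected rather than printed, something along these lines should work. It is a sketch under the same assumptions (lxml installed, the sample markup from the question); the names books, hrefs and titles are illustrative. The second half shows that XPath can also return attribute values and text nodes directly through @href and text(), which avoids touching the element objects at all.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1" ><a href="book1_url">Book1</a></li>
<li title="Book2" ><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# Build a {book text: url} mapping from the a elements.
books = {atag.text_content(): atag.attrib['href']
         for atag in root.xpath('//div[@id="all-stories"]//li/a')}
print(books)

# Alternatively, let XPath return the attribute values and text nodes itself.
hrefs = root.xpath('//div[@id="all-stories"]//li/a/@href')
titles = root.xpath('//div[@id="all-stories"]//li/a/text()')
print(list(zip(titles, hrefs)))
```

Note that text() returns only the text nodes directly inside each a tag, while text_content() also includes text from any nested elements, so the two can differ on markup richer than this sample.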