    Geek Answers Handbook

    Get list items inside div tag using xpath

    I have the following HTML:

     <div id="all-stories" class="book">
       <ul>
         <li title="Book1" ><a href="book1_url">Book1</a></li>
         <li title="Book2" ><a href="book2_url">Book2</a></li>
       </ul>
     </div>

    I want to get the books and their corresponding URLs using XPath, but my approach does not work. For simplicity, I tried to extract all the elements under the li tags as follows:

     lis = tree.xpath('//div[@id="all-stories"]/div/text()') 
    +7
    python xpath lxml
    Anurag sharma Jun 29 '13 at 13:49
    1 answer
     import lxml.html as LH

     content = '''\
     <div id="all-stories" class="book">
       <ul>
         <li title="Book1" ><a href="book1_url">Book1</a></li>
         <li title="Book2" ><a href="book2_url">Book2</a></li>
       </ul>
     </div>
     '''
     root = LH.fromstring(content)
     for atag in root.xpath('//div[@id="all-stories"]//li/a'):
         print(atag.attrib['href'], atag.text_content())

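    gives (under Python 2, where print followed by a parenthesized pair prints a tuple; Python 3 prints each pair as book1_url Book1, without parentheses or quotes)

     ('book1_url', 'Book1')
     ('book2_url', 'Book2')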

    XPath //div[@id="all-stories"]/div doesn’t match anything because there is no child div inside the outer div tag.

    XPath //div[@id="all-stories"]/li will not match either, because the div tag does not have a direct child li tag. However, //div[@id="all-stories"]//li matches the li tags because // tells XPath to search all descendants, at any depth, for li tags.
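The difference between / and // can be checked by counting matches for each expression. This is a minimal sketch that reuses the HTML from the question. The /ul/li expression is an extra variant added here for comparison and is not part of the original answer.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
  <ul>
    <li title="Book1"><a href="book1_url">Book1</a></li>
    <li title="Book2"><a href="book2_url">Book2</a></li>
  </ul>
</div>
'''
root = LH.fromstring(content)

expressions = (
    '//div[@id="all-stories"]/div',    # direct child div: none exists
    '//div[@id="all-stories"]/li',     # direct child li: none, li sits under ul
    '//div[@id="all-stories"]/ul/li',  # explicit path through ul (added variant)
    '//div[@id="all-stories"]//li',    # any descendant li
)

# Map each expression to the number of elements it matches.
counts = {expr: len(root.xpath(expr)) for expr in expressions}
for expr in expressions:
    print(expr, '->', counts[expr])
```

The first two expressions match nothing, while the last two each match both li elements. Spelling out the path through ul is stricter than //, so it would stop matching if the markup gained another wrapper element.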

    Now the content you are looking for is not in the li tag itself. It is inside the a tag. So instead use the XPath '//div[@id="all-stories"]//li/a' to reach the a tags. The value of the href attribute can be accessed using atag.attrib['href'], and the text using atag.text_content().
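If the goal is a mapping from book name to URL, the same loop can fill a dictionary. The sketch below also shows one possible alternative: asking XPath for the attribute and text nodes directly with /@href and /text(). The names books, hrefs and titles are illustrative and do not appear in the original answer.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
  <ul>
    <li title="Book1"><a href="book1_url">Book1</a></li>
    <li title="Book2"><a href="book2_url">Book2</a></li>
  </ul>
</div>
'''
root = LH.fromstring(content)

# Option 1: select the a elements, then read the attribute and text in Python.
books = {}
for atag in root.xpath('//div[@id="all-stories"]//li/a'):
    # get() returns None for a missing attribute, where attrib[...] would raise KeyError.
    books[atag.text_content()] = atag.get('href')

# Option 2: let XPath return the attribute values and text nodes as strings.
hrefs = root.xpath('//div[@id="all-stories"]//li/a/@href')
titles = root.xpath('//div[@id="all-stories"]//li/a/text()')

print(books)
print(list(zip(titles, hrefs)))
```

Option 1 keeps each name paired with its URL even if some a tag lacks an href. Option 2 is shorter, but the two lists line up only when every a tag has both an href and exactly one text node.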

    +9
    unutbu Jun 29 '13 at 13:58