How to find child nodes using BeautifulSoup
I want to get all <a> tags that are direct children of <li>:

    <div>
      <li class="test">
        <a>link1</a>
        <ul>
          <li>
            <a>link2</a>
          </li>
        </ul>
      </li>
    </div>

I know how to find an element with a specific class, like this:

    soup.find("li", { "class" : "test" })

But I do not know how to find all <a> tags that are children of <li class="test"> and no others.
What I want to select:

    <a>link1</a>

Try this:
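
    li = soup.find('li', {'class': 'test'})
    # findChildren is an older alias of find_all;
    # recursive=False restricts the search to direct children of li
    children = li.findChildren("a", recursive=False)
    for child in children:
        print(child)

There is a very short section in the docs that shows how to restrict find / find_all to direct children: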
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument
In your case, since you want link1, which is the first direct child:

    # only the first direct child
    soup.find("li", { "class" : "test" }).find("a", recursive=False)

If you want all direct children:

    # all direct children
    soup.find("li", { "class" : "test" }).findAll("a", recursive=False)

Maybe you want to do:

    soup.find("li", { "class" : "test" }).find('a')

Try the following:

    li = soup.find("li", { "class" : "test" })
    children = li.find_all("a")  # returns a list of all <a> descendants of li

Other reminders:
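The find method returns only the first matching descendant. The find_all method returns all matching descendants in a list. By default both search nested elements too, so pass recursive=False to limit the search to direct children.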
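
To make the difference concrete, here is a minimal sketch run against the HTML from the question. It assumes the built-in "html.parser" backend; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# The markup from the question
html = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
  </li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

first = li.find("a")                         # first matching descendant
descendants = li.find_all("a")               # every <a> at any depth below li
direct = li.find_all("a", recursive=False)   # only <a> tags directly under li

print(first.get_text())                      # link1
print([a.get_text() for a in descendants])   # ['link1', 'link2']
print([a.get_text() for a in direct])        # ['link1']
```

Only the recursive=False call excludes link2, which sits inside the nested <ul>.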
Another method is to create a filter function that returns True for all desired tags:

    def my_filter(tag):
        # .get avoids a KeyError when the parent <li> has no class attribute
        return (tag.name == 'a' and
                tag.parent.name == 'li' and
                'test' in tag.parent.get('class', []))

Then just call find_all with the function as the argument:

    for a in soup(my_filter):  # or soup.find_all(my_filter)
        print(a)

"How to find all <a> which are children of <li class=test> but not any others?"
Given the HTML below (I added another <a> to show the difference between select and select_one):

    <div>
      <li class="test">
        <a>link1</a>
        <ul>
          <li>
            <a>link2</a>
          </li>
        </ul>
        <a>link3</a>
      </li>
    </div>

The solution is to use the child combinator (>) placed between two CSS selectors:

    >>> soup.select('li.test > a')
    [<a>link1</a>, <a>link3</a>]

If you want to find only the first child:

    >>> soup.select_one('li.test > a')
    <a>link1</a>
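
For comparison, here is a self-contained sketch that contrasts the child combinator with the descendant combinator (a plain space) on the same markup. It assumes the "html.parser" backend and a Beautiful Soup version that ships CSS selector support; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Same markup as above, including the extra <a>link3</a>
html = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

children = soup.select("li.test > a")           # direct children only
descendants = soup.select("li.test a")          # any depth below li.test
first_child = soup.select_one("li.test > a")    # first match, or None

print([a.get_text() for a in children])         # ['link1', 'link3']
print([a.get_text() for a in descendants])      # ['link1', 'link2', 'link3']
print(first_child.get_text())                   # link1
```

The space-separated selector also picks up link2 from the nested list, which is why the child combinator is the one that matches the question.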