BeautifulSoup in Python - getting the n-th tag of a type

I have some HTML containing a lot of <table> elements.

I am trying to get the information in the second table. Is there a way to do this without using soup.findAll('table')?

When I use soup.findAll('table'), I get an error message:

 ValueError: too many values to unpack 

Is there a way to get the nth tag of a given type, or another approach that does not require going through all the tables? Or should I see if I can add titles to the tables (e.g. <table title="things">)?

There are also headings (<h4>title</h4>) above each table, if that helps.

Thanks.

EDIT

Here is what I thought when I asked the question:

I was unpacking the result into two variables when it contained many more values. I thought this would just give me the first two items from the list, but of course it gave me the error above every time. I did not know that the return value was a list; I thought it was some special object, and I had based my code on a friend's.

I thought the error meant that there were too many tables on the page and that it could not handle all of them, so I asked for a way to do this without the method I was using. I probably should have stopped assuming things.

Now I know that it returns a list, and that I can use it in a for loop or get a value from it using soup.findAll('table')[someNumber]. I learned what unpacking is and how to use it. Thanks to everyone who helped.

I hope this clears things up. Now that I know what I'm doing, my question makes less sense than when I asked it, so I thought I would write down what I was thinking at the time.

EDIT 2:

This question is pretty old by now, but I see that I was never really clear about what I was doing.

In case it helps someone: I was trying to unpack the result of findAll(...), which contained an unknown number of items.

 useless_table, table_i_want, another_useless_table = soup.findAll("table"); 

Since the page did not always contain the number of tables I had guessed, and unpacking requires exactly one name per value in the result, I got a ValueError:

 ValueError: too many values to unpack 

So, I was looking for a way to get the second table (or the table at any index) from the returned list, without errors that depend on how many tables the page contains.
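The failure described above can be reproduced with a minimal sketch. The page here is hypothetical (four small tables built in a string, since the real page is not shown), and "html.parser" is chosen only so the example does not depend on an extra parser being installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page with four tables (the real page is not shown in the question).
html = "".join(
    "<table><tr><td>T%d</td></tr></table>" % i for i in range(1, 5)
)
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # list-like result with 4 items

# Three names on the left, four values on the right: this raises ValueError.
try:
    useless_table, table_i_want, another_useless_table = tables
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Indexing does not care how many tables follow the one you want.
table_i_want = tables[1]
print(table_i_want.get_text())  # T2

# Star-unpacking also works, as long as there are at least two tables.
_first, second, *_rest = tables
print(second.get_text())  # T2
```

Plain unpacking only succeeds when the number of names matches the number of values exactly, which is why guessing the table count was fragile.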

python beautifulsoup
2 answers

To get the second table from the result of soup.findAll('table'), treat it as a list and index it:

 secondtable = soup.findAll('table')[1] 
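If the page may contain fewer tables than expected, plain indexing raises IndexError. Below is a small sketch of a guarded lookup; `nth_tag` is a hypothetical helper name, not part of BeautifulSoup. It uses the `limit` argument of `find_all` so the search stops once enough matches have been found.

```python
from bs4 import BeautifulSoup


def nth_tag(soup, name, n):
    """Return the n-th (zero-based) tag called `name`, or None if there are too few.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    if n < 0:
        raise ValueError("n must be zero or positive")
    # limit=n + 1 stops the search as soon as the n-th match has been seen.
    found = soup.find_all(name, limit=n + 1)
    return found[n] if len(found) > n else None


# Hypothetical two-table document used only for this example.
html = (
    "<table><tr><td>first</td></tr></table>"
    "<table><tr><td>second</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

second_table = nth_tag(soup, "table", 1)
missing_table = nth_tag(soup, "table", 5)
print(second_table.get_text())  # second
print(missing_table)            # None
```

Returning None instead of raising is a design choice for this sketch; raising IndexError may be preferable when a missing table should be treated as an error.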

Martijn Pieters' answer will make it work. However, I once had a nested table tag break my code when I simply took the second table in the list without accounting for nesting.

When you call find_all and take the nth element, there is a risk of getting the wrong one. It is safer to locate the first element you want and make sure that the nth element is actually a sibling of that element rather than one of its descendants.

  • You can use find_next_sibling() to protect your code.
  • You can find the parent first, and then use find_all(recursive=False) to restrict the search to its direct children.

Just in case you need it, I have listed my code below (using recursive=False).

    from bs4 import BeautifulSoup

    text = """
    <html>
    <head>
    </head>
    <body>
    <table>
        <p>Table1</p>
        <table>
            <p>Extra Table</p>
        </table>
    </table>
    <table>
        <p>Table2</p>
    </table>
    </body>
    </html>
    """

    # The parser is named explicitly so the result does not depend on
    # which optional parsers happen to be installed.
    soup = BeautifulSoup(text, "html.parser")

    # The default search is recursive, so the nested table is counted too.
    tables = soup.find('body').find_all('table')
    print(len(tables))             # 3
    print(tables[1].text.strip())  # Extra Table (not the table you want, and no warning)

    # recursive=False only looks at direct children of <body>.
    tables = soup.find('body').find_all('table', recursive=False)
    print(len(tables))             # 2
    print(tables[1].text.strip())  # Table2 (the desired output)
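The other option from the list, find_next_sibling(), could look like the sketch below. The markup is a hypothetical variant of the example above with the nested table placed inside a cell, and "html.parser" is an assumption made to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first table contains a nested table,
# and the table we actually want is its sibling.
html = (
    '<div id="root">'
    "<table><tr><td>Table1"
    "<table><tr><td>Extra Table</td></tr></table>"
    "</td></tr></table>"
    "<table><tr><td>Table2</td></tr></table>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match in document order: the outer Table1.
first = soup.find("table")

# find_next_sibling() only walks siblings at the same level,
# so the nested "Extra Table" is skipped.
second = first.find_next_sibling("table")
print(second.get_text(strip=True))  # Table2

# For comparison, naive indexing picks up the nested table instead.
naive = soup.find_all("table")[1]
print(naive.get_text(strip=True))  # Extra Table
```

find_next_sibling() returns None when there is no later sibling with that name, so it is worth checking the result before using it.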