BeautifulSoup in Python - getting the nth tag of a type

Question

I have an HTML document containing a lot of <table> elements.

I am trying to get the information in the second table. Is there a way to do this without using soup.findAll('table')?

When I use soup.findAll('table'), I get an error message:

 ValueError: too many values to unpack

Is there a way to get the nth tag, or some other approach that does not require going through all the tables? Or should I see whether I can add titles to the tables (e.g. <table title="things">)?

There are also headings (<h4>title</h4>) above each table, if that helps.

Thanks.

EDIT

Here is what I thought when I asked the question:

I was unpacking the result into two values when there were many more. I thought this would just give me the first two items from the list, but of course it kept giving me the error mentioned above. I did not know that the return value was a list; I thought it was a special object or something else, and I was basing my code on my friend's.

I thought this error meant that there were too many tables on the page and that it could not handle all of them, so I asked for a way to do this without the method I was using. I probably should have stopped assuming things.

Now I know that it returns a list, and that I can use it in a for loop or get a single value from it with soup.findAll('table')[someNumber]. I learned what unpacking is and how to use it. Thanks to everyone who helped.

I hope this clears things up. Now that I know what I'm doing, my question makes less sense than it did when I asked it, so I thought I would write down what I was thinking.

EDIT 2:

This question is pretty old now, but I still see that I was never really clear about what I was doing.

In case this helps someone: I was trying to unpack the result of findAll(...), which contained an unknown number of items.

 useless_table, table_i_want, another_useless_table = soup.findAll("table");

Since the page did not always contain the number of tables I had guessed, and every value in the returned list must be unpacked, I got a ValueError:

 ValueError: too many values to unpack

So I was looking for a way to grab the second table (or the table at any index) from the returned list without triggering an error that depends on how many tables the page contains.
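For illustration, the failure described above and one way around it can be reproduced with a small made-up document. This is only a sketch: the HTML string and the variable names are assumptions, and it assumes Python 3 with the bs4 package and the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical page with four tables; the real page had an unknown number.
html = "".join(f"<table><tr><td>table {i}</td></tr></table>" for i in range(4))
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # find_all is the current name for findAll

# Unpacking into exactly three names fails because there are four tables.
try:
    first, second, third = tables
except ValueError as exc:
    error_message = str(exc)

# Extended unpacking (Python 3) takes the second table and collects the rest.
_, table_i_want, *others = tables
print(table_i_want.get_text())  # table 1
```

Extended unpacking still needs at least two tables to be present; with fewer, it raises a ValueError as well.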

+8

python beautifulsoup

nasonfish Dec 30 '12 at 22:50

2 answers

Martijn Pieters's answer will work. However, I have had a nested table tag break my code when I simply took the second table in the list without accounting for nesting.

When you call find_all and take the nth element, there is a chance you will get the wrong one. It is safer to find the first element you want and make sure the nth element is actually a sibling of that element rather than one of its children.

You can use find_next_sibling() to protect your code, or you can find the parent first and then call find_all with recursive=False to restrict the search to the parent's direct children.

Just in case you need it, my code is listed below (note recursive=False).

 from bs4 import BeautifulSoup

 text = """
 <html>
 <head>
 </head>
 <body>
 <table>
 <p>Table1</p>
 <table>
 <p>Extra Table</p>
 </table>
 </table>
 <table>
 <p>Table2</p>
 </table>
 </body>
 </html>
 """

 # Name the parser explicitly so the result does not depend on what is installed.
 soup = BeautifulSoup(text, "html.parser")

 # Default (recursive) search: nested tables are included.
 tables = soup.find('body').find_all('table')
 print(len(tables))             # 3
 print(tables[1].text.strip())  # Extra Table
 # which is not the table you want, and there is no warning

 # Search only the direct children of <body>.
 tables = soup.find('body').find_all('table', recursive=False)
 print(len(tables))             # 2
 print(tables[1].text.strip())  # Table2
 # your desired output
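The find_next_sibling() alternative mentioned above could look roughly like the following. This is a sketch under assumptions: the markup and the heading text "Second" are invented for the example, and the string= argument requires a reasonably recent bs4 release (older releases call it text=).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an <h4> heading above each table, with one nested table.
html = """
<body>
<h4>First</h4>
<table><tr><td>Table1<table><tr><td>Extra Table</td></tr></table></td></tr></table>
<h4>Second</h4>
<table><tr><td>Table2</td></tr></table>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Start from the first table and move sideways, so nested tables are skipped.
first_table = soup.find("table")
second_table = first_table.find_next_sibling("table")
print(second_table.get_text(strip=True))  # Table2

# Alternatively, start from the heading that sits above the wanted table.
heading = soup.find("h4", string="Second")
table_below = heading.find_next_sibling("table")
print(table_below.get_text(strip=True))  # Table2
```

Both lookups only consider elements at the same nesting level as the starting point, which is why the nested "Extra Table" is never returned.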

+2

B.Mr.W. Nov 03 '13 at 17:13

Martijn Pieters · Accepted Answer · 2012-12-30T22:58:31+0000

To get the second table from the call to soup.findAll('table'), treat the result as a list and index it:

 secondtable = soup.findAll('table')[1]
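If the page might contain fewer tables than expected, plain indexing raises an IndexError. A minimal sketch of a guarded lookup follows; nth_tag is a hypothetical helper name rather than part of BeautifulSoup, and the HTML is made up for the example.

```python
from bs4 import BeautifulSoup


def nth_tag(soup, name, n):
    """Return the n-th (0-based) `name` tag in document order, or None."""
    # limit stops the search once n + 1 matches have been collected.
    found = soup.find_all(name, limit=n + 1)
    return found[n] if len(found) > n else None


html = "<table><tr><td>A</td></tr></table><table><tr><td>B</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

second = nth_tag(soup, "table", 1)
print(second.get_text())          # B
print(nth_tag(soup, "table", 5))  # None
```

The helper assumes a non-negative n. Passing limit avoids collecting every table on a large page when only an early one is needed.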
