I want to capture the age, place of birth and previous occupation of senators. Information for each individual senator is available on Wikipedia, on their respective pages, and there is another page with a table listing all senators by name. How can I go through this list, follow the links to the corresponding pages of each senator and get the information I want?
Here is what I have done so far.
1. (Without Python.) I found out that DBpedia exists and wrote a query to search for senators. Unfortunately, DBpedia has not classified most (if any) of them as such:
SELECT ?senator ?country WHERE { ?senator rdf:type <http://dbpedia.org/ontology/Senator> . ?senator <http://dbpedia.org/ontology/nationality> ?country }
The query results were unsatisfactory.
2. I then found a Python module called wikipedia that lets me search and retrieve information from individual Wikipedia pages. I used it to get a list of senator names from the table by looking at the page's hyperlinks.
import wikipedia as w
w.set_lang('pt')
At this point I got a little lost. The list of links does contain the names of all the senators, but it also contains other names, for example the names of parties. The wikipedia module (at least from what I could find in its API documentation) also does not provide any functionality to follow links or to read tables.
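To make the problem concrete, here is a minimal sketch of the kind of filtering I think I need. It uses only the standard library's html.parser and keeps just the links found in the first cell of each table row, so party links in other columns are skipped. The class name FirstColumnLinks, the function senator_links and the sample HTML are all made up for illustration, and I am assuming that the senator's name sits in the first column; the real page may be laid out differently.

```python
from html.parser import HTMLParser


class FirstColumnLinks(HTMLParser):
    """Collect (text, href) for links in the first <td> of each table row."""

    def __init__(self):
        super().__init__()
        self.links = []        # collected (link text, href) pairs
        self._cell_index = -1  # index of the current <td> within its row
        self._in_cell = False  # True while inside a <td>
        self._href = None      # href of the link being collected, if any
        self._text = []        # text fragments of that link

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1          # new row: restart cell counting
        elif tag == "td":
            self._cell_index += 1
            self._in_cell = True
        elif tag == "a" and self._in_cell and self._cell_index == 0:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:         # only keep text inside a kept link
            self._text.append(data)


def senator_links(table_html):
    """Return the (name, href) pairs from the first column of the table."""
    parser = FirstColumnLinks()
    parser.feed(table_html)
    return parser.links


# Hypothetical sample markup, not copied from the real Wikipedia page.
SAMPLE = """
<table>
  <tr><th>Senador</th><th>Partido</th></tr>
  <tr><td><a href="/wiki/Senador_A">Senador A</a></td>
      <td><a href="/wiki/Partido_X">Partido X</a></td></tr>
  <tr><td><a href="/wiki/Senador_B">Senador B</a></td>
      <td><a href="/wiki/Partido_Y">Partido Y</a></td></tr>
</table>
"""

for name, href in senator_links(SAMPLE):
    print(name, href)
```

If something like this works on the real table, each href could presumably be turned back into a page title and fetched one at a time, but I have not worked out how to get the birth date, place of birth and occupation from each page.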
I found two related questions here on Stack Overflow that seem useful, but both of them (here and here) extract information from a single page only.
Can someone point me to a solution?
Thanks!