MediaWiki API: get a Wikipedia page ID from a URL

I have a set of full urls like

http://en.wikipedia.org/wiki/Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa
http://en.wikipedia.org/wiki/Himera
http://en.wikipedia.org/wiki/Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte

I want to find the Wikipedia page IDs for these URLs. I have used the MediaWiki API before, but I cannot figure out how to do this.

I tried to extract the page title from the URL by taking the substring from the last "/" to the end of the URL, and then querying the API with that title to get the page ID.

http://en.wikipedia.org/wiki/Episkopi_Bay --> Episkopi_Bay
http://en.wikipedia.org/wiki/Monte_Lauro --> Monte_Lauro
http://en.wikipedia.org/wiki/Lampedusa --> Lampedusa
http://en.wikipedia.org/wiki/Himera --> Himera
http://en.wikipedia.org/wiki/Lago_Cecita --> Lago_Cecita
http://en.wikipedia.org/wiki/Aspromonte --> Aspromonte
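For reference, the extraction step described above might look like the following minimal Python sketch. The function name title_from_wiki_url is made up for this illustration, and the percent-decoding step is an addition that the description above does not mention.

```python
from urllib.parse import urlsplit, unquote

def title_from_wiki_url(url):
    """Return the text after the last "/" of the URL path, percent-decoded.

    Hypothetical helper: it assumes the title itself contains no "/".
    """
    path = urlsplit(url).path              # e.g. "/wiki/Episkopi_Bay"
    return unquote(path.rsplit("/", 1)[-1])

urls = [
    "http://en.wikipedia.org/wiki/Episkopi_Bay",
    "http://en.wikipedia.org/wiki/Monte_Lauro",
]
titles = [title_from_wiki_url(u) for u in urls]
```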

But the problem is that some of my links may be redirects, so the substring may not always be the actual page title.

TL;DR: How can I find the Wikipedia page ID from a URL?

+5
2 answers

I'm not sure whether what you call the "page id" is the numeric page ID (for example, 15580374 for the English Wikipedia's Main Page, found under "Page information" in the toolbox in the left column) or the normalized name of the page with redirects resolved. The answer below covers both.

You can use the API with action=query, for example https://en.wikipedia.org/w/api.php?action=query&titles=Main%20Page , which returns minimal information about the page, including its page ID (a number).
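As a rough illustration, such a request could be built and its result read like this in Python. This is only a sketch: the helper names and the User-Agent string are made up, and it assumes the default JSON layout in which query.pages is an object keyed by page ID and missing pages carry no pageid field.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"

def build_query_url(titles):
    """Build an action=query URL; several titles are joined with "|"."""
    params = {"action": "query", "format": "json", "titles": "|".join(titles)}
    return API + "?" + urlencode(params)

def page_ids(response):
    """Map title -> page ID from a decoded response, skipping missing pages."""
    pages = response.get("query", {}).get("pages", {})
    return {p["title"]: p["pageid"] for p in pages.values() if "pageid" in p}

def fetch_page_ids(titles):
    """Perform the HTTP request (needs network access)."""
    # The User-Agent value is an arbitrary placeholder for this example.
    req = Request(build_query_url(titles),
                  headers={"User-Agent": "pageid-example/0.1"})
    with urlopen(req) as resp:
        return page_ids(json.load(resp))
```

Calling fetch_page_ids(["Main Page"]) should then return a dictionary that maps the title to its numeric ID.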

You can also handle more complex cases: title normalization and/or redirects. Title normalization (initial capital, underscores changed to spaces, various Unicode normalizations IIRC, etc.) is done by default. For redirects, you must ask for them by adding "&redirects" to the URL (note that double redirects, i.e. a redirect to a redirect, will not work, but there should not be any). Example: https://en.wikipedia.org/w/api.php?action=query&titles=main_page&redirects
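To map a requested title back to the page it ends up on, the response can be walked as in the sketch below. It assumes the decoded JSON response lists the applied mappings under query.normalized and query.redirects as objects with "from" and "to" fields; the function names are made up for this example.

```python
def resolve_title(response, requested):
    """Follow normalization, then redirects, from the requested title."""
    query = response.get("query", {})
    title = requested
    for key in ("normalized", "redirects"):
        mapping = {entry["from"]: entry["to"] for entry in query.get(key, [])}
        title = mapping.get(title, title)   # unchanged if no mapping applies
    return title

def resolve_page_id(response, requested):
    """Return the page ID of the page the requested title resolves to, or None."""
    final = resolve_title(response, requested)
    for page in response.get("query", {}).get("pages", {}).values():
        if page.get("title") == final:
            return page.get("pageid")
    return None
```

Keeping this lookup per requested title matters when several titles are sent in one request, since the order of query.pages does not follow the order of the titles.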

If you need more information, you can look at https://en.wikipedia.org/w/api.php?action=help&modules=query%2Binfo .

+4

If you only have a URL and don't know anything about the wiki, you cannot assume that the part after the last / is the title of the page, as MediaWiki page names may contain / . Instead, you will need to start by querying the siteinfo API, which is documented here:

 https://www.mediawiki.org/wiki/API:Siteinfo 

In the response, query.general.server and query.general.articlepath together give you the URL structure, and query.general.script gives you the script path. Depending on where your URL came from, you will need to account for both the default form //mywiki/scriptpath/index.php?title=Namespace:Foo/Bar and the short-form URL //mywiki/articlepath/Namespace:Foo/Bar for an article named Foo/Bar .
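A minimal Python sketch of that matching step follows. It assumes the general block was already fetched (for instance with api.php?action=query&meta=siteinfo&siprop=general), that articlepath is a plain path containing the $1 placeholder, and that script is the full path to index.php. The function name and the sample values in the usage note are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs, unquote

def title_from_url(url, general):
    """Extract the page title from a wiki URL using siteinfo "general" values.

    Returns None when the URL matches neither form or points to another host.
    """
    parts = urlsplit(url)
    # "server" may be protocol-relative, e.g. "//en.wikipedia.org".
    if urlsplit(general["server"]).netloc != parts.netloc:
        return None
    # Default form: <script>?title=Namespace:Foo/Bar
    if parts.path == general["script"]:
        values = parse_qs(parts.query).get("title")
        return values[0] if values else None
    # Short form: articlepath with "$1" replaced by the title.
    prefix, _, suffix = general["articlepath"].partition("$1")
    path = parts.path
    if (path.startswith(prefix) and path.endswith(suffix)
            and len(path) > len(prefix) + len(suffix)):
        return unquote(path[len(prefix):len(path) - len(suffix)])
    return None
```

With general set to {"server": "//en.wikipedia.org", "articlepath": "/wiki/$1", "script": "/w/index.php"}, both URL forms for Namespace:Foo/Bar yield the full title, slash included.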

To make matters worse, the slash in the title of the article can be either part of the name or a separator for the subpage, depending on the settings of this namespace!

If you already know the URL syntax of the wiki at hand, @Seb35 has already answered all your questions.

0