Your code works fine for me (Ruby MRI 1.9.3) when I request a wiki page that exists.
When I request a wiki page that does NOT exist, I get a mediawiki 404 error code.
- Steve_Jobs => success
- Steve_Austin => success
- Steve_Rogers => success
- Steve_Foo => error (404)
Wikipedia does a ton of caching, so if you see responses for "Steve_Jobs" that differ from those for other people who really exist, my best guess is that Wikipedia caches the Steve Jobs article because it is famous, and potentially adds extra checks to protect the article from rapid changes, corrections, etc.
The solution for you: always open the URL with a User-Agent string.
rpage = open(remote_full_url, "User-Agent" => "Whatever you want here").read
Information from the MediaWiki docs: "When you make HTTP requests to the MediaWiki web service API, be sure to specify a User-Agent header that properly identifies your client. Don't use the default User-Agent provided by your client library, but make up a custom header that includes the name and the version number of your client: something like "MyCuteBot/0.1".
On Wikimedia wikis, if you don't supply a User-Agent header, or you supply an empty or generic one, your request will fail with an HTTP 403 error. See our User-Agent policy."
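Here is a slightly fuller sketch of the same idea, in case it helps. It assumes a newer Ruby (2.5+), where open-uri exposes `URI.open`; on 1.9.3 the plain `open` call shown earlier does the same job. The helper names `user_agent_headers` and `fetch_page` are made up for illustration, and the client name and version are placeholders in the style the docs suggest.

```ruby
require "open-uri"

# Build the request headers. The name and version are hypothetical
# placeholders; use values that identify your own script.
def user_agent_headers(name, version)
  { "User-Agent" => "#{name}/#{version}" }
end

# Fetch a page body with an explicit User-Agent header.
# Returns nil when the server answers with an HTTP error status
# (for example 404 for a page that does not exist).
def fetch_page(url, headers)
  URI.open(url, headers).read
rescue OpenURI::HTTPError => e
  warn "#{url} failed: #{e.message}"
  nil
end

# Example usage (makes a real network request, so it is left commented out):
#   headers = user_agent_headers("MyCuteBot", "0.1")
#   rpage   = fetch_page("https://en.wikipedia.org/wiki/Steve_Jobs", headers)
```

Rescuing `OpenURI::HTTPError` means a missing page such as Steve_Foo comes back as `nil` instead of raising, so the caller can tell "page does not exist" apart from a successful fetch.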
joelparkerhenderson