Scraping and parsing Wikipedia pages

I am wondering whether any libraries exist for, or are accessible from, Objective-C that would let me scrape pages formatted like this. In particular, I want all the dates and the text next to each date. If not, what would be the best way to do this? Regular expressions? I heard that NSString may already have built-in methods for this. Is that true?

I looked around to see if there is an alternative to scraping, such as an XML file or an API. I found the API, but the only clients I can see are written in other languages, and they seem only able to post content to pages rather than retrieve it.

EDIT: I found some additional information about the API at these links:

And I was able to come up with this query, which returns some HTML-encoded text (well, it is XML, but it contains escaped HTML such as &lt;a href=, etc.). I will keep looking through the docs to see whether I can do better. If not, are there any recommendations for parsing this?

EDIT 2: Thanks to this doc page, the easiest and cleanest way I have found to retrieve the data is this constructed link, which returns the raw data (in wiki markup) of the relevant section. I think I will still need to parse it, but that should be much simpler than parsing the whole article.

Is there anything that parses wiki markup, ideally in Objective-C?

==Events==
* [[710]] – [[Saracen]] invasion of [[Sardinia]].
*[[1275]] – Traditional founding of the city of [[Amsterdam]].
*[[1682]] – [[Philadelphia]], [[Pennsylvania]] is founded.

, , , , NSDictionary , . !
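For a section as regular as the sample above, a couple of regular expressions may be enough to turn each list item into a year-to-text mapping, which is roughly the shape an NSDictionary would hold. The sketch below is in Python purely for illustration. The function names `strip_links` and `parse_events` are made up here, and the patterns assume every entry starts with a linked year followed by a dash. Similar patterns should carry over to an Objective-C regex library such as RegexKitLite, though that has not been checked here.

```python
import re

# Sample copied from the wiki markup shown in the question.
SAMPLE = """==Events==
* [[710]] – [[Saracen]] invasion of [[Sardinia]].
*[[1275]] – Traditional founding of the city of [[Amsterdam]].
*[[1682]] – [[Philadelphia]], [[Pennsylvania]] is founded.
"""

# One list item: "*", optional spaces, a linked year, a dash, then the text.
# Assumption: every event line follows this shape.
EVENT_RE = re.compile(r"^\*\s*\[\[(\d+)\]\]\s*[–-]\s*(.+)$", re.MULTILINE)

# [[target]] or [[target|label]]: keep the label if present, else the target.
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")


def strip_links(text):
    """Replace wiki links with their visible text."""
    return LINK_RE.sub(r"\1", text)


def parse_events(markup):
    """Map each year to the list of event descriptions found for it."""
    events = {}
    for year, text in EVENT_RE.findall(markup):
        events.setdefault(int(year), []).append(strip_links(text.strip()))
    return events


if __name__ == "__main__":
    for year, texts in parse_events(SAMPLE).items():
        print(year, texts)
```

Each year maps to a list because a single date page can have several events in the same year.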

+5

HTML.

RegEx, - , RegexKitLite ( ). , NSString, , - , . , "" Regex, , .

API- , , , , . , , , .

+1

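You can choose the output format by appending &format=fmt to the query; see API:Data_formats. For example: a JSON query. If XML does not suit you, JSON is also available.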
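As a rough illustration of the format parameter, the sketch below builds a query URL that asks for one section's wiki markup as JSON and then pulls the markup out of a response. It is in Python for brevity. The helper names, the page title, and the section number are placeholders. The response layout (`query` → `pages` → `revisions` → `"*"`) is an assumption based on the API's legacy JSON output and should be checked against a real response.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, section, fmt="json"):
    """Build a URL requesting the wiki markup of one section of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": section,
        "titles": title,
        "format": fmt,  # the &format=fmt switch discussed above
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_wikitext(response_text):
    """Pull the markup out of a JSON response (assumed legacy layout)."""
    data = json.loads(response_text)
    pages = data["query"]["pages"]
    page = next(iter(pages.values()))  # pages are keyed by page id
    return page["revisions"][0]["*"]


if __name__ == "__main__":
    # Hypothetical title and section number, for illustration only.
    print(build_query_url("October 27", 1))

    # Hand-written stand-in for a response, not captured API output.
    fake = '{"query": {"pages": {"1": {"revisions": [{"*": "==Events=="}]}}}}'
    print(extract_wikitext(fake))
```

The fetch itself is left out on purpose, since any HTTP client can request the resulting URL.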

, HTML -.

+4

, , .

+3

WP -. , , , . - , . WP, . .

WP, .

, - Wikipedians , , - - , WP (, DBPedia - http://dbpedia.org/About).

+3

Python?;) Objective-C. : / , lxml.

+2

, .

- , .

XML, RDF, , , JSON.

0

iPhone, , :

YQL, XPATH DOM.

Personally, I think this is much better than using Regex. Then again I know only simple regular expressions.
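To give a feel for the XPath-over-DOM approach, here is a small sketch in Python using the standard library's limited XPath support. The HTML fragment is hand-written to resemble a rendered list and is assumed to be well-formed XML, which real page HTML may not be. The name `list_items` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed fragment standing in for rendered page HTML.
FRAGMENT = """<div>
  <h2>Events</h2>
  <ul>
    <li><a href="/wiki/710">710</a> – <a href="/wiki/Saracen">Saracen</a> invasion of <a href="/wiki/Sardinia">Sardinia</a>.</li>
    <li><a href="/wiki/1275">1275</a> – Traditional founding of the city of <a href="/wiki/Amsterdam">Amsterdam</a>.</li>
  </ul>
</div>"""


def list_items(xml_text):
    """Return the visible text of every <li> found under a <ul>."""
    root = ET.fromstring(xml_text)
    # ".//ul/li" selects <li> children of any <ul> below the root.
    return ["".join(li.itertext()).strip() for li in root.findall(".//ul/li")]


if __name__ == "__main__":
    for item in list_items(FRAGMENT):
        print(item)
```

The path expression selects elements by structure, so nested link tags need no special handling, which is the main advantage over regular expressions here.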

0