How to parse wiki markup

Hi guys, given a dataset in plain text, for example:

==Events==
* [[312]] – [[Constantine the Great]] is said to have received his famous [[Battle of Milvian Bridge#Vision of Constantine|Vision of the Cross]].
* [[710]] – [[Saracen]] invasion of [[Sardinia]].
* [[939]] – [[Edmund I of England|Edmund I]] succeeds [[Athelstan of England|Athelstan]] as [[King of England]].
*[[1275]] – Traditional founding of the city of [[Amsterdam]].
*[[1524]] – [[Italian Wars]]: The French troops lay siege to [[Pavia]].
*[[1553]] – Condemned as a [[Heresy|heretic]], [[Michael Servetus]] is [[burned at the stake]] just outside [[Geneva]].
*[[1644]] – [[Second Battle of Newbury]] in the [[English Civil War]].
*[[1682]] – [[Philadelphia]], [[Pennsylvania]] is founded.

I would like to get an NSDictionary or another form of collection, so that I can map the year (the number on the left) to the excerpt (the text on the right). So this is what the pattern looks like:

*[[YEAR]] – THE_TEXT

However, I would like the excerpt to be plain text, that is, without wiki markup, so no [[ ]] pairs. In fact, this could be difficult with alias links such as [[Edmund I of England|Edmund I]].

I don't know much about regular expressions, so I have a few questions. Should I try to clean up the data first? For example, by deleting the first line, which will always be ==Events==, and removing the [[ and ]] occurrences?

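Or would it be better to do this in several passes? For example, the first pass would split each line into * [[710]] and [[Saracen]] invasion of [[Sardinia]]. and store the two parts in separate NSArrays.

Then I would go through the first NSArray and keep only the text inside the [[]] (text, not a number, because it could be something like 530 BC), so that * [[710]] becomes 710.

Then, for the excerpt NSArray, whenever a [[some_article|alias]] is found, turn it into just [[alias]], and finally remove all the [[ and ]] pairs?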



Here is some sample code using RegexKitLite:

#import "RegexKitLite.h"

NSString *data = @"* [[312]] – [[Constantine the Great]] is said to have received his famous [[Battle of Milvian Bridge#Vision of Constantine|Vision of the Cross]].\n\
    * [[710]] – [[Saracen]] invasion of [[Sardinia]].\n\
    * [[939]] – [[Edmund I of England|Edmund I]] succeeds [[Athelstan of England|Athelstan]] as [[King of England]].\n\
    *[[1275]] – Traditional founding of the city of [[Amsterdam]].";

// Capture 1 is the year, capture 2 is the rest of the line.
NSString *captureRegex = @"(?i)(?:\\* *\\[\\[)([0-9]*)(?:\\]\\] \\– )(.*)";

// stringRange is the part of data that has not been searched yet.
NSRange captureRange;
NSRange stringRange;
stringRange.location = 0;
stringRange.length = data.length;

do
{
    // Find the next entry in the unsearched part of the string.
    captureRange = [data rangeOfRegex:captureRegex inRange:stringRange];
    if ( captureRange.location != NSNotFound )
    {
        NSString *year = [data stringByMatching:captureRegex options:RKLNoOptions inRange:stringRange capture:1 error:NULL];
        NSString *textStuff = [data stringByMatching:captureRegex options:RKLNoOptions inRange:stringRange capture:2 error:NULL];
        // Move the search range to just past this match.
        stringRange.location = captureRange.location + captureRange.length;
        stringRange.length = data.length - stringRange.location;
        NSLog(@"Year:%@, Stuff:%@", year, textStuff);
    }
}
while ( captureRange.location != NSNotFound );

As for the RegEx, breaking it down piece by piece:

(?i)

Ignore case. This could have been left out, since the pattern does not match against any letters.

(?:\* *\[\[)

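?: means "do not capture this group". It matches a literal * (escaped), then zero or more spaces (the " *" part), then the two opening brackets (also escaped).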

([0-9]*)

Capture anything that is a digit; this is the year.

(?:\]\] \– )

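Another group that is matched but not captured: the closing brackets followed by " – ". Note that every "\" in the regex needs a second "\" in the Objective-C string above, because "\" is also the escape character in a string literal. In other words, a regex "\" is written as "\\" in Obj-C.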

(.*)

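Capture everything else. By default the RegEx engine stops at the end of the line, because "." does not match a line break. Note that this does not strip the [[LINK]] markup out of the text.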
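The [[LINK]] markup that stays in the captured text could be removed with a second regex. The following is only a minimal sketch, and it assumes Foundation's NSRegularExpression rather than RegexKitLite; the function name StripWikiLinks is made up for illustration. It keeps the alias of an [[article|alias]] link and the article name of a plain [[article]] link.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of RegexKitLite or of the code above).
// [[article|alias]] -> alias, [[article]] -> article
static NSString *StripWikiLinks(NSString *text)
{
    // \[\[            the opening brackets
    // (?:[^\]|]*\|)?  an optional "article|" prefix, matched but not captured
    // ([^\]]*)        the visible text, captured as $1
    // \]\]            the closing brackets
    NSString *pattern = @"\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    // Replace every link with the text captured in group 1.
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:@"$1"];
}
```

Called on the textStuff value of the 939 entry, this should give "Edmund I succeeds Athelstan as King of England."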

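The NSRange variables (stringRange and captureRange) are used to keep working through the string without matching the same text again. After each match, stringRange is moved to start just past it.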

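Finally, after adding the RegexKitLite files to your project, remember that the ICU library has to be linked in as well (the RegexKitLite documentation explains how).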
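If pulling in RegexKitLite is not an option, roughly the same loop can be sketched with Foundation's NSRegularExpression. That is my assumption, not what the code above uses. It returns every match together with its capture ranges, so the manual NSRange bookkeeping goes away. The function name EventsByYear is hypothetical, and the year group is loosened from [0-9]* to [^\]]+ so that a non-numeric year such as 530 BC would also be captured.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: maps each year to the raw text that follows it.
static NSDictionary *EventsByYear(NSString *data)
{
    // "*", optional spaces, [[year]], " – ", then the rest of the line.
    NSString *pattern = @"\\* *\\[\\[([^\\]]+)\\]\\] – (.*)";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];

    NSMutableDictionary *events = [NSMutableDictionary dictionary];
    NSArray *matches = [regex matchesInString:data
                                      options:0
                                        range:NSMakeRange(0, data.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture 1 is the year, capture 2 is the excerpt (still with [[ ]] markup).
        NSString *year = [data substringWithRange:[match rangeAtIndex:1]];
        NSString *text = [data substringWithRange:[match rangeAtIndex:2]];
        // A later entry with the same year overwrites an earlier one.
        [events setObject:text forKey:year];
    }
    return events;
}
```

The excerpts in the resulting dictionary still contain the link markup, so they would need a further clean-up step before display.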


For regular expressions in Objective-C, take a look at RegexKitLite.


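Wikitext is a complicated format, and parsing it in full is hard. Do you really need to handle all of it?

If you do, look for an existing library that already parses Wikitext. A quick search of CPAN, for example, turns up a few.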

Alternatively, you may prefer a simpler approach: decide which parts of Wikitext you are going to handle. That might be, for example, links and headings, but not lists. Then focus on each of them in turn and convert the Wikitext into whatever you want the output to look like. Yes, regular expressions will help with that part, so read up on them, and if you run into specific problems, come back and ask.
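One way to organise that simpler approach is a small table of rules, one per construct you have decided to support. The sketch below is only an illustration under my own assumptions: it uses Foundation's NSRegularExpression, the function name SimplifyWikitext is invented, and it covers just headings and links. Anything without a rule, such as the list markers, passes through unchanged.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sketch: each supported construct is one (pattern, template) pair.
static NSString *SimplifyWikitext(NSString *wikitext)
{
    NSArray *rules = @[
        // ==Heading== -> Heading (the same number of "=" on both sides)
        @[@"^(=+) *(.*?) *\\1$", @"$2"],
        // [[article|alias]] -> alias, [[article]] -> article
        @[@"\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]", @"$1"]
    ];

    NSString *result = wikitext;
    for (NSArray *rule in rules) {
        // AnchorsMatchLines lets ^ and $ match at every line, not only at the ends of the string.
        NSRegularExpression *regex =
            [NSRegularExpression regularExpressionWithPattern:[rule objectAtIndex:0]
                                                      options:NSRegularExpressionAnchorsMatchLines
                                                        error:NULL];
        result = [regex stringByReplacingMatchesInString:result
                                                 options:0
                                                   range:NSMakeRange(0, result.length)
                                            withTemplate:[rule objectAtIndex:1]];
    }
    return result;
}
```

Supporting another construct then means adding one more pair to the table, which keeps each piece of Wikitext handling separate and easy to test on its own.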

Good luck
