Open Source Parser Code for Mediawiki Markup

Question


I'm looking for sample code that parses Mediawiki XML markup and generates custom HTML pages, using a subset of the HTML produced by the PHP Mediawiki rendering engine.

I want this for BzReader, an offline reader for compressed Mediawiki dumps, written in C#. So a C# parser would be ideal, but any good code would help.

Of course, if no one has done this before, I think it's time to start a project for a free, standalone Mediawiki parser, based on Mediawiki's own parser but less tightly coupled to Mediawiki itself.

So, does anyone know of a code base I could start from that would be better than hacking on the Mediawiki PHP code?

+6

c# php parsing open-source mediawiki

Asaf bartov Nov 27 '08 at 10:35


3 answers

Update
Note that Screwturn (discussed below) does not adhere to the Mediawiki syntax; it uses its own, slightly altered version.

The Mediawiki syntax is not amenable to LALR (or even LL(*)) parsing, since there are many ambiguities in its definition and it also allows embedded HTML. This is discussed in this question: you are essentially stuck with writing your own parser and tokenizer by hand, rather than just writing a BNF file and feeding it to ANTLR / Gold / Irony.
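
To illustrate what "writing your own tokenizer" can look like, here is a minimal sketch in C#. It is not taken from Mediawiki, Screwturn, or Roadkill; the names WikiTokenKind, WikiToken, and WikiTokenizer are hypothetical, and it covers only a tiny subset of the markup (apostrophe runs and double-bracket links). The scanner does not decide what a run of apostrophes means; it just reports the run and leaves that decision to a later stage, which is one way to cope with the ambiguity mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical token kinds for a tiny subset of wiki markup (an assumption, not Mediawiki's own model).
public enum WikiTokenKind { Text, Quotes, LinkOpen, LinkClose }

public sealed class WikiToken
{
    public WikiTokenKind Kind { get; }
    public string Value { get; }

    public WikiToken(WikiTokenKind kind, string value)
    {
        Kind = kind;
        Value = value;
    }
}

public static class WikiTokenizer
{
    public static List<WikiToken> Tokenize(string input)
    {
        var tokens = new List<WikiToken>();
        var text = new StringBuilder();
        int i = 0;

        // Emit any buffered plain text as a single Text token.
        void FlushText()
        {
            if (text.Length > 0)
            {
                tokens.Add(new WikiToken(WikiTokenKind.Text, text.ToString()));
                text.Clear();
            }
        }

        while (i < input.Length)
        {
            if (input[i] == '\'')
            {
                // Collect the whole run of apostrophes; its meaning is decided later.
                int start = i;
                while (i < input.Length && input[i] == '\'') i++;
                int run = i - start;
                if (run >= 2)
                {
                    FlushText();
                    tokens.Add(new WikiToken(WikiTokenKind.Quotes, input.Substring(start, run)));
                }
                else
                {
                    text.Append('\''); // a lone apostrophe is ordinary text
                }
            }
            else if (StartsWithAt(input, i, "[["))
            {
                FlushText();
                tokens.Add(new WikiToken(WikiTokenKind.LinkOpen, "[["));
                i += 2;
            }
            else if (StartsWithAt(input, i, "]]"))
            {
                FlushText();
                tokens.Add(new WikiToken(WikiTokenKind.LinkClose, "]]"));
                i += 2;
            }
            else
            {
                text.Append(input[i]);
                i++;
            }
        }

        FlushText();
        return tokens;
    }

    // Ordinal comparison of the marker against the input at the given position.
    private static bool StartsWithAt(string s, int index, string marker)
    {
        return string.CompareOrdinal(s, index, marker, 0, marker.Length) == 0;
    }
}
```

For example, WikiTokenizer.Tokenize("it's ''x'' [[A]]") yields eight tokens: the apostrophe in "it's" stays inside the first Text token, while each '' becomes a Quotes token. A parser built on top would then have to decide how to pair runs of two, three, or five apostrophes, which is where a plain grammar file tends to fall short.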

Roadkill Wiki uses a Creole parser to parse its Mediawiki markup, but with limited support.

Screwturn is licensed under the GPL and has a C# parser:

Screwturn license
Download the source code (unfortunately there is no web-based SVN browser)

The class you're after is Core.Formatter, which uses many regexes to do its job:

public static class Formatter { }

It is not the prettiest code, "but it works".
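
For readers who have not seen a regex-driven formatter, the following is a rough sketch of the general approach in C#. It is not Screwturn's Core.Formatter code; the class name MiniWikiFormatter, the four patterns, and the use of the link target as the href are all assumptions made for illustration, and only headings, bold, italics, and simple links are handled.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical mini formatter; NOT the Screwturn implementation.
public static class MiniWikiFormatter
{
    // == Heading == on a line of its own.
    private static readonly Regex Heading =
        new Regex(@"^==[ \t]*(.+?)[ \t]*==[ \t]*$", RegexOptions.Multiline);

    // '''bold''' must be replaced before ''italic''.
    private static readonly Regex Bold = new Regex(@"'''(.+?)'''");
    private static readonly Regex Italic = new Regex(@"''(.+?)''");

    // [[Target]] or [[Target|label]].
    private static readonly Regex Link = new Regex(@"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]");

    public static string Format(string wikiText)
    {
        // Escape HTML by hand so that apostrophes survive for the quote patterns.
        string html = wikiText
            .Replace("&", "&amp;")
            .Replace("<", "&lt;")
            .Replace(">", "&gt;");

        html = Heading.Replace(html, "<h2>$1</h2>");
        html = Bold.Replace(html, "<b>$1</b>");
        html = Italic.Replace(html, "<i>$1</i>");
        html = Link.Replace(html, m =>
        {
            string target = m.Groups[1].Value;
            string label = m.Groups[2].Success ? m.Groups[2].Value : target;
            // Assumption: the escaped target is used directly as a relative URL.
            return "<a href=\"" + Uri.EscapeDataString(target) + "\">" + label + "</a>";
        });

        return html;
    }
}
```

With this sketch, MiniWikiFormatter.Format("'''b''' and ''i''") returns "<b>b</b> and <i>i</i>". The order of the replacements matters, and input such as five consecutive apostrophes or nested templates is not handled correctly, which is the kind of limitation to expect from a purely regex-based formatter.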

+6

Chris s Feb 10 '09 at 13:31


I had a few words to say about Mediawiki templates here. Interesting that there is now a list of alternative parsers; I will have to look into it.

+4

Greg hewgill Nov 27 '08 at 23:02


wimh · Accepted Answer · 2008-11-27T22:42:11+0000

There is a list of parsers at http://www.mediawiki.org/wiki/Alternative_parsers , but no C# parser is listed there...
