Extensive documentation on how to write a lexer for Pygments?

I have a Stata keyword dictionary and a reasonable knowledge of Stata syntax. I would like to devote several hours to turning it into a Stata lexer for Pygments.

However, I cannot find enough documentation about the syntax of lexers to start coding one. Can someone point me to a good tutorial for writing new lexers for Pygments?

I know about the Pygments API and the lexer development page, but to be honest, they are not enough for someone like me with very limited knowledge of Python.

My strategy so far has been to search for examples. I found quite a lot, for example Puppet, Sass, Scala, and Ada. They really helped. Any help on how to get started from my Stata keywords is welcome.

+7
python pygments stata
2 answers

If you just want to highlight keywords, you should start with this (replacing keywords with your own Stata keyword list):

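    import re

    from pygments.lexer import RegexLexer
    from pygments.token import Keyword


    class StataLexer(RegexLexer):
        name = 'Stata'
        aliases = ['stata']
        filenames = ['*.stata']  # a list of glob patterns, not a bare string
        flags = re.MULTILINE | re.DOTALL

        tokens = {
            'root': [
                # Placeholder keyword list: replace it with your Stata keywords.
                # The leading \b stops a keyword from matching inside a longer word.
                (r'\b(abstract|case|catch|class|do|else|extends|false|final|'
                 r'finally|for|forSome|if|implicit|import|lazy|match|new|null|'
                 r'object|override|package|private|protected|requires|return|'
                 r'sealed|super|this|throw|trait|try|true|type|while|with|'
                 r'yield)\b', Keyword),
            ],
        }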

I think your problem is not that you don't know Python, but that you don't have much experience writing lexers or understanding how a lexer works. This implementation is quite simple.

Then, if you want to add more, append another element to the root list: a two-element tuple whose first element is a regular expression and whose second element is the token type to assign to whatever it matches.
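To make that concrete, here is a rough sketch of a root list with a few more rules. The class name StataSketchLexer, the '*.do' pattern, and the four-command keyword sample are placeholders of mine, not a real Stata lexer. Rules are tried in order, so the comment rule comes first and the catch-all comes last.

```python
import re

from pygments.lexer import RegexLexer, words
from pygments.token import Comment, Keyword, Name, Number, String, Text


class StataSketchLexer(RegexLexer):
    """Illustrative only: the keyword sample below is far from complete."""

    name = 'StataSketch'
    aliases = ['stata-sketch']
    filenames = ['*.do']
    flags = re.MULTILINE

    tokens = {
        'root': [
            # Each rule is a (regular expression, token type) tuple.
            (r'//.*?$', Comment.Single),           # line comments
            (r'"[^"\n]*"', String),                # double-quoted strings
            (r'\b\d+(\.\d+)?\b', Number),          # integers and decimals
            # words() builds one optimized regex from a list of keywords.
            (words(('generate', 'replace', 'regress', 'summarize'),
                   prefix=r'\b', suffix=r'\b'), Keyword),
            (r'[A-Za-z_]\w*', Name),               # any other identifier
            (r'\s+', Text),                        # whitespace
            (r'.', Text),                          # catch-all, avoids Error tokens
        ],
    }


# get_tokens() yields (token type, matched text) pairs.
tokens = list(StataSketchLexer().get_tokens('generate x = 1 // note\n'))
for token_type, value in tokens:
    print(token_type, repr(value))
```

Printing the token stream like this is a quick way to check each new rule before moving on to the next one.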

+6

I recently tried writing a Pygments lexer (for BibTeX, which has a simple syntax), and I agree with your assessment that the resources out there are not very helpful for people unfamiliar with Python or with general code-parsing concepts.

What I found most useful was the collection of lexers included with Pygments.

There is a _mapping.py file that lists all recognized language formats and links to the lexer object for each of them. To build my lexer, I tried to think of languages that had constructs similar to the ones I was handling, and checked whether I could borrow anything from their lexers. Some of the built-in lexers are more complex than I wanted, but others were useful.
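If you would rather not read _mapping.py by hand, the same information is available from Python. This is only a small sketch, assuming a standard Pygments install; 'scala' is just an example alias, picked because the question mentions Scala.

```python
import inspect

from pygments.lexers import get_all_lexers, get_lexer_by_name

# Each entry is a (name, aliases, filename patterns, mimetypes) tuple.
all_lexers = list(get_all_lexers())
print(len(all_lexers), 'lexers available')

# Look up one lexer by alias and find the module that defines it,
# so you know which source file to open and read.
lexer = get_lexer_by_name('scala')
lexer_class = type(lexer)
source_file = inspect.getsourcefile(lexer_class)
print(lexer_class.__name__, 'is defined in', source_file)
```

The printed path points at the file to study when looking for rules worth reusing.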

+4