If you just want to highlight keywords, you should start with this (replacing keywords with your own Stata keyword list):
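    import re

    from pygments.lexer import RegexLexer
    from pygments.token import Keyword


    class StataLexer(RegexLexer):
        name = 'Stata'
        aliases = ['stata']
        # Must be a list of glob patterns; a bare string is iterated character by character.
        filenames = ['*.stata']
        flags = re.MULTILINE | re.DOTALL

        tokens = {
            'root': [
                # Placeholder keyword list; replace it with your Stata keywords.
                (r'(abstract|case|catch|class|do|else|extends|false|final|'
                 r'finally|for|forSome|if|implicit|import|lazy|match|new|null|'
                 r'object|override|package|private|protected|requires|return|'
                 r'sealed|super|this|throw|trait|try|true|type|while|with|'
                 r'yield)\b', Keyword),
            ],
        }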
I think your problem is not that you don't know Python, but that you don't have much experience writing a lexer or understanding how a lexer works, because this implementation is quite simple.
Then, if you want to add more stuff, add another element to the 'root' list: a two-element tuple whose first element is a regular expression and whose second element is the token type (the syntax class) to assign to whatever that expression matches.
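For instance, here is a rough sketch of what the lexer might look like after adding a few more tuples. The extra rules are my own assumptions, not a complete Stata grammar: `//` line comments, simple double-quoted strings, a tiny illustrative keyword list, and fallback rules so that unmatched characters do not come out as error tokens. Rules are tried in order at each position, so the keyword rule has to come before the generic name rule.

```python
import re

from pygments.lexer import RegexLexer
from pygments.token import Comment, Keyword, Name, String, Text


class StataLexer(RegexLexer):
    name = 'Stata'
    aliases = ['stata']
    filenames = ['*.stata']
    flags = re.MULTILINE | re.DOTALL

    tokens = {
        'root': [
            # Assumption: "//" starts a comment that runs to the end of the line.
            (r'//[^\n]*', Comment.Single),
            # Simplified double-quoted string (no compound quotes).
            (r'"[^"\n]*"', String),
            # Illustrative keyword list only; replace it with your own.
            (r'(generate|replace|regress|foreach|if|in)\b', Keyword),
            # Any other identifier; listed after the keywords on purpose.
            (r'[A-Za-z_]\w*', Name),
            # Fallbacks: whitespace, then any single leftover character.
            (r'\s+', Text),
            (r'.', Text),
        ],
    }


sample = '// load data\ngenerate x = "hi"\n'
tokens_found = list(StataLexer().get_tokens(sample))
for token_type, value in tokens_found:
    print(token_type, repr(value))
```

Printing the token stream like this is a quick way to check that a new rule fires where you expect before you hook the lexer up to a formatter.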
djc