Emacs syntax highlighting of numbers that are not part of words (with regex?)

Question

I recently switched to Emacs, and I'm used to having numbers highlighted. The quick hack I took from here has me putting the following in my .emacs:

 (add-hook 'after-change-major-mode-hook '(lambda () (font-lock-add-keywords nil '(("\\([0-9]+\\)" 1 font-lock-warning-face prepend)))))

This gives a good start, i.e. any digit is highlighted. However, I am a complete newbie with regex and would ideally like the following behavior:

Also highlight the decimal point if it is part of a float, e.g. 12.34
Do not highlight any part of a number if it is next to or part of a word, e.g. in these cases: foo11 ba11r 11spam, none of the "1"s should be highlighted.
Allow an 'e' between two integers to permit scientific notation (not required, bonus credit)

Unfortunately, this comes very close to a "do it for me" question, which I hate, but I have not been able to make any worthwhile progress so far.

About as far as I got is finding [^a-zA-Z][0-9]+[^a-zA-Z] to match anything except a letter on both sides (such as an equals sign), but all this does is include the adjacent characters in the highlighting. I'm not sure how to say "highlight the numbers only if there is no letter on either side".

Of course, I can't imagine that regex is the way to go for complex syntax highlighting, so any ideas for good number highlighting in Emacs are welcome.

Any help is greatly appreciated. (In case it matters, this is for coding Python.)

+4

regex emacs syntax-highlighting

Jdog Feb 20 '13 at 12:35

2 answers

First of all, lose the add-hook and the lambda. The font-lock-add-keywords call needs neither. If you want this only for python-mode, pass the mode symbol as the first argument instead of nil.

Secondly, there are two main ways to do this.

Add a grouping construct around the digits. The numbers in the font-lock forms correspond to the groups, so it will be '(("\\([^a-zA-Z]\\([0-9]+\\)[^a-zA-Z]\\)" 2 font-lock-warning-face prepend)) . The outer grouping is useless here, though, so this can be simplified to '(("[^a-zA-Z]\\([0-9]+\\)[^a-zA-Z]" 1 font-lock-warning-face prepend)) .
Just use the beginning-of-symbol and end-of-symbol backslash constructs. Then the regular expression looks like this: \_<[0-9]+\_> . We can highlight the whole match here, so no extra group is needed; group number 0 refers to the whole match: '(("\\_<[0-9]+\\_>" 0 font-lock-warning-face prepend)) . Without the prepend flag this shortens to the dotted form ("\\_<[0-9]+\\_>" . font-lock-warning-face) . As a variant, you can use the beginning-of-word and end-of-word constructs, but you probably do not want to highlight numbers adjacent to underscores or whatever other characters, if any, python-mode puts in the symbol syntax class.

And finally, there is no need for prepend . These numbers are probably not highlighted by anything else yet, and if you are worried about possible interactions with minor modes such as whitespace , you'd better choose append or just omit this element altogether.

Final result:

 (font-lock-add-keywords nil '(("\\_<[0-9]+\\_>" . font-lock-warning-face)))
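
For the python-mode-only variant mentioned at the start of this answer, a minimal sketch might look like the following. It assumes the form is placed at top level in .emacs; the face and regexp are taken unchanged from the result above.

```elisp
;; Sketch: register the keyword for python-mode only, instead of the
;; current buffer (nil). Passing a mode symbol means no hook is needed.
(font-lock-add-keywords
 'python-mode
 ;; \_< and \_> are symbol boundaries; backslashes are doubled because
 ;; the regexp is written as a Lisp string.
 '(("\\_<[0-9]+\\_>" . font-lock-warning-face)))
```

Buffers that are already open will probably need their mode re-enabled (for example with M-x python-mode) before the new keyword shows up.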

+5

Dmitry Feb 20 '13 at 13:34

db48x · Accepted Answer · 2013-02-20T13:47:46+0000

Start by switching to the *scratch* buffer and typing in some test text. Enter some numbers, some identifiers containing numbers, some numbers with missing parts (e.g. .e12 ), etc. These will be our test cases and let us experiment quickly. Now run M-x re-builder to enter regular expression builder mode, which lets you try any regular expression against the text of the current buffer to see what it matches. This is a very handy mode; you will be able to use it all the time. Note that because Emacs Lisp requires you to put regular expressions in strings, you must double all your backslashes. You are already doing that correctly, but I'm not going to double them here.

So, restricting the match to numbers that are not part of identifiers is fairly simple. \b matches word boundaries, so put one at either end of your regex to make it match a whole word.
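
To try the word-boundary idea outside re-builder, one could evaluate a small form in *scratch* (for example with C-x C-e). This is only a sketch: the sample strings are made up, and the result depends on the syntax table of the buffer in which the form is evaluated.

```elisp
;; \b[0-9]+\b written as a Lisp string, so every backslash is doubled.
(let ((re "\\b[0-9]+\\b"))
  ;; string-match-p returns the start index of the first match, or nil.
  (list (string-match-p re "x = 11")   ; a standalone number
        (string-match-p re "foo11")    ; digits glued to a word
        (string-match-p re "11spam"))) ; a word glued to digits
```

With the default syntax table, letters and digits are both word constituents, so only the first string should produce a match.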

You could match floats by simply adding a period to the character class you started with, so that it becomes [0-9.] . Unfortunately, that can match a period all on its own; what we really want is [0-9]*\.?[0-9]+ , which matches zero or more digits, followed by an optional period, followed by one or more digits.

A leading sign can be matched with [-+]? , so we get negative numbers.

To match exponents we need an optional group: \(...\)? , and since we use this only for highlighting and do not actually need the contents of the group, we can make it \(?:...\) , a non-capturing group, which saves the regexp matcher the work of recording it. Inside the group we need to match an "e" ( [eE] ), an optional sign ( [-+]? ) and one or more digits ( [0-9]+ ).

Putting it all together: [-+]?\b[0-9]*\.?[0-9]+\(?:[eE][-+]?[0-9]+\)?\b Note that I put the optional sign before the first word boundary, because the "+" and "-" characters create a word boundary.
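
As a sketch of how the combined expression might be plugged into font-lock, here it is with the backslashes doubled for a Lisp string. The choice of python-mode and font-lock-warning-face is an assumption carried over from the question, not something this answer prescribes.

```elisp
;; Sketch: the combined number regexp as a font-lock keyword.
;; Unescaped: [-+]?\b[0-9]*\.?[0-9]+\(?:[eE][-+]?[0-9]+\)?\b
(font-lock-add-keywords
 'python-mode
 '(("[-+]?\\b[0-9]*\\.?[0-9]+\\(?:[eE][-+]?[0-9]+\\)?\\b"
    . font-lock-warning-face)))
```

It is worth pasting the unescaped form into re-builder against the test text first, since edge cases such as a leading period or digits next to underscores depend on the syntax table of the mode in use.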
