Unix Flex Regex for multi-line comments

I am writing a lexical analyzer using Flex on Unix. If you have ever used it, you know that you basically define a regex for each token of the language you are writing the lexical analyzer for. I am stuck on the final part. I need the correct regex for multi-line comments, one that allows something like

/* This is a comment */

but also allows

 /* This **** //// is another type of comment */ 

Can anyone help with this?

+8
unix regex flex-lexer
4 answers

You don't match C-style comments with a simple regex in Flex; they require a more sophisticated matching method based on start states. The Flex FAQ explains how (well, it does for the /*...*/ form; handling the other form only in the <INITIAL> state should be simple).
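
To illustrate the start-state idea outside of Flex, here is a minimal sketch in plain C. It is not the Flex FAQ's code: the function name strip_comments and the two-state enum are made up for this example. In Flex the COMMENT state would correspond to an exclusive start condition (declared with %x) that is entered with BEGIN on "/*" and left again on "*/"; here the same switch is done by hand.

```c
#include <stddef.h>

/*
 * Copy src to dst with every comment dropped.
 * dst must be at least as large as src, including the terminating NUL.
 * Returns the number of comments removed, or -1 if a comment is still
 * open when the input ends.
 * Hypothetical helper for illustration only; it ignores string literals.
 */
int strip_comments(const char *src, char *dst)
{
    enum { INITIAL, COMMENT } state = INITIAL;
    int removed = 0;
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        if (state == INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = COMMENT;        /* plays the role of BEGIN(comment) */
                i += 2;
            } else {
                dst[o++] = src[i++];    /* ordinary text is passed through */
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = INITIAL;        /* plays the role of BEGIN(INITIAL) */
                removed++;
                i += 2;
            } else {
                i++;                    /* swallow everything inside the comment */
            }
        }
    }

    dst[o] = '\0';
    return state == COMMENT ? -1 : removed;
}
```

For example, strip_comments("a /* x **** //// y */ b", out) leaves "a  b" in out and returns 1, while an input ending inside a comment returns -1.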

+14

If you have to make do with a regex alone, there is in fact a not-too-complex solution:


"/*"([^*]|(\*+[^*/]))*\*+\/
The full explanation and derivation of this regular expression is worked out in detail here.
In short: "/*" marks the beginning of the comment. ([^*]|(\*+[^*/]))* accepts any character that is not a * ([^*]), or a sequence of one or more * followed by a character that is neither * nor / (\*+[^*/]). This means that every run of asterisks is accepted except one that ends in /, such as *****/, because in that run there is no * followed by something other than * or /. The ******/ case is handled by the last part of the regex, \*+\/, which matches one or more * followed by the / character to mark the end of the comment.
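
One way to check the reasoning above is to write the regex out as the small state machine it describes. The following is a sketch in plain C with a hypothetical function name, match_block_comment; it is meant to accept the same strings as the regex, not to reproduce the scanner that Flex generates.

```c
#include <stddef.h>

/*
 * Returns the length of the comment at the start of s, or 0 if s does
 * not begin with a complete comment.
 * Hypothetical helper for illustration only.
 */
size_t match_block_comment(const char *s)
{
    size_t i = 2;
    int after_star = 0;  /* nonzero while the previous character was '*' */

    /* The opening "/*" is mandatory. */
    if (s[0] != '/' || s[1] != '*')
        return 0;

    while (s[i] != '\0') {
        char c = s[i++];

        if (c == '*') {
            after_star = 1;   /* inside a run of asterisks */
        } else if (c == '/' && after_star) {
            return i;         /* a run of asterisks followed by '/' closes the comment */
        } else {
            after_star = 0;   /* any other character: back to the comment body */
        }
    }

    return 0;                 /* input ended before the closing sequence */
}
```

Both comments from the question are matched in full, while "/*/" and an unterminated "/* ..." give 0.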
+6

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html does:

    "/*"    { comment(); }

    comment()
    {
        char c, c1;

    loop:
        while ((c = input()) != '*' && c != 0)
            putchar(c);

        if ((c1 = input()) != '/' && c != 0) {
            unput(c1);
            goto loop;
        }

        if (c != 0)
            putchar(c1);
    }

Another question that addresses this is "How to write a non-greedy match in LEX / FLEX?"

0

I don't know Flex, but I do know regular expressions. /\/\*.*?\*\//s should match both types (in PCRE), but if you need to distinguish between them in the analyzer, you can then iterate over the list of matches and check whether each one is of the second type with /\*\*\s+\/{4}/

-2

Source: https://habr.com/ru/post/651335/

