Unix Flex Regex for multi-line comments

Question

I am writing a lexical analyzer using Flex on Unix. If you have used it before, you know that you basically define a regex for each token of the language you are writing the lexical analyzer for. I am stuck on the final part: I need the correct regex for multi-line comments, one that allows something like

 /* This is a comment */

but also allows

 /* This **** //// is another type of comment */

Can anyone help with this?

+8

unix regex flex-lexer

LunaCodeGirl Jan 21 '11 at 6:15

4 answers

If you need to do it with a regex alone, the solution is actually not very difficult:

    "/*"([^*]|(\*+[^*/]))*\*+\/

The full explanation and derivation of this regular expression is worked out in detail here.
In short: "/*" marks the beginning of the comment. ([^*]|(\*+[^*/]))* accepts any character that is not * ([^*]), or a run of one or more * provided the run is followed by a character that is neither * nor / (\*+[^*/]). This means every ****... run is accepted except one of the form ****/, because there the run of * is not followed by a character other than * or /. The ****/ case is handled by the last part of the regex, \*+\/, which matches one or more * followed by a / to mark the end of the comment.
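One way to sanity-check the pattern outside Flex is to translate it to a POSIX extended regular expression and run it through `regcomp`/`regexec`. The sketch below assumes a POSIX system with `<regex.h>`; the helper name `comment_length` and the leading `^` anchor are illustrative additions, not part of the original answer, and Flex's quoted "/*" is written as `/\*` because POSIX ERE has no quoted-string syntax.

```c
#include <regex.h>

// POSIX ERE translation of the Flex pattern, anchored at the start of input.
// (Hypothetical constant name; the anchor is added for this test only.)
static const char *COMMENT_ERE = "^/\\*([^*]|(\\*+[^*/]))*\\*+/";

// Returns the length of the comment that starts at s[0],
// or -1 if s does not begin with a complete comment.
long comment_length(const char *s)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return -1;                 // pattern failed to compile
    if (regexec(&re, s, 1, &m, 0) == 0)
        len = (long)m.rm_eo;       // end offset of the whole match
    regfree(&re);
    return len;
}
```

Calling `comment_length` on each of the two comments from the question should return the full length of the string, and an input such as "/* a */ b */" should stop at the first closing delimiter, since the middle group can never consume a * that is directly followed by /.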

+6

Abraham philip Aug 31 '15 at 10:15

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html does:

    "/*"    { comment(); }

    comment()
    {
        char c, c1;

    loop:
        while ((c = input()) != '*' && c != 0)
            putchar(c);

        if ((c1 = input()) != '/' && c != 0)
        {
            unput(c1);
            goto loop;
        }

        if (c != 0)
            putchar(c1);
    }

A related question that also covers this is "How to write a non-greedy match in LEX / FLEX?"

0

Ciro Santilli 包子露宪六四事件法轮功 Apr 10 '13 at 12:54

I do not know Flex, but I know regular expressions. /\/\*.*?\*\//s should match both types (in PCRE), but if you need to distinguish between them in the analyzer, you can then iterate over the list of matches and check whether each one is of the second type with /\*\*\s+\/{4}/

-2

Walf Jan 21 '11 at 8:23

Donal fellows · Accepted Answer · 2011-01-21T08:59:37+0000

You don't match C-style comments with a simple regex in Flex; they require a more sophisticated matching method based on start states. The Flex FAQ explains how (well, it does for the /*...*/ form; handling the other form in just the <INITIAL> state should be simple).
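The start-state idea can be illustrated without Flex itself. In a Flex specification it would normally be written with an exclusive start condition (declared with `%x`) and `BEGIN` calls in the rule actions; the sketch below is only a hand-written C analogue of that two-state scanner, not the FAQ's code. The names `strip_comments`, `ST_INITIAL` and `ST_COMMENT` are made up for this example.

```c
#include <stddef.h>

// Scanner states, loosely mirroring Flex start conditions.
enum scan_state { ST_INITIAL, ST_COMMENT };

// Copies src to dst with comments removed.
// dst must have room for at least strlen(src) + 1 bytes.
// Returns 0 on success, -1 if a comment is still open at end of input.
int strip_comments(const char *src, char *dst)
{
    enum scan_state state = ST_INITIAL;
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        if (state == ST_INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = ST_COMMENT;      // comparable to BEGIN(COMMENT)
                i += 2;
            } else {
                dst[o++] = src[i++];     // ordinary text is passed through
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = ST_INITIAL;      // comparable to BEGIN(INITIAL)
                i += 2;
            } else {
                i++;                     // swallow comment text, newlines included
            }
        }
    }
    dst[o] = '\0';
    return state == ST_INITIAL ? 0 : -1;
}
```

Because the closing delimiter is only looked for while in the comment state, runs of * or / inside the comment body need no special handling, and an unterminated comment is reported instead of silently matching to the end of the input.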
