Parser / Lexer ignores incomplete grammar rules

I have a parser and lexer written in ocamlyacc and ocamllex. If the file being parsed ends prematurely, for example because I forgot the semicolon at the end of a line, the application does not report a syntax error. I understand why: I raise and catch Eof, which causes the unfinished rule to be silently dropped. How can I make this raise a syntax error instead?

Here is my current parser (simplified):

    %{
      let parse_error s = Printf.ksprintf failwith "ERROR: %s" s
    %}

    %token COLON
    %token COMMA
    %token SEPARATOR
    %token SEMICOLON
    %token <string> FLOAT
    %token <string> INT
    %token <string> LABEL

    %type <Conf.config> command
    %start command

    %%

    command:
      | label SEPARATOR data SEMICOLON { Conf.Pair ($1,$3) }
      | label SEPARATOR data_list      { Conf.List ($1,$3) }
      | label SEMICOLON                { Conf.Single ($1) }

    label :
      | LABEL { Conf.Label $1 }

    data :
      | label { $1 }
      | INT   { Conf.Integer $1 }
      | FLOAT { Conf.Float $1 }

    data_list :
      | star_data COMMA star_data data_list_ending { $1 :: $3 :: $4 }

    data_list_ending:
      | COMMA star_data data_list_ending { $2 :: $3 }
      | SEMICOLON                        { [] }

and lexer (simplified):

    {
      open ConfParser
      exception Eof
    }

    rule token = parse
      | ['\t' ' ' '\n' '\010' '\013' '\012'] { token lexbuf }
      | ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n { FLOAT n }
      | ['0'-'9']+ as n { INT n }
      | '#' { comment lexbuf }
      | ';' { SEMICOLON }
      | ['=' ':'] { SEPARATOR }
      | ',' { COMMA }
      | ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w { LABEL w }
      | eof { raise Eof }

    and comment = parse
      | ['#' '\n'] { token lexbuf }
      | _ { comment lexbuf }

Sample input file:

    one = two, three, one-hundred;
    single label;
    list : command, missing, a, semicolon

One solution is to make the command rule recursive at the end and add an empty rule, so that together they build a list to return to the main program. I suspect the problem is that I treat Eof as an expected end condition rather than as an error in the lexer. Is this correct?

1 answer

ocamlyacc does not necessarily consume all of the input. If you want parsing to fail when some of the input is left unparsed, you need to match EOF in your grammar. Instead of raising Eof in the lexer, add an EOF token and change the start symbol to

    %type <Conf.config list> main

    main:
      | EOF          { [] }
      | command main { $1 :: $2 }
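
To see why matching EOF in the grammar turns a truncated file into an error, here is a minimal hand-written sketch of the same idea in plain OCaml, without ocamlyacc. The types `token` and `command`, the exception `Syntax_error`, and the functions `parse_items`, `parse_command` and `parse_main` are all hypothetical names made up for this illustration, and the grammar is reduced to labels only. It is not the asker's real `ConfParser` or `Conf` module.

```ocaml
(* Hypothetical token and result types, loosely modelled on the grammar
   above. EOF is an ordinary token instead of an exception. *)
type token = LABEL of string | SEPARATOR | COMMA | SEMICOLON | EOF

type command =
  | Single of string
  | Many of string * string list

exception Syntax_error of string

(* Rest of a list: COMMA LABEL ... terminated by SEMICOLON.
   Seeing EOF here means the command was never finished. *)
let rec parse_items acc = function
  | COMMA :: LABEL x :: rest -> parse_items (x :: acc) rest
  | SEMICOLON :: rest -> (List.rev acc, rest)
  | EOF :: _ -> raise (Syntax_error "unexpected end of input, expected ';'")
  | _ -> raise (Syntax_error "expected ',' or ';'")

(* command: LABEL SEMICOLON | LABEL SEPARATOR LABEL items *)
let parse_command = function
  | LABEL l :: SEMICOLON :: rest -> (Single l, rest)
  | LABEL l :: SEPARATOR :: LABEL x :: rest ->
      let items, rest = parse_items [x] rest in
      (Many (l, items), rest)
  | _ -> raise (Syntax_error "expected a command")

(* main: EOF | command main
   The only way to succeed is to reach the EOF token between commands. *)
let rec parse_main = function
  | [EOF] -> []
  | tokens ->
      let cmd, rest = parse_command tokens in
      cmd :: parse_main rest
```

With this shape, a token stream such as `[LABEL "list"; SEPARATOR; LABEL "a"; COMMA; LABEL "b"; EOF]` raises `Syntax_error`, because EOF arrives where the list rule still expects a comma or semicolon. In the ocamllex lexer, the corresponding change would be for the `eof` rule to return the `EOF` token rather than raise an exception.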