I have a parser and lexer written in ocamlyacc and ocamllex. If the input file ends prematurely, for example because I forgot the semicolon at the end of a line, the application does not report a syntax error. I realize this is because I raise and catch Eof in the lexer, which causes the unfinished rule to be silently dropped, but how should I change this so that a syntax error is raised instead?
Here is my current parser (simplified):
    %{
    let parse_error s = Printf.ksprintf failwith "ERROR: %s" s
    %}

    %token COLON
    %token COMMA
    %token SEPARATOR
    %token SEMICOLON
    %token <string> FLOAT
    %token <string> INT
    %token <string> LABEL

    %type <Conf.config> command
    %start command

    %%

    command:
      | label SEPARATOR data SEMICOLON { Conf.Pair ($1,$3) }
      | label SEPARATOR data_list      { Conf.List ($1,$3) }
      | label SEMICOLON                { Conf.Single ($1) }

    label :
      | LABEL { Conf.Label $1 }

    data :
      | label { $1 }
      | INT   { Conf.Integer $1 }
      | FLOAT { Conf.Float $1 }

    data_list :
      | star_data COMMA star_data data_list_ending { $1 :: $3 :: $4 }

    data_list_ending:
      | COMMA star_data data_list_ending { $2 :: $3 }
      | SEMICOLON                        { [] }
and lexer (simplified):
    {
    open ConfParser
    exception Eof
    }

    rule token = parse
      | ['\t' ' ' '\n' '\010' '\013' '\012'] { token lexbuf }
      | ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n { FLOAT n }
      | ['0'-'9']+ as n { INT n }
      | '#' { comment lexbuf }
      | ';' { SEMICOLON }
      | ['=' ':'] { SEPARATOR }
      | ',' { COMMA }
      | ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w { LABEL w }
      | eof { raise Eof }

    and comment = parse
      | ['#' '\n'] { token lexbuf }
      | _ { comment lexbuf }
Sample input file:
    one = two, three, one-hundred;
    single label;
    list : command, missing, a, semicolon
One solution is to make the command rule recursive, calling itself at the end, and to add an empty rule, so that together they build a list that is returned to the main program. I think I am probably treating Eof as something expected, an ending condition, rather than as an error in the lexer. Is this correct?
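To make that idea concrete, here is a minimal hand-written model of it in plain OCaml, not ocamlyacc output. It assumes a cut-down, hypothetical token set and command type (`token`, `command`, `Syntax_error`, `parse_command` and `parse_commands` are all illustrative names, not part of my real code). End of input is an ordinary `EOF` token that is only legal between commands, so running out of input in the middle of a command becomes a syntax error instead of being swallowed.

```ocaml
(* Hypothetical, cut-down token set: EOF is an ordinary token rather than
   an exception raised by the lexer. *)
type token = LABEL of string | SEPARATOR | SEMICOLON | EOF

(* Hypothetical stand-in for Conf.config, reduced to two cases. *)
type command =
  | Single of string
  | Pair of string * string

exception Syntax_error of string

(* command: LABEL SEMICOLON | LABEL SEPARATOR LABEL SEMICOLON
   Returns the parsed command together with the remaining tokens. *)
let parse_command = function
  | LABEL l :: SEMICOLON :: rest -> (Single l, rest)
  | LABEL l :: SEPARATOR :: LABEL d :: SEMICOLON :: rest -> (Pair (l, d), rest)
  | LABEL _ :: SEPARATOR :: LABEL _ :: EOF :: _ | LABEL _ :: EOF :: _ ->
      raise (Syntax_error "unexpected end of input, expected ';'")
  | _ -> raise (Syntax_error "unexpected token")

(* commands: EOF | command commands
   EOF is accepted only between commands, so it is an ending condition
   there and a syntax error anywhere else. *)
let rec parse_commands = function
  | [ EOF ] -> []
  | tokens ->
      let cmd, rest = parse_command tokens in
      cmd :: parse_commands rest
```

In ocamlyacc terms this would presumably correspond to declaring `%token EOF`, having the lexer's `eof` case return `EOF` instead of raising, and using a list rule such as `commands: EOF { [] } | command commands { $1 :: $2 }` as the start symbol (with its `%type` changed to a list). A generated parser that receives `EOF` in the middle of a rule should then call `parse_error` and raise `Parsing.Parse_error`.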