Rebol Parse Match Error Message

PEG-based parser generators usually provide limited error reporting on invalid input. From what I have read, Rebol's parse dialect is inspired by PEG grammars extended with regular expressions.

For example, enter the following in JavaScript:

d8> function () {} 

gives the following error because there was no identifier when declaring a global function:

    (d8):1: SyntaxError: Unexpected token (
    function () {}
             ^

The parser can pinpoint the position at which the expected token is missing. The character position of that token is used to place the arrow in the error message.

Does the Rebol parse dialect provide built-in facilities for reporting line and column errors on invalid input?

Otherwise, are there examples of user-defined parsing rules that provide such error messages?

2 answers

I have written very advanced Rebol parsers that drive live, mission-critical TCP servers, and proper error reporting was a requirement. So this is important!

Probably one of the most distinctive aspects of Rebol's PARSE is that you can include direct evaluation in the rules. This lets you set variables to track the parse position, error messages, and so on. (This is very easy, because treating code and data as one and the same is the core idea of Rebol.)

So here is how I do it. Before each match rule is attempted, I save the parse position into "here" (by writing here:), and I also save an error message into a variable by running code (putting (error: {some error string}) in parentheses escapes out of the parse dialect). If the match rule succeeds, we never need the error or the position, and we simply move on to the next rule. But if it fails, the last state we set is still available to report after the failure.

So the pattern in the parse dialect is simply:

    ; use PARSE dialect handling of "set-word!" instances to save parse
    ; position into variable named "here"

    here:

    ; escape out of the parse dialect using parentheses, and into the DO
    ; dialect to run arbitrary code. Here we run code that saves an error
    ; message string into a variable named "error"

    (error: "<some error message relating to rule that follows>")

    ; back into the PARSE dialect again, express whatever your rule is,
    ; and if it fails then we will have the above to use in error reporting

    what: (ever your) [rule | {is}]

This is basically what you need to do. Here is an example of phone numbers:

    digit: charset "0123456789"

    phone-number-rule: [
        here:
        (error: "invalid area code")
        ["514" | "800" | "888" | "916" | "877"]

        here:
        (error: "expecting dash")
        "-"

        here:
        (error: "expecting 3 digits")
        3 digit

        here:
        (error: "expecting dash")
        "-"

        here:
        (error: "expecting 4 digits")
        4 digit

        (error: none)
    ]

Then you can see it in action. Note that we set error to none once we reach the end of the parse rules. PARSE returns false if there is still more input to process, so if error is none but PARSE returns false anyway, we failed because there was too much extra input:

    input: "800-22r2-3333"

    if not parse input phone-number-rule [
        if none? error [
            error: "too much data for phone number"
        ]
    ]

    either error [
        ; number of characters consumed before the saved position
        column: length? copy/part input here
        print rejoin ["error at position:" space column]
        print error
        print input
        ; indent the caret so that it points at the saved position
        print rejoin [head insert/dup copy "" space column "^^"]
        print newline
    ][
        print {all good}
    ]

The above code will print the following:

    error at position: 4
    expecting 3 digits
    800-22r2-3333
        ^

Obviously, you could build much more powerful things, since anything you put in parens is evaluated just like ordinary Rebol source code. It is very flexible. I even have parsers that update progress bars while loading huge datasets ... :-)
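The example above reports a single column, which is enough for one-line input. For multi-line input, the saved position could be converted into a line and column pair. The following is only a minimal sketch: the helper name line-and-column and its arguments are hypothetical and not part of the original answer, and it assumes that lines are separated by a single newline character.

```rebol
; Hypothetical helper (an assumption, not from the answer above):
; convert a saved parse position into a 1-based [line column] pair.
; `text` is the string being parsed and `pos` is a position inside
; that same string, for example the value saved by `here:`.
line-and-column: func [
    text [string!]
    pos [string!]
    /local before line column ch
][
    ; everything that was consumed before the saved position
    before: copy/part head text pos
    line: 1
    column: 1
    foreach ch before [
        either ch = newline [
            ; a newline starts a new line and resets the column
            line: line + 1
            column: 1
        ][
            column: column + 1
        ]
    ]
    reduce [line column]
]

; Usage: the second line contains the bad character sequence
text: "800-555-1234^/800-22r2-3333"
pos: find text "22r"
probe line-and-column text pos   ; [2 5]
```

In the error branch of the phone number example, a call such as line-and-column input here would take the place of the plain column computation.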


Here is a simple example of finding a position while parsing a string that can be used to accomplish what you ask.

Let's say that our code is valid only if it contains the characters a and b; anything else is illegal.

    code-rule: [
        some [
            "a" |
            "b"
        ]
        [
            end |
            mark: (print ["Failed at position" index? mark])
        ]
    ]

Let me check that with some valid code

    >> parse "aaaabbabb" code-rule
    == true

Now we can try again with some invalid input

    >> parse "aaaabbXabb" code-rule
    Failed at position 7
    == false

This is a very simplified example language, but the approach should be easy to extend to a more complex one.
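As one possible extension, the position can be stored in a variable instead of being printed inside the rule, so the caller decides how to report it. This is only a sketch: the names fail-pos and check-code are hypothetical and do not appear in the answer above. It also shares a limitation with the rule above: if the very first character is illegal, the some rule fails before mark: is reached, so no position is recorded.

```rebol
; Hypothetical variation (names are assumptions): record the failure
; position in `fail-pos` rather than printing from inside the rule.
fail-pos: none

code-rule: [
    (fail-pos: none)            ; reset state at the start of each parse
    some ["a" | "b"]
    [end | mark: (fail-pos: index? mark)]
]

check-code: func [code [string!]] [
    either parse code code-rule [
        print "ok"
    ][
        ; fail-pos stays none when the first character is already illegal
        print ["invalid character at position" fail-pos]
    ]
]

check-code "aaaabbabb"
check-code "aaaabbXabb"
```

Keeping the reporting outside the rule makes it easier to reuse the same grammar with different error formats.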
