How to imagine vertical alignment of code syntax using BNF, EBNF, etc.?

Question

How to imagine vertical alignment of code syntax using BNF, EBNF, etc.?

How to say that (in BNF, EBNF, etc.) any two or more letters are placed in the same vertical alignment

For example, in python 2.x we have what we call indentation .

 def hello(): print "hello," print "world" hello()

Note letter p (second line) is placed in the same vertical alignment of the letter p (third line)

Further example (in markdown):

 MyHeader ======== topic -----

Note M , and the first = are placed in the same vertical alignment (also r and last = , t and first - , c and last - )

My question is How to imagine vertical alignment of letters using BNF, EBNF, etc.

Note: My point on this question is to find a presentation method for vertically aligning the code, and not just to know how to write Python or Markdown BNF or EBNF.

+7

syntax indentation vertical-text bnf ebnf

fronthem Jan 05 '15 at 19:09

source share

1 answer

rici · Accepted Answer · 2015-01-06T02:14:21+0000

You can analyze indentation-sensitive languages (like Python or Haskell) using a small hack, which is well described in the Python help chapter in lexical analysis . As described, the lexical analyzer turns leading spaces into INDENT and DEDENT [Note 1], which are then used in a Python grammar in a simple way. Here is a quick excerpt:

 suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT statement ::= stmt_list NEWLINE | compound_stmt stmt_list ::= simple_stmt (";" simple_stmt)* [";"] while_stmt ::= "while" expression ":" suite ["else" ":" suite]

So, if you are ready to describe (or link) the lexical analysis algorithm, BNF is simple.

However, you cannot write this algorithm as context-free grammar, because it is not contextual. (I will leave this proof, but it is similar to the proof that a ⁿ b ⁿ c ⁿ not the context that you can find in most simple textbooks on the formal language and across the Internet.)

The ISO EBNF standard (free PDF is available) provides a way to include “extensions that a user may require”: a Special-sequence , which is any text that does not contain a ? surrounded on both sides by a symbol ? . Thus, you can abuse the notation by including [Note 2]:

 DEDENT = ? See section 2.1.8 of https://docs.python.org/3.3/reference/ ? ;

Or you can insert a full description of the algorithm. Of course, none of these methods would allow the parser to produce an accurate lexical analyzer, but it would be a reasonable way to convey intentions to the human reader.

It is worth noting that EBNF itself uses a special sequence to determine one of its production facilities:

 (* see 4.7 *) syntactic exception = ? a syntactic-factor that could be replaced by a syntactic-factor containing no meta-identifiers ? ;

Notes

The lexical analyzer also converts some physical newline characters to NEWLINE tokens, and other newline characters disappear.
EBNF usually uses the syntax = instead of ::= for production and insists that they be completed with ; . Comments are between (* and *) .

How to imagine vertical alignment of code syntax using BNF, EBNF, etc.?

Notes

More articles: