Parsing Character Columns in Boost::Spirit

I am working on a Boost Spirit 2.0 parser for a small subset of Fortran 77. The problem I am facing is that Fortran 77 is column-oriented, and I could not find anything in Spirit that lets its parsers be column-aware. Is there any way to do this?

I don't need to support Fortran's full arcane syntax, but the parser must be able to ignore lines that have a character in the first column (Fortran comments) and recognize lines with a character in the sixth column as continuation lines.

It seems that people working with batch files would have at least the same first-column problem as I do. Spirit appears to have an end-of-line parser, but not a start-of-line parser (and, strictly speaking, no column(x) parser either).

1 answer

Well, since I now have the answer to this question, I think I should share it.

Fortran 77, like all other languages that care about columns, is a line-oriented language. This means that your parser must keep track of the EOL and actually use it when parsing.

Another important fact is that, in my case, I don't need to parse the line numbers that Fortran can put in the early control columns. All I need to know is when those columns tell me to scan the rest of the line differently.

Given these two things, I could handle this problem entirely with a Spirit skip parser. I wrote mine to:

  • skip the whole line if the first (comment) column contains an alphabetic character.
  • skip the entire line if there is nothing on it.
  • ignore the preceding EOL and everything through the sixth column if the sixth column contains a '.' (continuation line). This joins the line to the previous one.
  • skip all non-EOL whitespace (spaces are not significant in Fortran. Yes, it's a weird language.)

Here is the code:

    skip =
        // Full line comment
        (spirit::eol >> spirit::ascii::alpha >> *(spirit::ascii::char_ - spirit::eol))
        [boost::bind (&fortran::parse_info::skipping_line, &pi)]
    |
        // remaining line comment
        (spirit::ascii::char_ ('!') >> *(spirit::ascii::char_ - spirit::eol)
         [boost::bind (&fortran::parse_info::skipping_line_comment, &pi)])
    |
        // Continuation
        (spirit::eol >> spirit::ascii::blank >>
         spirit::qi::repeat(4)[spirit::ascii::char_ - spirit::eol] >> ".")
        [boost::bind (&fortran::parse_info::skipping_continue, &pi)]
    |
        // empty line
        (spirit::eol >>
         -(spirit::ascii::blank >> spirit::qi::repeat(0, 4)[spirit::ascii::char_ - spirit::eol] >>
           *(spirit::ascii::blank) ) >>
         &(spirit::eol | spirit::eoi))
        [boost::bind (&fortran::parse_info::skipping_empty, &pi)]
    |
        // whitespace (this needs to be the last alternative).
        (spirit::ascii::space - spirit::eol)
        [boost::bind (&fortran::parse_info::skipping_space, &pi)]
    ;

I would advise against blindly using this for your own line-oriented Fortran parsing, since I ignore line numbers, and different compilers have different rules for valid comment and continuation characters.
