How to express a branch in the Rebol PARSE dialect?

Question

I have a MySQL schema as shown below:

 data: {
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `name` varchar(10) DEFAULT '' COMMENT 'the name',
   `content` text COMMENT 'something',
 }

Now I want to extract some information from it: name, type and comment, if any. See below:

 ["id" "int" "" "name" "varchar" "the name" "content" "text" "something" ]

My code is:

 parse data [
     any [
         thru {`} copy field to {`} {`}
         thru some space
         copy field-type to [{(} | space]
         (comm: "")
         opt [
             thru {COMMENT} thru some space
             thru {'} copy comm to {'}
         ]
         (
             repend temp field
             repend temp field-type
             either comm [
                 repend temp comm
             ][
                 repend temp ""
             ]
         )
     ]
 ]

but I get something like this:

 ["id" "int" "the name" "content" "text" "something"]

I know that the opt [...] part is incorrect.

I want to express this: if the COMMENT keyword is found first, extract the comment text; if the end of the current column definition is reached first, continue with the next iteration. But I do not know how to express that. Can anyone help?

+5

parsing rebol rebol3

Wayne cui May 24 '15 at 11:33

4 answers

I think this is closer to what you need.

 data: {
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `name` varchar(10) DEFAULT '' COMMENT 'the name',
   `content` text COMMENT 'something',
 }

 temp: []

 parse data [
     any [
         thru {`} copy field to {`} {`}
         some space
         copy field-type to [{(} | space]
         (comm: copy "")
         opt [
             thru {COMMENT} some space
             thru {'} copy comm to {'}
         ]
         (
             repend temp field
             repend temp field-type
             either comm [
                 repend temp comm
             ][
                 repend temp ""
             ]
         )
     ]
 ]

 probe temp

To break down the differences:

Set up the word temp with an empty block.
Changed thru some space to just some space, as this moves forward through the series in the same way. Note that the following returns false:
```
 parse " " [ thru some space ] 
```
Changed comm: "" to comm: copy "" to make sure you get a new string every time you extract a comment (this does not affect the output, but it is good practice).
Changed thru {COMMENT} thru some space to thru {COMMENT} some space, for the same reason as the second change.
Just added a probe at the end for debugging
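
The copy "" advice above can be illustrated outside of PARSE. This is a minimal sketch, assuming Rebol 2 or Rebol 3 semantics for literal strings; the function names shared and fresh are made up for the example. In the parse rule itself, copy comm to {'} rebinds comm to a new string, which is why the output there is unaffected.

```rebol
; Without copy, every call reuses (and mutates) the same literal string
; that lives inside the function body.
shared: func [] [
    s: ""
    append s "x"
]
probe shared   ; "x"
probe shared   ; "xx" (the literal was modified by the first call)

; With copy, each call starts from a fresh empty string.
fresh: func [] [
    s: copy ""
    append s "x"
]
probe fresh    ; "x"
probe fresh    ; "x"
```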

As a side note, you can use ?? (almost) anywhere in a parse rule to help with debugging; it shows you the current position.
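
A small sketch of both ways to inspect the position, assuming Rebol 3 for the ?? keyword (in Rebol 2, ?? is an ordinary function rather than a parse keyword). The word pos is just an illustrative name; the set-word plus paren form should work in either version.

```rebol
; Rebol 3: ?? inside a rule prints the next rule and the current input position.
parse "abc" [
    "a"
    ??
    "b" "c"
]

; Portable alternative: capture the position with a set-word and probe it.
parse "abc" [
    "a"
    pos: (probe pos)   ; prints the remaining input, "bc"
    "b" "c"
]
```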

+3

johnk May 24, '15 at 12:13

Use parse/all for string parsing:

 data: {
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `name` varchar(10) DEFAULT '' COMMENT 'the name',
   `content` text COMMENT 'something',
 }

 nodata: charset { ()'}
 dat: complement nodata

 collect [
     parse/all data [
         some [
             thru {`} copy field to {`} (keep field) skip
             some " " copy type some dat
             (keep type comm: copy "")
             copy rest thru ","
             (
                 parse/all rest [
                     some [
                         ["," (keep comm)]
                         | ["COMMENT" some nodata copy comm to "'"]
                         | skip
                     ]
                 ]
             )
         ]
     ]
 ]
 == ["id" "int" "" "name" "varchar" "the name" "content" "text" "something"]

Another (better) solution using pure parse:

 collect [
     probe parse/all data [
         some [
             thru {`} copy field to {`} (keep field) skip
             some " " copy type some dat
             (keep type comm: "" further: [])
             some [
                 "," (keep comm further: [to end skip])
                 | ["COMMENT" some nodata copy comm to "'"]
                 | skip further
             ]
         ]
     ]
 ]

+3

sqlab May 25, '15 at 0:40

I figured out an alternative: read the data as a block! of lines rather than as a single string!.

 data: read/lines %data.txt   ; file literals need the leading %
 probe data

 temp: copy []

 foreach d data [
     parse d [
         thru {`} copy field to {`} {`}
         thru some space
         copy field-type to [{(} | space]
         (comm: "")
         opt [
             thru {COMMENT} thru some space
             thru {'} copy comm to {'}
         ]
         (
             repend temp field
             repend temp field-type
             either comm [
                 repend temp comm
             ][
                 repend temp ""
             ]
         )
     ]
 ]

 probe temp

+1

Wayne cui May 24 '15 at 12:01

rgchris · Accepted Answer · 2015-05-25T06:59:33+0000

Whenever possible, I like to create a set of grammar rules with positive terms that match the target input. I find this more literate, precise, flexible and easier to debug. In the snippet above, we can identify five core components:

 space: use [space][
     space: charset "^-^/ "
     [some space]
 ]

 word: use [letter][
     letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
     [some letter]
 ]

 id: use [letter][
     letter: complement charset "`"
     [some letter]
 ]

 number: use [digit][
     digit: charset "0123456789"
     [some digit]
 ]

 string: use [char][
     char: complement charset "'"
     [any [some char | "''"]]
 ]
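
Since each term is an ordinary block rule, it can be tried on its own before being combined into a larger grammar. A minimal sketch, repeating two of the definitions above so that it runs standalone; the sample inputs are made up for illustration.

```rebol
word: use [letter][
    letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
    [some letter]
]

number: use [digit][
    digit: charset "0123456789"
    [some digit]
]

; A rule succeeds only if it consumes the whole input.
probe parse/all "varchar" word     ; true
probe parse/all "10" number        ; true
probe parse/all "var char" word    ; false: stops at the space
probe parse/all "10px" number      ; false: stops at "p"
```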

With the terms defined, writing a rule that describes the grammar of the input is relatively trivial:

 result: collect [
     parsed?: parse/all data [ ; parse/all for Rebol 2 compatibility
         opt space
         some [
             (field: type: none comment: copy "")
             "`" copy field id "`"
             space copy type word opt ["(" number ")"]
             any [
                 space [
                     "COMMENT" space "'" copy comment string "'"
                     | word
                     | "'" string "'"
                     | number
                 ]
             ]
             opt space "," (keep reduce [field type comment])
             opt space
         ]
     ]
 ]

As an added bonus, we can validate the input.

 if parsed? [new-line/all/skip result true 3]

This use of new-line, to pretty things up a little, should yield:

 == [
     "id" "int" ""
     "name" "varchar" "the name"
     "content" "text" "something"
 ]
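
For reference, a short sketch of what new-line/all/skip does to a flat block, independent of the parse result. The block flat is a made-up sample; new-line? is used to check where the line markers ended up.

```rebol
flat: ["id" "int" "" "name" "varchar" "the name"]

; Treat the block as records of three values and put a line marker
; at the start of each record.
new-line/all/skip flat true 3

probe flat                   ; each record should now print on its own line
print new-line? flat         ; true: the first value starts a record
print new-line? next flat    ; false: the second value does not
```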
