The legacy application I'm working with uses a funky data format called SGS. I have looked at and started on several brute-force solutions, including a hand-written finite state machine and my own recursive descent parser, but I try to write applications in which the amount of (non-library) source code is no more than what is needed to express what has to be done.
So I have been looking at Clojure-based parser libraries and experimenting with a few of them.
None of them has enough documentation or support online to get me unstuck. So I'm looking for someone who has experience with one of these tools (or a good alternative) to give me a hand.
Here's the data language:
Data is represented as rows, each consisting of a label (starting in column 1) and 1 or more fields, separated by one or more spaces.
Fields consist of one or more subfields separated by commas. Commas may be followed by spaces for readability, but those spaces are not significant.
Labels are identifiers consisting of characters in the set [-$0-9A-Z_*%] and do not have to be unique.
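Subfields are identifiers, quoted strings (delimited by two single quotes on each side), or empty. An empty subfield is nil.
Space-dot-space starts a line comment that runs to the end of the line.
A line whose first non-blank character is a dot is a comment line.
A semicolon at the end of a line continues the row on the next line.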
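To make the label and subfield rules concrete, here is a minimal sketch in plain Clojure, with no parser library involved. The names `identifier-re`, `label?` and `split-subfields` are made up for illustration, and the splitter assumes the field contains no quoted text, so it covers only the simplest case.

```clojure
(require '[clojure.string :as str])

;; Character set for labels/identifiers, as listed in the rules above.
(def identifier-re #"[-$0-9A-Z_*%]+")

(defn label?
  "True when s is made up only of identifier characters."
  [s]
  (boolean (re-matches identifier-re s)))

(defn split-subfields
  "Splits one field on commas. Spaces around a subfield are dropped and
  an empty subfield becomes nil. Assumption: no quoted text in the field."
  [field]
  ;; The -1 limit keeps trailing empty strings, so \"F1S1,\" yields 2 subfields.
  (mapv (fn [s]
          (let [t (str/trim s)]
            (when (seq t) t)))
        (str/split field #"," -1)))

(label? "LAB1")               ;=> true
(label? "lab1")               ;=> false
(split-subfields "F1S1,")     ;=> ["F1S1" nil]
(split-subfields ",,,F1S4")   ;=> [nil nil nil "F1S4"]
```

The negative limit passed to `str/split` matters here: without it, a trailing empty subfield would be dropped silently.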
Sample data:
. Comment
.
LAB1 F1S1 . Minimal data row, with line comment
LAB1 F1S1,F1S2,F1S3 F2S1 F3S1 . 2nd row with same label
LAB2 , , , F1S4 ''Field #2 (only 1 subfield'' F3S1,,F3S3
LAB99 F1S1, . Field 1 has 2 subfields, 2nd is nil
LAB3 F1S1,F1S2, ;
F1S3 ;
F2S1 . Row continued over 3 lines.
Parsed, the sample should yield something like:
[
("LAB1" ["F1S1"])
("LAB1" ["F1S1" "F1S2" "F1S3"] ["F2S1"] ["F3S1"])
("LAB2" [nil nil nil "F1S4"] ["Field #2 (only 1 subfield"] ["F3S1" nil "F3S3"])
("LAB99" ["F1S1" nil])
("LAB3" ["F1S1" "F1S2" "F1S3"] ["F2S1"])
]
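As a reference point, the mapping from sample data to this result can be sketched without a parser library. This is only a rough sketch under stated assumptions, not the grammar-based solution being asked for: all function names are invented for illustration, quoted strings are assumed to contain no single quote and no space-dot-space sequence, and a dangling continuation at the end of input is silently dropped.

```clojure
(require '[clojure.string :as str])

;; A whole comment line, or a trailing " . comment" on a data line.
(def comment-re #"(^|\s+)\.(\s.*)?$")

;; Tokens: quoted string | comma plus insignificant spaces | spaces | bare word.
(def token-re #"''[^']*''|,\s*|\s+|[^\s,]+")

(defn logical-lines
  "Strips comments, drops blank lines and joins rows continued with ';'."
  [text]
  (->> (str/split-lines text)
       (map #(str/trimr (str/replace % comment-re "")))
       (remove str/blank?)
       (reduce (fn [[acc pending] line]
                 (let [joined (if pending
                                (str pending " " (str/trim line))
                                line)]
                   (if (str/ends-with? joined ";")
                     ;; Row continues: remember it without the ';'.
                     [acc (str/trimr (subs joined 0 (dec (count joined))))]
                     [(conj acc joined) nil])))
               [[] nil])
       first))

(defn- strip-quotes
  "Returns the text inside ''...'', or the token itself if unquoted."
  [t]
  (if-let [[_ inner] (re-matches #"''([^']*)''" t)] inner t))

(defn- subfields
  "Turns the tokens of one field into a vector; empty subfields are nil."
  [tokens]
  (loop [ts tokens, cur nil, out []]
    (if-let [t (first ts)]
      (if (str/starts-with? t ",")
        (recur (rest ts) nil (conj out cur))
        (recur (rest ts) (strip-quotes t) out))
      (conj out cur))))

(defn parse-row
  "Parses one logical line into (label field ...)."
  [line]
  (let [[label & more] (re-seq token-re line)]
    (->> more
         (partition-by str/blank?)
         (remove #(str/blank? (first %)))
         (map subfields)
         (cons label))))

(defn parse-sgs [text]
  (mapv parse-row (logical-lines text)))

;; The LAB2 row is spelled out in full so that it matches the expected result.
(def sample
  (str/join "\n"
            [". Comment"
             "."
             "LAB1 F1S1 . Minimal data row, with line comment"
             "LAB1 F1S1,F1S2,F1S3 F2S1 F3S1 . 2nd row with same label"
             "LAB2 , , , F1S4 ''Field #2 (only 1 subfield'' F3S1,,F3S3"
             "LAB99 F1S1, . Field 1 has 2 subfields, 2nd is nil"
             "LAB3 F1S1,F1S2, ;"
             "     F1S3 ;"
             "     F2S1 . Row continued over 3 lines."]))

(parse-sgs sample)
```

Doing comment stripping and continuation joining as a line-level pass before tokenizing keeps the per-row logic small; whether the same split would help a grammar-based parser is an open question.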
UPDATE:
@edwood . , , ", ".
Here is my current InstaParse grammar, which sorta-works:
SGS = (<COMMENT_LINE> / DATA_LINES) *
COMMENT_LINE = #' *\\.(?: [^\\n]*)?\\n'
DATA_LINES = LABEL FIELDS SEPARATOR? (LINE_COMMENT | '\\n')
LABEL = IDENTIFIER
FIELDS = '' | (SEPARATOR FIELD)+
SEPARATOR = CONTINUATION #' +' | #' +' (CONTINUATION #' *')?
CONTINUATION = #'; *\\n'
LINE_COMMENT = #' .[^\\n]*\\n'
FIELD = SUBFIELD (',' SEPARATOR? SUBFIELD)*
SUBFIELD = IDENTIFIER | QUOTED_STRING | ''
IDENTIFIER = #'[-$0-9A-Z_*%]+'
QUOTED_STRING = #'\\'\\'[^\\']*\\'\\''
249 , , . , , -, 431 2
CompilerException java.lang.OutOfMemoryError: Java heap space, compiling:(sgs2.clj:40:13)
regexp-handled regexps, , , . , , .
228 , 16 . , . - ?