Tokenizing a custom text file format using C#

I want to parse a text file format that has slightly fancy syntax. Here are some valid lines:

<region>sample=piano C3.wav key=48 ampeg_release=0.7 // a comment here
<region>key = 49 sample = piano Db3.wav
<region> group=1 key = 48 sample = piano D3.ogg

I think it would be too difficult for me to come up with a regular expression that makes sense of this, but I wonder if there is a good way to tokenize this type of input without writing my own parser. That is, I would like something that reads the above input and spits out a stream of "tokens". For example, the output for the start of my example would be something like this:

 new Region(), new Sample("piano C3.wav"), new Key("48"), new AmpegRelease("0.7"), new Region() 

Is there a good library / tutorial that would point me in the right direction for an elegant way to implement this?

Update: I tried this with Irony, but the quirks of the syntax I need to parse (in particular, the fact that the data following sample= can contain spaces) meant I was better off writing my own code based on String.Split. See the discussion here.
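
For readers wondering what a hand-written tokenizer for this format could look like, here is a minimal sketch. It is not the String.Split code mentioned above: it swaps in a small Regex that only locates the markers (a <header> or a name= prefix) and treats everything up to the next marker as the value, which is what lets a sample value contain spaces. The names SfzToken and SfzTokenizer are hypothetical. It assumes values never contain a word directly followed by "=" and that "//" only ever starts a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token type: Value is null for headers such as <region>.
public sealed class SfzToken
{
    public string Name { get; }
    public string Value { get; }

    public SfzToken(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

public static class SfzTokenizer
{
    // Matches either a header like <region> or the "name =" part of an opcode.
    private static readonly Regex Marker =
        new Regex(@"<(?<header>\w+)>|(?<opcode>\w+)\s*=");

    public static IEnumerable<SfzToken> Tokenize(IEnumerable<string> lines)
    {
        foreach (string rawLine in lines)
        {
            // Drop a trailing // comment, if there is one.
            int comment = rawLine.IndexOf("//", StringComparison.Ordinal);
            string line = comment >= 0 ? rawLine.Substring(0, comment) : rawLine;

            MatchCollection matches = Marker.Matches(line);
            for (int i = 0; i < matches.Count; i++)
            {
                Match m = matches[i];
                if (m.Groups["header"].Success)
                {
                    yield return new SfzToken(m.Groups["header"].Value, null);
                    continue;
                }

                // The value runs up to the next marker (or the end of the line),
                // so it may contain spaces, as in "piano C3.wav".
                int start = m.Index + m.Length;
                int end = i + 1 < matches.Count ? matches[i + 1].Index : line.Length;
                string value = line.Substring(start, end - start).Trim();
                yield return new SfzToken(m.Groups["opcode"].Value, value);
            }
        }
    }
}
```

Passing the lines of a file (for example from File.ReadLines) to SfzTokenizer.Tokenize yields the headers and opcodes in order. Turning them into typed objects such as Region or Sample is left to the caller.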

3 answers

For this type of thing, I would go with the lightweight but reliable Coco/R. If you show me some more input examples, I can come up with a starting point for the grammar.


I used lex and yacc before, so I have experience with parsing. - Mark Heath 17 min ago

Well, you're in luck: I found a lex grammar for sfz in Fedora's soundfont-utils package. This package contains the sfz2pat utility. You can get the package (source) here:

http://rpmfind.net//linux/RPM/fedora/14/i386/soundfont-utils-0.4-10.fc12.i686.html ( src.rpm )

From a quick look, the latest version of the grammar dates from November 2004 and is rather complicated (58k in sfz2pat.l). Here is a sample to give you a taste:

    %option noyywrap
    %option nounput
    %option outfile="sfz2pat.c"

    nm  ([^\n]+".wav"|[^ \t\n\r]+|\"[^\"\n]+\")
    ipn [A-Ga-g][#b]?([0-9]|"-1")

    %s K

    %%

    "//".*  ;

    <K>"<group>" {
        int i;
        leave_region();
        leave_group();
        if (!enter_group()) {
            SFZERR "Can't start group\n");
            return 1;
        }
        am_in_group_scope = TRUE;
        for (i = FIRST_SFZ_PARM; i < MAX_SFZ_PARM; i++)
            group_parm[i] = default_parm[i];
        for (i = 0; i < MAX_FLOAT_PARM; i++)
            group_flt_parm[i] = default_flt_parm[i];
        group_parm[REGION_IN_GROUP] = current_group;
        BEGIN(0);
    }

    <K>"<region>" {
        int i;
        if (!am_in_group) {
            SFZERR "Can't start region outside group.\n");
            return 1;
        }
        leave_region();
        if (!enter_region()) {
            SFZERR "Can't start region\n");
            return 1;
        }
        am_in_group_scope = FALSE;
        for (i = 0; i < MAX_SFZ_PARM; i++)
            region_parm[i] = group_parm[i];
        for (i = 0; i < MAX_FLOAT_PARM; i++)
            region_flt_parm[i] = group_flt_parm[i];
        BEGIN(0);
    }

    <K>"sample="{nm} {
        int i = 7, j;
        unsigned namelen;
        if (yytext[i] == '"') {
            i++;
            for (j = i; j < yyleng && yytext[j] != '"'; j++) ;
        }
        else j = yyleng;
        namelen = (unsigned)(j - i + 1);
        sfzname = strncpy( (char *)malloc(namelen), yytext+i, (unsigned)(j-i) );
        sfzname[j-i] = '\0';
        for (i = 0; i < (int)namelen; i++)
            if (sfzname[i] == '\\') sfzname[i] = '/';
        SFZDBG "Sample name is \"%s\"", sfzname);
        SFZNL
        if (read_sample(sfzname)) {
    #ifndef LOADER
            fprintf(stderr, "\n");
    #endif
            return 0;
        }
        BEGIN(0);
    }

    [...snip...]


Assuming the language is fairly regular, I would recommend writing a quick parser using ANTLR. It has a pretty gentle learning curve for someone who has experience with parsing, and it can output C# (among other targets).


I have used Gardens Point LEX and the Gardens Point Parser Generator to generate parsers. They work well, especially if you have lex/yacc knowledge.

IMO, these two make the best parser generator toolchain for .NET.

One bonus point: the authors respond quickly to bug reports and suggestions, as can be seen here.
