I use this code to support error reporting, which itself must be handled with care, sprinkling meaningful messages and "skip rules" around the code. But there is no ready-to-use alternative: a DCG is a nice computation engine, but out of the box it cannot compete with specialized parsing engines that emit error messages automatically by exploiting the theoretical properties of the target grammars...
    :- dynamic text_length/1.

    parse_conf_cs(Cs, AST) :-
        length(Cs, TL),
        retractall(text_length(_)),
        assert(text_length(TL)),
        phrase(cfg(AST), Cs).
    ....
    %% tag(?T, -X, -Y)// is det.
    %
    %  Start/Stop tokens for XML like entries.
    %  Maybe this should restrict somewhat the allowed text.
    %
    tag(T, X, Y) -->
        pos(X), unquoted(T), pos(Y).
    ....
    %% pos(-C, +P, -P) is det.
    %
    %  capture offset from end of stream
    %
    pos(C, P, P) :- text_length(L), length(P, Q), C is L - Q.
tag//3 is just an example of usage: in this parser I build an editable AST, so I store the positions to be able to properly attribute each nested part in an editor...
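To make the position-capturing idea concrete, here is a minimal self-contained sketch for SWI-Prolog. It reuses pos/3 and text_length/1 from the snippet above; spans/2, words//1, word//1 and spaces//0 are hypothetical names invented for this illustration and are not part of the original parser.

```prolog
:- dynamic text_length/1.

% spans(+Text, -Spans): hypothetical driver, not from the original parser.
% Spans is a list of Word-Start-End, offsets counted in codes from the start.
spans(Text, Spans) :-
    atom_codes(Text, Cs),
    length(Cs, TL),
    retractall(text_length(_)),
    assertz(text_length(TL)),
    phrase(words(Spans), Cs).

% A word is recorded together with the offsets just before and just after it.
words([W-X-Y|Ws]) -->
    spaces, pos(X), word(Cs), pos(Y), !,
    { atom_codes(W, Cs) },
    words(Ws).
words([]) --> spaces.

spaces --> [C], { code_type(C, space) }, !, spaces.
spaces --> [].

word([C|Cs]) --> [C], { code_type(C, alnum) }, word_rest(Cs).

word_rest([C|Cs]) --> [C], { code_type(C, alnum) }, !, word_rest(Cs).
word_rest([]) --> [].

% pos/3 as in the answer: total length minus the length of the remaining input.
pos(C, P, P) :- text_length(L), length(P, Q), C is L - Q.
```

With these definitions, `?- spans('ab  cde', S).` should give `S = [ab-0-2, cde-4-7]`. Note that pos/3 calls length/2 on the remaining input each time, so every position query costs time proportional to the unparsed text; for large inputs, threading a counter through the grammar would be an alternative design.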
edit

A small enhancement for id//1: SWI-Prolog has the specialized code_type/2 for that:
    ?- code_type(0'a, csymf).
    true.

    ?- code_type(0'1, csymf).
    false.
so (glossing over the literal transformation)
    id([C|Cs]) --> [C], {code_type(C, csymf)}, id_rest(Cs).

    id_rest([C|Cs]) --> [C], {code_type(C, csym)}, id_rest(Cs).
    id_rest([]) --> [].
Depending on your attitude toward generalizing small snippets, and on the actual grammar details, id_rest//1 can be written in a reusable fashion and made deterministic:
    id([C|Cs]) --> [C], {code_type(C, csymf)}, codes(csym, Cs).

    % greedy and deterministic
    codes(Kind, [C|Cs]) --> [C], {code_type(C, Kind)}, !, codes(Kind, Cs).
    codes(Kind, []), [C] --> [C], {\+code_type(C, Kind)}, !.
    codes(_, []) --> [].
This stricter definition of id//1 would also remove some ambiguity with respect to identifiers that have a keyword as prefix: recoding keyword//1 like

    keyword(K) --> id(id(K)), {memberchk(K, [ array, break, ... ])}.
will correctly identify
    ?- phrase(tokenize(Ts), `if1*2`).
    Ts = [id(if1), *, int(2)] ;
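For readers who want to try this, the following is a minimal runnable sketch of such a tokenizer in SWI-Prolog. It is not the original tokenize//1: token//1, int//1, punct//1, the kw(K) wrapper and the three-word keyword list are assumptions made only to keep the example self-contained. Only id//1, codes//2 and the shape of keyword//1 come from the snippets above, with the atom conversion added.

```prolog
% id//1 as above, plus the conversion from codes to an atom.
id(id(A)) -->
    [C], { code_type(C, csymf) },
    codes(csym, Cs),
    { atom_codes(A, [C|Cs]) }.

% greedy and deterministic, as above
codes(Kind, [C|Cs]) --> [C], { code_type(C, Kind) }, !, codes(Kind, Cs).
codes(Kind, []), [C] --> [C], { \+ code_type(C, Kind) }, !.
codes(_, []) --> [].

% Illustrative keyword subset only (assumption): the full list is elided in the text.
keyword(K) --> id(id(K)), { memberchk(K, [array, break, if]) }.

% Hypothetical helpers for integers and single-character operators.
int(int(N)) -->
    [D], { code_type(D, digit) },
    codes(digit, Ds),
    { number_codes(N, [D|Ds]) }.

punct(P) --> [C], { atom_codes(P, [C]), memberchk(P, ['*', '+', '-', '/']) }.

% Keywords are tried before plain identifiers; kw/1 is an assumed representation.
token(kw(K)) --> keyword(K).
token(T)     --> id(T).
token(T)     --> int(T).
token(P)     --> punct(P).

tokenize([T|Ts]) --> token(T), !, tokenize(Ts).
tokenize([]) --> [].
```

Because id//1 always consumes the whole identifier before keyword//1 checks it, `if1` is read as an identifier and not as the keyword `if` followed by `1`, while a bare `if` still comes out as `kw(if)`.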
Your string//1 (OT: what an unfortunate clash with library(dcg/basics):string//1) is an easy candidate for implementing a simple error recovery strategy:
    stringChar(0'\") --> "\\\\".
    stringChar(0'") --> pos(X), "\n", {format('unclosed string at ~d~n', [X])}.
This is an example of "report the error and insert the missing token", so that parsing can go on...
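A self-contained variation on this recovery idea might look like the sketch below (SWI-Prolog). string_lit//1, string_chars//1 and parse_string/2 are hypothetical names, chosen to avoid the clash with string//1, and the grammar is deliberately smaller than the one in the question: it handles only the `\"` escape. The message goes to user_error here, which is a choice made for this sketch.

```prolog
:- dynamic text_length/1.

% parse_string(+Text, -Atom): hypothetical driver; parses a leading string literal.
parse_string(Text, Atom) :-
    atom_codes(Text, Cs),
    length(Cs, TL),
    retractall(text_length(_)),
    assertz(text_length(TL)),
    phrase(string_lit(Codes), Cs, _Rest),
    atom_codes(Atom, Codes).

string_lit(Cs) --> [0'"], string_chars(Cs).

% Normal case: the closing quote ends the literal.
string_chars([]) --> [0'"], !.
% Recovery: a newline before the closing quote is reported, then pushed back,
% so the parser behaves as if the missing quote had been there.
string_chars([]), [0'\n] -->
    pos(X), [0'\n], !,
    { format(user_error, 'unclosed string at ~d~n', [X]) }.
% Escaped quote inside the literal.
string_chars([0'"|Cs]) --> [0'\\, 0'"], !, string_chars(Cs).
% Any other character.
string_chars([C|Cs]) --> [C], string_chars(Cs).

% pos/3 as in the answer.
pos(C, P, P) :- text_length(L), length(P, Q), C is L - Q.
```

With this sketch, `parse_string('"abc"', S)` yields `S = abc`, and an input such as `'"abc\n'` also yields `S = abc` after printing `unclosed string at 4`. An input that ends without either a quote or a newline still fails; covering that case would need one more recovery clause.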