What is the general pattern for creating a DCG for file input?

I always seem to struggle when writing DCGs to parse input files, yet it seems like it should be easy. Are there any tips or tricks for thinking about this problem?

For a specific example, let's say I want to parse a FASTA file (https://en.wikipedia.org/wiki/FASTA_format). I want to read every description and every sequence on backtracking.

:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- portray_text(true).
:- set_prolog_flag(double_quotes, codes).
:- set_prolog_flag(back_quotes,string).

fasta_file([]) -->[].
fasta_file([Section|Sections]) -->
   fasta_section(Section),
   fasta_file(Sections).


fasta_section(Section) -->
    fasta_description(Description),
    fasta_seq(Sequence),
    {Section =.. [section,Description,Sequence]}.

fasta_description(Description) -->
    ">",
    string(Description),
    {no_gt(Description),
     no_nl(Description)}.


fasta_seq([]) --> [].
fasta_seq(Seq) -->
    nt([S]),
    fasta_seq(Ss),
    {S="X"->Seq =Ss;Seq=[S|Ss]}.

 nt("A") --> "A".
 nt("C") --> "C".
 nt("G") --> "G".
 nt("T") --> "T".
 nt("X") --> "\n".

 no_gt([]).
 no_gt([E|Es]):-
     dif([E],">"),
     no_gt(Es).

 no_nl([]).
 no_nl([E|Es]):-
     dif([E],"\n"),
     no_nl(Es).

Now this is clearly wrong. The behaviour I would like is:

 ?-phrase(fasta_section(S),">frog\nACGGGGTACG\n>duck\nACGTTAG").
 S = section("frog","ACGGGGTACG");
 S = section("duck","ACGTTAG");
 false.

But if I run phrase(fasta_file(Sections),">frog\nACGGGGTACG\n>duck\nACGTTAG"), Sections is unified with a list of section/2 terms, which is what I want. Still, my current code seems pretty hacky: the way I handle the newline, for example.


First, a trivial typo. Clauses written as

nt("A") -->"A",
nt("C") -->"C",
nt("G") -->"G",
nt("T") -->"T".

must each end with a period:

nt("A") -->"A".
nt("C") -->"C".
nt("G") -->"G".
nt("T") -->"T".

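Beyond that, I ran into a similar problem with DCGs when loading a MySQL dump (plain SQL) into Prolog, and had to cope with UTF8 (?) issues along the way.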

Use phrase/3 while developing: its third argument returns the part of the input that was not parsed.

, , SWI-Prolog.

Also, why

...
dif([E],">"),
...

? In a DCG, at least with SWI-Prolog ...

Anyway, here is how I would write it:

fasta_file([]) -->[].
fasta_file([Section|Sections]) -->
    fasta_section(Section),
    fasta_file(Sections).

fasta_section(section(Description,Sequence)) -->
    fasta_description(Description),
    fasta_seq(SequenceCs), {atom_codes(Sequence, SequenceCs)}, !.

fasta_description(Description) -->
    ">", string(DescriptionCs), "\n", {atom_codes(Description, DescriptionCs)}.

fasta_seq([S|Seq]) --> nt(S), fasta_seq(Seq).
fasta_seq([]) --> "\n" ; []. % optional \n at EOF

nt(0'A) --> "A".
nt(0'C) --> "C".
nt(0'G) --> "G".
nt(0'T) --> "T".

?- phrase(fasta_file(S), `>frog\nACGGGGTACG\n>duck\nACGTTAG`).
S = [section(frog, 'ACGGGGTACG'), section(duck, 'ACGTTAG')] ;
false.

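Note: the clause order of fasta_seq//1 matters. Putting the recursive clause first makes the parse "eager", mostly for efficiency. As I said, I had to parse SQL, several MB of it.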
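To see what the clause order of fasta_seq//1 changes, here is a minimal sketch that is not taken from the answer's code. The nonterminal names eager_as//1 and lazy_as//1 are hypothetical, chosen only for illustration, and the queries assume SWI-Prolog 7 defaults, where back-quoted text is a list of codes.

```prolog
:- set_prolog_flag(double_quotes, codes).

% eager_as//1 and lazy_as//1 are hypothetical names, for illustration only.

% Eager: the recursive clause comes first, so the longest run is tried first.
eager_as([0'a|As]) --> "a", eager_as(As).
eager_as([])       --> [].

% Lazy: the empty clause comes first, so the shortest run is tried first.
lazy_as([])        --> [].
lazy_as([0'a|As])  --> "a", lazy_as(As).

% ?- phrase(eager_as(As), `aaa`, Rest).
%    The first answer consumes all three codes and leaves Rest = [].
% ?- phrase(lazy_as(As), `aaa`, Rest).
%    The first answer has As = [] and leaves all three codes in Rest.
```

Both versions describe the same set of answers; only the order in which they are found on backtracking differs. With the eager order, a parser that must consume the whole input usually succeeds on its first attempt instead of growing the match one element at a time.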

?- phrase((string(_),fasta_section(S)), `>frog\nACGGGGTACG\n>duck\nACGTTAG`,_).
S = section(frog, 'ACGGGGTACG') ;
S = section(duck, 'ACGTTAG') ;
false.

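fasta_section//1 ends with a cut, so it commits to the longest sequence and each section is reported only once. The leading string(_) is the "skip anything" string//1 from library(dcg/basics), which moves on to the next section on backtracking.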
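Since the question is about file input, here is a hedged sketch of running the same grammar directly on a file with phrase_from_file/2 from library(pio). The wrapper read_fasta/2 is a hypothetical name, not a library predicate. The grammar is repeated so that the block loads on its own, and it keeps the same assumptions as above: SWI-Prolog, one-line sequences, and only the letters A, C, G and T.

```prolog
:- use_module(library(pio)).         % phrase_from_file/2
:- use_module(library(dcg/basics)).  % string//1
:- set_prolog_flag(double_quotes, codes).

% Grammar repeated from above so that this block loads on its own.
fasta_file([]) --> [].
fasta_file([Section|Sections]) -->
    fasta_section(Section),
    fasta_file(Sections).

fasta_section(section(Description,Sequence)) -->
    fasta_description(Description),
    fasta_seq(SequenceCs), {atom_codes(Sequence, SequenceCs)}, !.

fasta_description(Description) -->
    ">", string(DescriptionCs), "\n", {atom_codes(Description, DescriptionCs)}.

fasta_seq([S|Seq]) --> nt(S), fasta_seq(Seq).
fasta_seq([]) --> "\n" ; []. % optional \n at EOF

nt(0'A) --> "A".
nt(0'C) --> "C".
nt(0'G) --> "G".
nt(0'T) --> "T".

% read_fasta/2 is a hypothetical wrapper name, not part of any library.
% phrase_from_file/2 reads the file lazily and requires the grammar
% to consume it completely.
read_fasta(File, Sections) :-
    phrase_from_file(fasta_file(Sections), File),
    !.  % commit to the first complete parse
```

A call would look like ?- read_fasta('example.fasta', Sections). where the file name is only a placeholder. Real FASTA files often wrap sequences over several lines and use more letters than these four, so fasta_seq//1 and nt//1 would need extending before this is useful on real data.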
