Parsing inflected languages without fixed word order (for example, Latin)

Taking an example from the Wikiversity Introduction to Latin, consider the sentence:

the sailor gives the girl money 

We can handle this in Prolog with a DCG fairly elegantly, using this set of rules:

    sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).

    noun_phrase(Noun) --> det, noun(Noun).
    noun_phrase(Noun) --> noun(Noun).

    verb_phrase(vp(Verb, DO, IO)) --> verb(Verb), noun_phrase(IO), noun_phrase(DO).

    det --> [the].

    noun(X) --> [X], { member(X, [sailor, girl, money]) }.

    verb(gives) --> [gives].

And we see that this works:

    ?- phrase(sentence(S), [the,sailor,gives,the,girl,money]).
    S = s(sailor, vp(gives, money, girl)) ;

It seems to me that DCGs are really optimized for languages with fixed word order. I am at a complete loss as to how to handle this Latin sentence:

  nauta dat pecuniam puellae 

This means the same thing (a sailor gives money to a girl), but the word order is completely free: all of these permutations mean exactly the same thing as well:

    nauta dat puellae pecuniam
    nauta puellae pecuniam dat
    puellae pecuniam dat nauta
    puellae pecuniam nauta dat
    dat pecuniam nauta puellae

The first thing that occurs to me is to enumerate the permutations:

    sentence(s(NP, VP)) --> noun_phrase(NP), verb_phrase(VP).
    sentence(s(NP, VP)) --> verb_phrase(VP), noun_phrase(NP).

but this will not do, because while nauta belongs to the subject noun phrase, puellae, which belongs to the object noun phrase, is subordinate to the verb and yet can precede it. I wonder if I should approach it by first building some kind of attributed list:

    ?- attributed([nauta,dat,pecuniam,puellae], Attributed).
    Attributed = [noun(nauta,nom), verb(do,3,s), noun(pecunia,acc), noun(puella,dat)]
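A minimal sketch of what such an attributed/2 could look like, assuming a tiny hand-written lexicon. The lex/2 facts are hypothetical and cover only the four word forms of the example; a real lexicon would need morphological analysis rather than one fact per form.

```prolog
% Hypothetical mini-lexicon: lex(SurfaceForm, Analysis).
lex(nauta,    noun(nauta, nom)).
lex(pecuniam, noun(pecunia, acc)).
lex(puellae,  noun(puella, dat)).
lex(dat,      verb(do, 3, s)).

% attributed(+Words, -Attributed): look every word up in the lexicon.
attributed([], []).
attributed([Word|Words], [Analysis|Analyses]) :-
    lex(Word, Analysis),
    attributed(Words, Analyses).
```

With these facts the query above yields exactly the list shown. Note that puellae could also be genitive singular or nominative plural; extra lex/2 facts for those readings would simply be enumerated on backtracking.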

This seems likely to be necessary (and I don't see a good way to do it), but grammatically it is just pushing food around on my plate. Maybe I could write a parser with some horrible non-DCG contraption like this:

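    % The sentence needs a subject noun phrase and a verb phrase,
    % both looked up in the same attributed list.
    parse(s(NounPhrase, VerbPhrase), Attributed) :-
        parse(subject_noun_phrase(NounPhrase), Attributed),
        parse(verb_phrase(VerbPhrase), Attributed).

    % A subject is any nominative noun; an object is any accusative noun.
    parse(subject_noun_phrase(Noun), Attributed) :-
        member(noun(Noun,nom), Attributed).
    parse(object_noun_phrase(Noun), Attributed) :-
        member(noun(Noun,acc), Attributed).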

This seems like it would work, but only as long as I have no recursion; as soon as I introduce a subordinate clause, I am going to reuse items in an unhealthy way.
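One way to avoid reusing items might be to remove each one from the attributed list as soon as it has been assigned a role. The sketch below is only an illustration of that idea, assuming the attributed list format shown earlier; parse_once/2 is a hypothetical name, and the fixed set of roles (subject, verb, direct object, indirect object) covers just the example sentence.

```prolog
:- use_module(library(lists)).   % select/3

% parse_once(+Attributed, -Tree): every select/3 call removes the item it
% matched, so no word can fill two roles. The final [] demands that all
% items have been used.
parse_once(Attributed, s(Subject, vp(Verb, DO, IO))) :-
    select(noun(Subject, nom),           Attributed, Rest1),
    select(verb(Verb, _Person, _Number), Rest1,      Rest2),
    select(noun(DO, acc),                Rest2,      Rest3),
    select(noun(IO, dat),                Rest3,      []).
```

For the attributed list of the example this gives s(nauta, vp(do, pecunia, puella)), whatever the order of the items. Threading the remaining items through the calls is the same bookkeeping a DCG does for you, but it still says nothing about clause boundaries, so subordinate clauses would need more than this.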

I just don't see how to get from a sentence without fixed word order to a parse tree. Is there a book that discusses this? Thanks.

+7
prolog dcg
3 answers

Here I found a related resource (Permutation Grammar for Free Word Order Languages). It seems worth reading (hey, we all hated those compulsory Latin lessons so much back in the 60s!).

The appendix contains an implementation that is worth checking.

I forgot to point to Covington's free-word-order parser (this is just a sketch...), which you can find in the PRoNTo toolkit (I mention it here for completeness, but I'm sure you already know about it).

+2

It looks like (going by my extremely rusty memory of high school Latin) your lexical analyzer should look at each token (word) and tag it with the corresponding metadata:

  • Type of word (noun, verb, adjective, etc.)
  • For nouns: declension, gender, case, and number
  • For verbs: conjugation, person, number, tense, voice, and mood
  • For adjectives: gender, declension, number, ...
  • etc. (It has been a long time, lol.)

Then your parsing should be driven by the metadata, since that is what ties everything together.
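As a rough illustration of metadata tying things together, here is a small sketch in which an adjective is attached to a noun purely because their features agree. The word/4 facts and the modifies/2 predicate are hypothetical names, and only case, number, and gender are recorded.

```prolog
% Hypothetical feature lexicon: word(SurfaceForm, Category, Lemma, Features).
word(nauta,  noun, nauta,  [case(nom), num(sg), gender(masc)]).
word(puella, noun, puella, [case(nom), num(sg), gender(fem)]).
word(bonus,  adj,  bonus,  [case(nom), num(sg), gender(masc)]).
word(bona,   adj,  bonus,  [case(nom), num(sg), gender(fem)]).

% modifies(?AdjForm, ?NounForm): the adjective can modify the noun when
% case, number and gender all unify.
modifies(AdjForm, NounForm) :-
    word(AdjForm,  adj,  _, Features),
    word(NounForm, noun, _, Features).
```

Here modifies(bonus, nauta) and modifies(bona, puella) succeed, while modifies(bona, nauta) fails: nauta is masculine even though it ends in -a, so the decision has to come from the features and not from the endings.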

+1

You can use this meta-predicate:

    unsorted([]) --> [].
    unsorted([H|T]) --> H, unsorted(T).
    unsorted([H|T]) --> unsorted(T), H.

    sentence(s(NP, VP)) --> unsorted([noun_phrase(NP), verb_phrase(VP)]).
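Note that unsorted//1 places the head of the list either before or after everything else in the list, so with three or more constituents the first one can never end up in the middle. A possible variant that tries every permutation is sketched below. It assumes SWI-Prolog, where a variable in a DCG body is called as a non-terminal, as unsorted//1 already does. The names any_order//1, form/3, subject//1, direct_object//1 and indirect_object//1 are made up for this example.

```prolog
:- use_module(library(lists)).   % select/3

% any_order(+NonTerminals): parse the listed non-terminals in any order.
any_order([]) --> [].
any_order(Goals) -->
    { select(Goal, Goals, Rest) },   % pick any remaining non-terminal
    Goal,                            % parse it at the current position
    any_order(Rest).

% Hypothetical lexicon: form(SurfaceForm, Lemma, Case).
form(nauta,    nauta,   nom).
form(pecuniam, pecunia, acc).
form(puellae,  puella,  dat).

% The grammatical role is decided by case, not by position.
subject(Lemma)         --> [Word], { form(Word, Lemma, nom) }.
direct_object(Lemma)   --> [Word], { form(Word, Lemma, acc) }.
indirect_object(Lemma) --> [Word], { form(Word, Lemma, dat) }.
verb(do)               --> [dat].

sentence(s(S, vp(V, DO, IO))) -->
    any_order([subject(S), verb(V), direct_object(DO), indirect_object(IO)]).
```

With this, phrase(sentence(T), [puellae,pecuniam,nauta,dat]) gives T = s(nauta, vp(do, pecunia, puella)), and the other word orders listed in the question give the same tree. Trying every permutation is factorial in the number of constituents, which is fine for a handful of them but would not scale to long clauses.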
+1