Lists as facts
Let me try to explain this with a counterexample. Let us specify the nouns, verbs, etc. with simple facts:
det(the). det(a). n(woman). n(man). v(shoots).
Now we can implement the noun phrase np as:
np([X,Y]) :- det(X), n(Y).
In other words, we say: "a noun phrase is a sentence of two words, the first being a det, the second an n." And this works: if we query np([a,woman]), it succeeds, etc.
But now we need to take the next step and define a verb phrase. There are two possible verb phrases. The first consists of a verb and a noun phrase, and was originally defined as:
vp(X,Z):- v(X,Y),np(Y,Z).
We could define it as:
vp([X|Y]) :- v(X), np(Y).
And the second consists of only a verb:
vp(X,Z):- v(X,Z).
This can be converted to:
vp([X]) :- v(X).
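As a quick check, the list-based definitions so far can be put together into one small program. This is only a sketch that collects the clauses above; the queries in the comments are illustrative.

```prolog
% Word facts from above.
det(the). det(a).
n(woman). n(man).
v(shoots).

% Noun phrase: exactly two words, a determiner followed by a noun.
np([X,Y]) :- det(X), n(Y).

% Verb phrase: a verb followed by a noun phrase, or a verb on its own.
vp([X|Y]) :- v(X), np(Y).
vp([X])   :- v(X).

% Illustrative queries:
% ?- np([a,woman]).          succeeds
% ?- vp([shoots]).           succeeds (one word)
% ?- vp([shoots,the,man]).   succeeds (three words)
% ?- vp([shoots,the]).       fails
```

Note that vp already accepts lists of two different lengths, which is what causes trouble in the next step.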
Guessing problem
The problem, however, is that the two variants consume a different number of words: there are verb phrases with one word and with three words. That in itself is not a problem, but now say (I know this is not correct English) that a sentence is defined as a vp followed by an np, so this would be:
s(X,Z):- vp(X,Y), np(Y,Z).
in the original grammar.
The problem is that if we want to translate this into our new representation, we need to know how many words vp will consume. We cannot know this in advance: since at that point we know little about the sentence, we cannot tell whether vp will consume one or three words.
We could, of course, guess the number of words with:
s([X|Y]) :- vp([X]), np(Y). s([X,Y,Z|T]) :- vp([X,Y,Z]), np(T).
But I hope you can imagine that once you define verb phrases with 1, 3, 5 and 7 words, this becomes problematic. Another way to solve this is to leave the guessing to Prolog:
s(S) :- append(VP,NP,S), vp(VP), np(NP).
Now Prolog first guesses how to split the sentence into two parts, and then tries to match each part. But the problem is that for a sentence with n words, there are n+1 ways to split it (counting the splits where one part is empty).
So Prolog will, for example, first split it like this:
VP=[],NP=[shoots,the,man,the,woman]
(remember, we swapped the order of the verb phrase and the noun phrase). Obviously vp will not be happy with an empty list, so this split is rejected quickly. Next, Prolog tries:
VP=[shoots],NP=[the,man,the,woman]
Now vp is happy with only shoots, although it takes some computational effort to establish that. np, however, does not accept this long remainder. So Prolog backtracks again:
VP=[shoots,the],NP=[man,the,woman]
Now vp complains again: [shoots,the] is not a verb phrase. Finally, Prolog splits the sentence correctly:
VP=[shoots,the,man],NP=[the,woman]
The point is that this requires a large number of guesses. And for each of these guesses, vp and np have to do work as well. For a really complex grammar, vp and np may split the sentence further themselves, which leads to a huge amount of trial and error.
The real reason is that append/3 has no "semantic" clue about how to split the sentence, so it tries all possibilities. One would much prefer an approach in which vp itself can tell what share of the sentence it actually wants.
Furthermore, if you have to split the sentence into 3 parts, the number of ways to do so grows to O(n^2), and so on. So guessing will not do the trick.
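The walkthrough above can be reproduced with a small program. This is a sketch that combines the list-based clauses from earlier with the append/3 version of s; splits/2 is a hypothetical helper added here only to make the candidate splits visible.

```prolog
:- use_module(library(lists)).   % append/3

det(the). det(a).
n(woman). n(man).
v(shoots).

np([X,Y]) :- det(X), n(Y).
vp([X|Y]) :- v(X), np(Y).
vp([X])   :- v(X).

% Sentence: verb phrase followed by noun phrase, split by guessing.
s(S) :- append(VP, NP, S), vp(VP), np(NP).

% Hypothetical helper: collect every split append/3 will propose,
% in the order Prolog tries them.
splits(S, Splits) :- findall(VP-NP, append(VP, NP, S), Splits).

% Illustrative queries:
% ?- s([shoots,the,man,the,woman]).                          succeeds
% ?- splits([shoots,the,man,the,woman], L), length(L, N).    N = 6
```

For the five-word sentence, six splits are proposed, and the successful one (VP=[shoots,the,man]) is only the fourth that Prolog tries.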
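The quadratic growth can be checked by counting. The sketch below uses a hypothetical helper, three_way_splits/2, that chains two append/3 calls to enumerate every way of cutting a list into three consecutive (possibly empty) parts.

```prolog
:- use_module(library(lists)).   % append/3

% Hypothetical helper: all ways to cut S into three consecutive parts.
three_way_splits(S, Splits) :-
    findall(A-B-C,
            ( append(A, BC, S),     % first cut
              append(B, C, BC) ),   % second cut in the remainder
            Splits).

% Illustrative query:
% ?- three_way_splits([shoots,the,man,the,woman], L), length(L, N).
% N = 21
```

For n words this gives (n+1)(n+2)/2 candidates, so 21 for the five-word sentence, compared with 6 for a two-way split.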
You could also generate verb phrases first, and then hope that one of them matches the start of the sentence:
s(S) :- vp(VP), append(VP,NP,S), np(NP).
But in this case, the number of generated verb phrases blows up exponentially. Of course, you can give "hints", etc., to speed up the process, but it still takes time.
Solution
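To get a feeling for the generate-first variant, one can count how many verb phrases the tiny lexicon already produces. This is a sketch using the list-based clauses from earlier; count_vps/1 is a hypothetical helper.

```prolog
:- use_module(library(lists)).   % append/3

det(the). det(a).
n(woman). n(man).
v(shoots).

np([X,Y]) :- det(X), n(Y).
vp([X|Y]) :- v(X), np(Y).
vp([X])   :- v(X).

% Generate a verb phrase first, then check that it is a prefix of S.
s(S) :- vp(VP), append(VP, NP, S), np(NP).

% Hypothetical helper: how many verb phrases does the grammar generate?
count_vps(N) :- findall(VP, vp(VP), VPs), length(VPs, N).

% Illustrative queries:
% ?- count_vps(N).                       N = 5
% ?- s([shoots,the,man,the,woman]).      succeeds
```

With one verb, two determiners and two nouns there are already five candidates (four three-word phrases plus the bare verb), and every extra word or nested phrase multiplies that number.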
What you want to do is provide a part of the sentence to each predicate so that the predicate looks like this:
predicate(Subsentence,Remaining)
Subsentence is the list of words that starts at the point where this predicate begins. For a noun phrase, for example, it might look like [the,woman,shoots,the,man]. Each predicate consumes the words it is interested in: the words up to a certain point. In this case, the noun phrase is only interested in [the,woman], because that is a noun phrase. To allow the rest of the parsing to continue, it returns the remainder [shoots,the,man], in the hope that some other predicate can consume the remainder of the sentence.
For our fact table, this is easy:
det([the|W],W). det([a|W],W). n([woman|W],W). n([man|W],W). v([shoots|W],W).
This means that if you pose a query such as [the,woman,shoots,...] and ask det/2 whether it starts with a determiner, it will say: "yes, the is a determiner, but the remainder [woman,shoots,...] is not part of the determiner, please match it with something else".
This matching works because a list is represented as a linked list. [the,woman,shoots,...] is actually represented as [the|[woman|[shoots|...]]] (so each cell points to the next "sublist"). If you match:
[the|[woman|[shoots|...]]] against the first argument of det([the|W],W)
it will unify W with [woman|[shoots|...]] and thus result in:
det([the|[woman|[shoots|...]]],[woman|[shoots|...]]).
It thus returns the remaining list, having consumed the word the.
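The consume-and-return behaviour of a single fact can be tried in isolation. The sketch below contains only the two det/2 facts; the queries in the comments are illustrative.

```prolog
% A determiner consumes the first word and hands back the rest.
det([the|W], W).
det([a|W], W).

% Illustrative queries:
% ?- det([the,woman,shoots], Rest).   Rest = [woman,shoots]
% ?- det([woman,shoots], Rest).       fails: woman is not a determiner
% ?- det([the|T], Rest).              Rest = T, whatever the tail is
```

No copying or splitting takes place: Rest is simply the tail of the linked list that was passed in.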
Higher-level predicates
Now, if we define the noun phrase:
np(X,Z):- det(X,Y), n(Y,Z).
And we call it again with [the,woman,shoots,...], it will unify X with that list. First it calls det, which consumes the, without any need to backtrack. Next, Y is [woman,shoots,...]; now n/2 consumes woman and returns [shoots,...]. This is also the result that np returns, and another predicate will have to consume it.
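Putting the two-argument facts and rules together gives a complete small parser. This sketch uses only clauses that appear in this answer, with s defined as vp followed by np, as in the original grammar above; the queries are illustrative.

```prolog
det([the|W],W). det([a|W],W).
n([woman|W],W). n([man|W],W).
v([shoots|W],W).

% Each rule threads the remainder of one predicate into the next.
np(X,Z) :- det(X,Y), n(Y,Z).
vp(X,Z) :- v(X,Y), np(Y,Z).
vp(X,Z) :- v(X,Z).
s(X,Z)  :- vp(X,Y), np(Y,Z).

% Illustrative queries:
% ?- np([the,woman,shoots,the,man], R).    R = [shoots,the,man]
% ?- vp([shoots,the,man,the,woman], R).    R = [the,woman] ;
%                                          R = [the,man,the,woman]
% ?- s([shoots,the,man,the,woman], []).    succeeds
```

Asking for an empty remainder, as in the last query, is how one states that the whole sentence must be consumed.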
Utility
Say you enter your name as an additional noun:
n([doug,smith|W],W).
(sorry for using lowercase, but otherwise Prolog sees them as variables).
It will simply try to match the first two words with doug and smith, and if that succeeds, try to match the rest of the sentence. So one can parse a sentence like [the,doug,smith,shoots,the,woman] (sorry about that; furthermore, in English some noun phrases map directly to a noun, np(X,Y) :- n(X,Y), so the word the can be dropped in a more complete English grammar).
Guessing completely eliminated?
Is guessing completely eliminated? No. It is still possible that there is overlap in consumption. For example, you can add:
n([doug,smith|W],W). n([doug|W],W).
In this case, suppose you query [the,doug,smith,shoots,the,woman]. It will first consume the in det, and then look for a noun that consumes from [doug,smith,...]. There are two candidates. If Prolog first tries the clause that consumes only doug, it then has to match [smith,shoots,...] as an entire verb phrase; since smith is not a verb, it backtracks, reconsiders consuming only one word, and consumes both doug and smith as the noun instead.
The point is that when using append, Prolog would have to backtrack as well.
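The overlap can be observed directly. The sketch below makes two assumptions that are not fixed by the text above: the one-word doug clause is listed before the two-word clause, and the sentence is defined in English order (np followed by vp) under the hypothetical name s_en/2, so that the example sentence parses.

```prolog
det([the|W],W). det([a|W],W).
n([woman|W],W). n([man|W],W).
n([doug|W],W).            % assumption: tried before the two-word clause
n([doug,smith|W],W).
v([shoots|W],W).

np(X,Z) :- det(X,Y), n(Y,Z).
vp(X,Z) :- v(X,Y), np(Y,Z).
vp(X,Z) :- v(X,Z).

% Assumption: English word order, noun phrase first (hypothetical name).
s_en(X,Z) :- np(X,Y), vp(Y,Z).

% Illustrative queries:
% ?- findall(R, n([doug,smith,shoots], R), Rs).
%    Rs = [[smith,shoots],[shoots]]
% ?- s_en([the,doug,smith,shoots,the,woman], []).    succeeds
```

The first query shows the two ways the noun can consume words. In the second, the first choice leaves [smith,shoots,...] for vp, which fails, and Prolog backtracks into the second choice.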
Conclusion
Using difference lists, each predicate can consume an arbitrary number of words. The remainder is returned so that other parts of the sentence, such as the verb phrase, can consume it. Furthermore, the list is always fully grounded, so one definitely does not resort to brute force by first generating all kinds of verb phrases.
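For comparison, Prolog's DCG notation is conventionally translated into two-argument clauses of exactly this shape, so the same grammar can be written more compactly. This is a sketch assuming a Prolog system with standard DCG support and phrase/2 and phrase/3 (SWI-Prolog, for instance); it keeps the vp-then-np sentence order used in this answer.

```prolog
% Terminals: each rule consumes one word from the front of the list.
det --> [the].
det --> [a].
n   --> [woman].
n   --> [man].
v   --> [shoots].

% Non-terminals: the remainder is threaded through implicitly.
np --> det, n.
vp --> v, np.
vp --> v.
s  --> vp, np.

% Illustrative queries:
% ?- phrase(np, [the,woman,shoots], Rest).      Rest = [shoots]
% ?- phrase(s, [shoots,the,man,the,woman]).     succeeds
```

phrase/3 exposes the remainder explicitly, while phrase/2 requires it to be the empty list.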