There are a few questions. Here are the most obvious:
Typing. Your re_contains/2 predicate expects a list as the second argument. That re_contains(a,a). succeeds, rather coincidence than intention.
Monotone. Another problem is that re_contains([a],[a]) succeeds, but re_contains([X],[a]) fails. Or, to see it from a different angle: re_contains([X],[a]) fails, but X = a, re_contains([X],[a]) succeeds. By adding the target X = a , we specialized the request, so the original failed request should complete again.
Testing for identification ( ==/2 ) should be replaced by equality ( =/2 ) and so that we have some list. So:
re_contains (X, L): -
% X == L.
X = L
append (X, _, _).
Note that append/3 used here only to make X a list - the actual add function is not used.
Efficiency. The third issue concerns the actual way of representing concatenation. Let's just take a look at the following rule:
re_contains (seq (X, Y), L): -
append (L1, L2, L),
re_contains (X, L1),
re_contains (Y, L2).
Now suppose we have a list of length N How many answers are possible for the append(L1, L2, L) target append(L1, L2, L) ? Actually N + 1 . And that, regardless of the actual values. Now consider:
? - length (L, 1000000), time (re_contains (seq ([a], []), [b | L])).
% 2,000,005 inferences, 0.886 CPU in 0.890 seconds (100% CPU, 2258604 Lips)
false
re_contains/2 takes time proportional to the length of the list here. But it would be enough to look at the first element to understand that this is impossible.
So the problem is using append/3 . There is a simple rule for Prolog: if you use too much append/3 , consider using dcg - Defined Class Grammar. Please check the tag for more details - and consult the Prolog introduction. To give you a start, here is a subset of your definition:
re_contains (RE, L): -
phrase (re (RE), L).
re ([]) -> [].
re ([E]) -> [E].
re (seq (X, Y)) ->
re (X),
re (Y).
Which no longer explores the entire list:
? - length (L, 1000000), time (phrase (re (seq ([a], [])), [b | L])).
% 6 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 127313 Lips)
false
BTW, here is the full definition.
Termination / Failure. Efficiency related - this is a property of completion. Ideally, the request ends if the set of solutions can be finally presented. That is a finite number of answers. Well, this is the ideal we are striving for. The prologue is a simple but very efficient execution algorithm, sometimes there will be a cycle when a finite number of answers would be possible. Understanding the very reason for not terminating is sometimes very difficult. The usual debugging strategies - like tracing or navigating with the debugger - do not work, as they show you too much detail. Fortunately, there is a better technique:
Instead of looking at your entire program, I will again look only at its very small fragment. This fragment is a failure fragment (see the Link for more details). It is much smaller, but talks a lot about the original program - provided that it was a clean, monotonous program:
If the slice of failure does not end, the original program does not end.
So, if we find such a fragment of failure, we can immediately draw conclusions about the entire program. Without even reading the rest!
Here is an interesting piece of failure:
...
re_contains (X, L): - false,
X = L
re_contains (seq (X, Y), L): -
append (L1, L2, L), false,
re_contains (X, L1) ,
re_contains (Y, L2) .
re_contains (union (X, _), L): - false,
re_contains (X, L) .
...
Now consider the target re_contains(seq([],[]),L). Ideally, there should be exactly one answer L = [] , and the goal should be completed. However, it loops since append(L1, L2, L) does not end. Compare this with the DCG solution, above which ends with phrase(re(seq([],[])),L) .