Regular expression

Question

Regular expression

I am trying to execute a regex. I have all the functions that are written out, but they do not work as they should. From what I can say, this is a problem when I try to compare a list.
For example, "re_contains (a, a)". gives true (obviously), like "re_contains (union (a, b), a)".

But as soon as I make a list, it fails. "re_contains (seq (a, b), [a, b])." returns false. When adding, you must go through all possible combinations to find a match, but none of these functions work correctly. It makes me realize that maybe I am missing a base case. But I think that "re_contains (X, L): - X == L." must take care of this. I must be watching something important here.

Here is my code:

re_contains(empty, []). re_contains(X, L) :- X == L. re_contains(seq(X, Y), L) :- append(L1, L2, L), re_contains(X, L1), re_contains(Y, L2). re_contains(union(X, _), L) :- re_contains(X, L). re_contains(union(_, Y), L) :- re_contains(Y, L). re_contains(kleene(X), L) :- append([Car|L1], L2, L), re_contains(X, [Car|L1]), re_contains(kleene(X), L2). re_contains(kleene(_),[]).

+6

regex prolog dcg failure-slice

Ryan C. Stacy 25 sept. '12 at 2:27

source share

2 answers

There are a few questions. Here are the most obvious:

Typing. Your re_contains/2 predicate expects a list as the second argument. That re_contains(a,a). succeeds, rather coincidence than intention.

Monotone. Another problem is that re_contains([a],[a]) succeeds, but re_contains([X],[a]) fails. Or, to see it from a different angle: re_contains([X],[a]) fails, but X = a, re_contains([X],[a]) succeeds. By adding the target X = a , we specialized the request, so the original failed request should complete again.

Testing for identification ( ==/2 ) should be replaced by equality ( =/2 ) and so that we have some list. So:

  re_contains (X, L): -
     % X == L.
     X = L
     append (X, _, _).

Note that append/3 used here only to make X a list - the actual add function is not used.

Efficiency. The third issue concerns the actual way of representing concatenation. Let's just take a look at the following rule:

  re_contains (seq (X, Y), L): -
     append (L1, L2, L),
     re_contains (X, L1),
     re_contains (Y, L2).

Now suppose we have a list of length N How many answers are possible for the append(L1, L2, L) target append(L1, L2, L) ? Actually N + 1 . And that, regardless of the actual values. Now consider:

  ? - length (L, 1000000), time (re_contains (seq ([a], []), [b | L])).
 % 2,000,005 inferences, 0.886 CPU in 0.890 seconds (100% CPU, 2258604 Lips)
 false

re_contains/2 takes time proportional to the length of the list here. But it would be enough to look at the first element to understand that this is impossible.

So the problem is using append/3 . There is a simple rule for Prolog: if you use too much append/3 , consider using dcg - Defined Class Grammar. Please check the tag for more details - and consult the Prolog introduction. To give you a start, here is a subset of your definition:

  re_contains (RE, L): -
     phrase (re (RE), L).

 re ([]) -> [].
 re ([E]) -> [E].
 re (seq (X, Y)) ->
     re (X),
     re (Y).

Which no longer explores the entire list:

  ? - length (L, 1000000), time (phrase (re (seq ([a], [])), [b | L])).
 % 6 inferences, 0.000 CPU in 0.000 seconds (88% CPU, 127313 Lips)
 false

BTW, here is the full definition.

Termination / Failure. Efficiency related - this is a property of completion. Ideally, the request ends if the set of solutions can be finally presented. That is a finite number of answers. Well, this is the ideal we are striving for. The prologue is a simple but very efficient execution algorithm, sometimes there will be a cycle when a finite number of answers would be possible. Understanding the very reason for not terminating is sometimes very difficult. The usual debugging strategies - like tracing or navigating with the debugger - do not work, as they show you too much detail. Fortunately, there is a better technique:

Instead of looking at your entire program, I will again look only at its very small fragment. This fragment is a failure fragment (see the Link for more details). It is much smaller, but talks a lot about the original program - provided that it was a clean, monotonous program:

If the slice of failure does not end, the original program does not end.

So, if we find such a fragment of failure, we can immediately draw conclusions about the entire program. Without even reading the rest!

Here is an interesting piece of failure:

  ...
 re_contains (X, L): - false,
     X = L
 re_contains (seq (X, Y), L): -
     append (L1, L2, L), false,
     re_contains (X, L1) ,
     re_contains (Y, L2) .
 re_contains (union (X, _), L): - false,
     re_contains (X, L) .
 ...

Now consider the target re_contains(seq([],[]),L). Ideally, there should be exactly one answer L = [] , and the goal should be completed. However, it loops since append(L1, L2, L) does not end. Compare this with the DCG solution, above which ends with phrase(re(seq([],[])),L) .

+8

false 25 sept. '12 at 7:54

source share

Capellic · Accepted Answer · 2012-09-25T07:28:44+0000

append/3 will split L , and both L1 and L2 will be lists.

I would try replacing re_contains(X, L) :- X == L. with re_contains(X, [X]).

After changing re_contains(a,a). will fail.

You present the sequence in different ways, and your assistant does not provide both of them. In fact, the only cases of "working" are not sequences.

Regular expression

More articles: