Common Lisp: What is the disadvantage of using this filter function in very large lists?

I want to filter out all the elements of list 'a from list' b and return the filtered 'b. This is my function:

(defun filter (ab) "Filters out all items in a from b" (if (= 0 (length a)) b (filter (remove (first a) a) (remove (first a) b)))) 

I am new to lisp and don’t know how β€œdelete does its thing, what time will this filter take?”

+4
source share
4 answers

There are two ways to find out:

  • you can check it using data

  • you could analyze your source code

Check out the source code.

  • lists are built from linked cons cells

    Length
  • must go through the list once

  • for EVERY recursive FILTER call, you compute the length a. Bad!

    (Use ENDP instead.)

  • REMOVE need to go through the list once

  • for each recursive call you compute DELETE twice: BAD!

    (Instead of using REMOVE on a , overwrite with REST.)

  • a FILTER call will not necessarily be an optimized tail call. In some implementations, this may, in some you need to tell the compiler what you want to optimize for tail calls, in some implementations there is no tail call optimization. If not, then you get an overflow stack on fairly long lists.

    (Use loop constructs such as DO, DOLIST, DOTIMES, LOOP, REDUCE, MAPC, MAPL, MAPCAR, MAPLIST, MAPCAN or MAPCON instead of recursion, if applicable.)

Summary: this naive code with low performance.

Generic Lisp provides this built-in function: SET-DIFFERENCE should do what you want.

http://www.lispworks.com/documentation/HyperSpec/Body/f_set_di.htm#set-difference

+9
source

Generic Lisp does not support tail call optimization (according to the standard), and you can just run out of memory with a terrible call stack (depending on implementation).

+1
source

I would not write this function because, as Rainer Hoswig says , the standard already provides SET-DIFFERENCE . However, if I had to implement a function implementation, I would use this:

 (defun filter (ab) (let ((table (make-hash-table))) (map 'nil (lambda (e) (setf (gethash e table) t)) a) (remove-if (lambda (e) (gethash e table)) b))) 

The implementation of this method provides several advantages, the most important thing is that it only bypasses b once; using a hash table to keep track of which elements are in a is likely to be much better if a longer.

In addition, the use of common sequence functions, such as MAP and REMOVE-IF , means that this function can be used with strings and vectors, as well as with lists, which is an advantage even over the standard SET-DIFFERENCE . The main drawback of this approach is that you want to extend the function with an argument :TEST , which allows the user to provide an equality predicate other than the default EQL , since CL hash tables only work with a small number of pre-defined equality predicates ( EQ , EQL , EQUAL and EQUALP ).

+1
source
 (defun filter (ab) "Filters out all items in a from b" (if (not (consp a)) b (filter (rest a) (rest b)))) 
+1
source

All Articles