Why does `sxhash` return a constant for all structures?

When using the Common Lisp sxhash in structures, I get one value for all structures (only in SBCL structures of the same type). For example, the following code prints two lists of integers, all of which have the same value.

 (progn (defstruct foo data) (print (mapcar #'sxhash (loop for i below 10 collect (make-foo :data i)))) (defstruct bar data) (print (mapcar #'sxhash (loop for i below 10 collect (make-bar :data i))))) ;;; Allegro (319 319 319 319 319 319 319 319 319 319) (319 319 319 319 319 319 319 319 319 319) ;;; SBCL (22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788 22591133455133788) (21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048 21321591953876048) 

I tried this in both Allegro Lisp and SBCL and both return (different) constants for all structures (of the same type in SBCL). The sxhash Hyperspec linked page has the following statements:

  1. For any two objects x and y, both of which are bit vectors, characters, conses, numbers, path names, strings or characters and which (sxhash x) and (sxhash y) give the same mathematical value, even if x and y exist in different Lisp images of the same implementation. See Section 3.2.4 (Literal Objects in Compiled Files).

  2. The hash code for an object is always the same for a single session, provided that the object is not noticeably altered for an equivalence test. See Section 18.1.2 (Modifying a Hash Key Table).

The last statement does not indicate, but, apparently, implies that it would be reasonable for two structures that do not have equal to have different hash codes (modulo collision). However, the structures are suspiciously missing from the list in the first paragraph. At first I did this with an error in Allegro Lisp, but now that I see it in two different implementations, I think I donโ€™t understand something about the specification.

+6
source share
2 answers

I requested Franz support, and that was their answer. Presumably, SBCL is doing something similar for similar reasons.

The cl: sxhash function always returns the same value for structure objects. The reason for this is because it does not have extra space to store a unique hash code inside it. As a result, using structures as keys is very inefficient. The excl :: hash-table-stats function demonstrates this when defining a hash table with structures used as keys; the histogram becomes the worst because every key wants the same index.

It was decided to keep the same behavior for the objects of the structure, because automatically enabling a hash slot in the entire structure of the objects would make all structures on average one word longer. For small structures, this is unacceptable to many of our users.

Instead, the user can define a structure with an additional slot, and the constructor for this type of structure can store a unique value in this slot (either a random value or a value obtained by increasing the counter each time the constructor starts). Also create a hash generating function that accesses this hash slot to generate value. If the structures to be hashed are buried inside the list, then this hash function must know how to traverse these keys before getting a unique value. Finally, create a hash table using documented: the hash-function argument to make-hash-table (still using the equal test argument) to create a hash table that will be well distributed.

Alternatively, and if you can guarantee that none of the slots in your structure will be changed after they are used as keys in the hash table, you can use the equalp testing function in your make-hash-table, and not equal. If you do that these structure objects do not change, because then they may not be found in the hash table.

+6
source

There are two important properties of sxhash : if (equal xy) then (= (sxhash x) (sxhash y)) , and the value returned by sxhash is the same for any object (even between lisp images).

Now the structures are equal , if only they have eq (i.e. they have the same address), but sxhash cannot just return the address (or some hash) of the structure, because the address can change due to garbage collection. When designing an lisp implementation, you must choose whether to have sxhash the same for each structure or to store some credentials in each structure that does not change when the garbage collector moves the structure and therefore can be used for an sxhash object. Most implementations (including Franz and sbcl) believe that adding such a value is a waste of space or useless if it is provided with only a few spare bits.

This compromise will ultimately only affect the user's attempt to implement hash tables, as their own hash tables of implementations can use the address of the objects and tell the garbage collector so that they can rephrase when moving the object (I donโ€™t know, or no implementations do this) . Some implementations (including sbcl) allow you to configure built-in hash tables using your own comparison / hash operations. Perhaps if you implemented hashing, you could add additional fields to them.

I believe that the result returned by sxhash in sbcl is determined by hashing the name of the structure type.

+1
source

All Articles