What is the recommended implementation for hashing OLE variants?

OLE variants, as used by older versions of Visual Basic and throughout COM Automation, can store many different types: basic types such as integers and floats, more complex types such as strings and arrays, all the way up to IDispatch implementations and pointers in the form of ByRef variants.

Variants are also weakly typed: they convert a value to another type without warning, depending on which operator you apply and on the current types of the values passed to it. For example, comparing two variants for equality, one containing the integer 1 and the other the string "1", returns True.

So, assuming that I am working with variants at the underlying data level (for example, VARIANT in C++ or TVarData in Delphi, i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules?

Rules:

  • Variants that hash unequally must compare as unequal, both in sorting and in direct equality
  • Variants that compare as equal, both in sorting and in direct equality, must hash as equal

It is fine if I have to use different sorting and direct comparison rules in order to make the hashing fit.
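As an illustration of the two rules, here is a minimal sketch in standard C++. It does not use the real VARIANT type: `ToyVariant`, `canonicalText`, `looselyEqual`, `coercedHash`, `tagOnlyHash` and `honoursContract` are all hypothetical names, and the coercion rule (compare by decimal text) is an assumption chosen only to mimic the 1 versus "1" example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a VARIANT: only an integer or a string payload.
using ToyVariant = std::variant<long long, std::string>;

// Assumed coercion rule: compare by decimal text, so 1 and "1" are equal.
std::string canonicalText(const ToyVariant& v) {
    if (const auto* n = std::get_if<long long>(&v)) return std::to_string(*n);
    return std::get<std::string>(v);
}

bool looselyEqual(const ToyVariant& a, const ToyVariant& b) {
    return canonicalText(a) == canonicalText(b);
}

// Hash that applies the same coercion as looselyEqual.
std::size_t coercedHash(const ToyVariant& v) {
    return std::hash<std::string>{}(canonicalText(v));
}

// Deliberately crude hash on the type tag alone; it stands in for any hash
// that looks at the typed representation without coercing first.
std::size_t tagOnlyHash(const ToyVariant& v) {
    return v.index();
}

// Returns true when every pair that compares equal also hashes equal.
template <typename Eq, typename Hash>
bool honoursContract(const std::vector<ToyVariant>& values, Eq eq, Hash hash) {
    for (const auto& a : values)
        for (const auto& b : values)
            if (eq(a, b) && hash(a) != hash(b)) return false;
    return true;
}

void checkContractExample() {
    const std::vector<ToyVariant> values{
        ToyVariant{1LL}, ToyVariant{std::string("1")}, ToyVariant{2LL}};
    assert(looselyEqual(values[0], values[1]));
    assert(honoursContract(values, looselyEqual, coercedHash));
    assert(!honoursContract(values, looselyEqual, tagOnlyHash));
}
```

The second rule is the one being checked; the first rule is its contrapositive, so a hash that passes this check for all inputs satisfies both.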

The way I am working now, I normalize variants to strings (if they fit) and treat them as strings; otherwise I treat the variant data as an opaque blob, and hash and compare its raw bytes. Of course, this has some limitations: the numbers 1..10 sort as [1, 10, 2, ... 9], etc. That is mildly annoying, but it is consistent and takes very little work. However, I do wonder whether there is an accepted practice for this problem.
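A rough sketch of that approach in standard C++, to make the sorting limitation concrete. The names `sortedAsText` and `hashBytes` are hypothetical, and FNV-1a is only an assumed choice for the opaque-blob hash; the approach described above does not name a particular byte hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed normalization: every integer becomes its decimal text.
std::vector<std::string> sortedAsText(int first, int last) {
    std::vector<std::string> keys;
    for (int i = first; i <= last; ++i) keys.push_back(std::to_string(i));
    std::sort(keys.begin(), keys.end());  // lexicographic, not numeric
    return keys;
}

// 64-bit FNV-1a over raw bytes, one simple option for the blob fallback.
std::uint64_t hashBytes(const void* data, std::size_t size) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

void checkTextOrdering() {
    const std::vector<std::string> expected{
        "1", "10", "2", "3", "4", "5", "6", "7", "8", "9"};
    assert(sortedAsText(1, 10) == expected);
}
```

Because the keys are compared as text, "10" sorts between "1" and "2". The byte hash is consistent only with a byte-for-byte comparison, which is why the blob case has to use raw-byte equality as well.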

+7
c++ winapi delphi com variant
3 answers

The hash codes of VARIANTs that are equal must be equal.

Without knowing the rules of equality and coercion that are used to test equality, it is difficult to find the right implementation.

0

There is a built-in tension in your question between the use of a hash function and the stated requirements, which are to be validated against the input of the hash. I would suggest we keep in mind a few properties of hashes in general: information is lost during hashing, and hash collisions are to be expected. It is possible to construct a perfect hash without collisions, but it would be problematic (or impossible?) to construct a perfect hash function if the domain of the function is every possible OLE variant. On the other hand, if we are not talking about a perfect hash, then your first rule is violated.

I don't know the wider context of what you are trying to accomplish, but I have to push back on one of your assumptions: is a hash function really what you want? Your requirements could be met in a fairly straightforward way if you develop a system that encodes, rather than hashes, all the possible attributes of an OLE Variant so that they can later be recalled and compared with other Variant images.

Your basic implementation of converting Variant to a string representation is moving in that direction. As you undoubtedly know, a variant may contain pointers, double pointers and arrays, so you will have to develop a consistent string representation of these data types. I doubt whether this approach can really be classified as a hash. Are you not just saving data attributes?

+2

So you first stream comparable values into a common format, a string or a blob.

How do you deal with, for example, localization, such as the formatting of reals? A real compared with a string containing the same real, but created in a different locale, will fail to match. The same goes for a real written to a string with a different precision.

It seems to me that the definition of equal() is the problem, not the hashing. If "equal" values can be serialized to a string (or blob) in different ways, the hashing will fail.
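A small sketch of the precision problem in standard C++; `formatReal` and `checkPrecisionMismatch` are hypothetical helpers, and the default "C" locale is assumed. The same double written with two different precision settings yields two different strings, so any hash or comparison taken over the text treats them as different values.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a double with the given number of significant digits ("%.*g").
std::string formatReal(double value, int significantDigits) {
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", significantDigits, value);
    return buffer;
}

void checkPrecisionMismatch() {
    const std::string shortText = formatReal(0.1, 3);   // "0.1"
    const std::string longText = formatReal(0.1, 17);   // "0.10000000000000001"
    // Same underlying real, different text, so text-based equality fails.
    assert(shortText != longText);
}
```

A locale that uses a decimal comma would change the text again, so a string-based scheme presumably needs one fixed locale and one fixed precision for every value it serializes.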

0