Interview: Hash Function: Sine Function

I was asked this question. I'm not sure what the correct answer is for it (and the reasoning for the answer):

Is sin (x) a good hash function?

+7
source share
8 answers

If you mean sin() , this is not a good hash function, because:

  • it is quite predictable and for some x it is no better than just x . There should be no obvious connection between the key and the key hash.
  • it does not generate an integer value. You cannot index / index arrays with floating point indices, and there must be some kind of array in the hash table.
  • floating point is very implementation specific, and even if you create a hash function from sin() , it may not work with another compiler or other processor / computer.
  • sin() can be much slower than some simpler integer arithmetic function.
+4
source

Not really.

  • It is terribly slow.
  • In any case, you need to convert the result to some integer type to avoid the frenzy of floating point equality comparisons. (Actually not the usual accuracy problems that are endemic for FP equality comparisons and that arise from calculating two things in slightly different ways, I mean specifically problems caused by things like the fact that 387-derivative FPUs store extra bits accuracy in their registers, therefore, if a comparison is performed between two recently calculated values ​​in registers, you can get a different answer than if one of the operands was loaded into the register from memory.)
  • It is almost flat near peaks and grooves, so the quantization step (multiplication by some large number and rounding to the nearest whole) will result in many hash values ​​near the minimum and maximum, rather than an even distribution.
+3
source

Based on mathematical knowledge:

Sine (x) is periodic, so it will reach the same number from different x values, so Sine (x) will be terrible as a hash function, because you get multiple hash values ​​at the same point. There are ** infinitely many values ​​between 0 and pi for the return value, but then it passed that the values ​​would be repeated. Thus, 0 and pi and 2 * pi will all hash to the same point.

If you can make the increment small enough and have Sine (x) multiplied by say x ^ 2 or something like that, it would be mediocre at best, but then again, if you did, why not just do not use x ^ 2 in any case and throw out the periodic function all together.

** infinite: a sufficiently large number that I do not want to count.

NOTE. The sine (x) will have values ​​that are small and can be affected by the rounding error.

NOTE. Any value taken from the sine function must be multiplied by an integer, and then either changed, or the floor or ceiling, so that the value can be used as an offset of the array, etc.

+1
source

sin(x) is a trigonometric function that repeats after every 360 degrees, so it will be a bad hash function, as the hash will be repeated too often.

Simple rebuttal:

 sin(0) == sin(360) == sin(720) == sin(..) 

This is not a property of the goodhash function.

Even if you decide to use it, it is difficult to imagine the meaning returned by sin. Sin Function:

 sin x = x - x^3/3! + x^5/5! - ... 

It is impossible to imagine exactly because of the floating point precision problem, which means that for the same value, it can create two different hashes!

+1
source

One more note:

For sine (x) as a hash function - Keys in a given close range will also have hash values ​​in a close range, this is undesirable. A good hash function evenly distributes hash values ​​regardless of the nature of the keys.

+1
source

Hash values ​​should usually be integers. Since sin does not generate integers, this is not suitable.

0
source

Say we have a string s. It can be expressed as a number in hexadecimal and served in a function. If you add 2 pi, it will cease to be a valid input, since it will no longer be an integer (only non-negative integers are accepted by the function). You should find a string that gives a collision, and not just multiply the hexadecimal expression of the string with 2 pi. And adding (concatenating?) 2 pi directly to the string would not help find a collision. Maybe there is another way, but not so trivial.

0
source

I think sin (x) can make a great cryptographic hash function if used wisely. The input must be a natural number in radians and never contain pi. We must use arithmetic of arbitrary precision. For any positive integer x (radian), sin (x) is always a transcendental irrational number, and there is no other positive integer with the same sine. But there is a catch: an attacker can obtain input information by computing the hash arcsine. To prevent this, we ignore the decimal part, and some of the first digits from the fractional part, keeping only the next n (say 100) digits, making such an attack computationally impossible. It seems that a slight change in input gives a completely different result, which is a desirable property. The result of the function seems statistically random, again a good property. I am not sure how to prove that it is resistant to a collision, but I do not understand why this could not be. Also, I can't think of a way to find a specific result that results in a specific hash. I am not saying that we should blindly believe that this is of course a good crypt. hash function. I just think it seems like a good candidate to be alone. We must give him a chance and focus on proving it. And that can be very good. To those who can say that it is slow: yes, it is. And this is good for hashing passwords. Here I am attaching some perl code for this idea. It works with linux with bash and bc. (bc is a command-line calculator of arbitrary precision, included in most distributions) I will check this page for any answers, since it interests me very much. Don't be harsh, though, I'm just a CS student who wants to know more.

 use warnings; use strict; my $input='5AFF36B7';#Input for bc (as a hex number) $input='1'.$input;#put '1' in front of input, so that 0x0 , 0x00 , 0x1 , 0x01 , etc ... , #all give different nonzero results my $a=`bc -l -q <<< "scale=256;obase=16;ibase=16;s($input)"`;#call bc, keep result in $a #keep only fractional part $a=~tr/a-zA-Z0-9//cd;#Clean up string, keep only alphanumerics my @m = $a =~ /./g;#Convert string to array of chars #PRINT OUTPUT #We ignore some digits, for security reasons: #If we don't ignore any of the first digits, an attacker could gain information #about the input by computing the inverse of sin (the arcsin of the hash) #By ignoring enough of the first digits, it becomes computationally #infeasible to compute arcsin #Also, to avoid problems with roundoff error, we ignore some of the last digits for (my $c=100;$c<200;$c++){ print $m[$c]; } 
-one
source

All Articles