The shortest hash? MD5 / SHA, or the first characters as in git

I need a hash function. Users will type these hashes into the computer, so the hash should be short. I will have about 50 million records in the database, and each record should have its own hash. I would like the hashes to be unique, but I can accept a small number of records sharing a hash. Unique is better.

MD2 is safe enough for me, but the hash is long: "8350e5a3e24c153df2275c9f80692773" (32 characters). If you have to type 10 MD2 hashes on the keyboard, you will not be happy...

Git uses SHA-1 for each commit (40 characters), but the abbreviated output shows only the first 7 characters:

 $ git log
 commit e2cfc89fae5b43594b2c649fd4c05bcc6d2d12ac
 ...
 commit 56a8b4c50d4269dc3f88727472933fd81231f63b
 ...
 commit ce2e9ddbe896b9592abbd5fcb6604b181809d523
 ...
 commit 498c49833516ea33b6a40697634ea6e3cfd62328
 ...
 commit b7d78aea415e64d8d441f9747fe6d5d48fe54ee5

 $ git log --oneline | head -n 5
 e2cfc89 commit message...
 56a8b4c commit message...
 ce2e9dd commit message...
 498c498 commit message...
 b7d78ae commit message...

How safe / unique is that? If I use, for example, the first 5 or 10 characters of MD5 / SHA-1 / SHA-256, is it safe enough?

Thanks.

+6
2 answers

Check out hashids, which is designed to create unique YouTube-style hashes from your primary keys (or another set of unique numbers). It is not really a hash in the sense of MD5 and SHA-1, since it is designed to be reversible.

As an example, if you want to "hash" a single integer primary key, you get a mapping like

 (PK: 1) <=> (hashid: 8dY0qQ) 

This is seeded from a secret value that you control, so users cannot work out which primary key they are really referring to. If your database is somewhat more involved, say with a few shards and composite keys, you're still fine. hashids takes a list of integers as input:

 (3, 171, 24) <=> (243j7Z) 

As a developer, you are responsible for determining the minimum hash length. As you create more hashes, hashids can generate slightly longer hashes.

Hashes are guaranteed to be unique for a given input (seed, minimum hash length, and the list of integers to hash):

 No collisions. Generated hashes should be unique.
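To make the idea of a reversible, collision-free "hash" concrete, here is a minimal Python sketch. It is not the hashids algorithm and does not use the hashids library: it simply writes the primary key in base 62 using an alphabet shuffled from a secret. The names `make_alphabet`, `encode_id`, and `decode_id` are hypothetical, and the shuffle is only light obfuscation, not security.

```python
import random

BASE_ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)


def make_alphabet(secret: str) -> str:
    """Return the base alphabet shuffled deterministically from the secret."""
    chars = list(BASE_ALPHABET)
    random.Random(secret).shuffle(chars)  # same secret -> same order
    return "".join(chars)


def encode_id(pk: int, secret: str, min_length: int = 6) -> str:
    """Encode a non-negative integer as a short code (assumed scheme)."""
    if pk < 0:
        raise ValueError("primary key must be non-negative")
    alphabet = make_alphabet(secret)
    base = len(alphabet)
    digits = []
    n = pk
    while True:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
        if n == 0:
            break
    code = "".join(reversed(digits))
    # Left-pad with the "zero" symbol so short codes reach min_length.
    return code.rjust(min_length, alphabet[0])


def decode_id(code: str, secret: str) -> int:
    """Invert encode_id; leading padding symbols decode as leading zeros."""
    alphabet = make_alphabet(secret)
    index = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in code:
        n = n * len(alphabet) + index[ch]
    return n
```

Because every integer has exactly one base-62 representation, two different keys can never produce the same code. Six characters cover 62^6 (about 5.7x10^10) values, comfortably more than 50 million records. Consecutive keys produce visibly related codes in this sketch, which is one reason to prefer a real library in practice.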

Implementations are available for:

  • JavaScript
  • Ruby
  • Python
  • Java
  • PHP
  • Perl
  • CoffeeScript
  • Objective-C
  • Go
  • Lua
  • Node.js
  • .NET
+6

By default, git only displays 7 characters, since the odds are that these will be unique, and you can reference commits / blobs using just enough characters to identify them uniquely.

However, under the hood, it still uses the full hash. If your repository has two commits that share the same first 7 characters, git reports an error when you use only those 7 characters to identify one of them.

If the user is entering a hash for data that the system already knows about, let the user type as many characters as they like. If that is not enough to determine unambiguously which hash they mean, report an error and ask for more characters.
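The lookup described above can be sketched in Python as follows. This is only an illustration under assumptions: the hashes are held in a sorted in-memory list, and `build_index`, `resolve_prefix`, and `AmbiguousPrefixError` are hypothetical names. With 50 million records you would more likely run the same prefix query against a database index.

```python
import bisect


class AmbiguousPrefixError(Exception):
    """Raised when a prefix matches more than one known hash."""


def build_index(hashes):
    """Return the known hashes as a sorted list (lowercase hex strings)."""
    return sorted(h.lower() for h in hashes)


def resolve_prefix(prefix, sorted_hashes, min_length=4):
    """Return the single full hash starting with prefix, or raise."""
    prefix = prefix.strip().lower()
    if len(prefix) < min_length:
        raise ValueError(f"enter at least {min_length} characters")

    # All hashes sharing the prefix are adjacent in sorted order,
    # so it is enough to look at the first two candidates.
    i = bisect.bisect_left(sorted_hashes, prefix)
    matches = []
    while (
        i < len(sorted_hashes)
        and sorted_hashes[i].startswith(prefix)
        and len(matches) < 2
    ):
        matches.append(sorted_hashes[i])
        i += 1

    if not matches:
        raise KeyError(f"no hash starts with {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousPrefixError(
            f"{prefix!r} is ambiguous, please enter more characters"
        )
    return matches[0]
```

With the five commit hashes from the question loaded through `build_index`, a call such as `resolve_prefix("498c498", index)` returns the full 40-character hash, while a prefix shared by two entries raises `AmbiguousPrefixError` so the caller can prompt for more input.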

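7 hexadecimal characters give 16^7, or ~2.7x10^8, possible hashes. Assuming you use a good hash, i.e. one with a uniform distribution over its values, then by the quadratic approximation to the birthday problem you have a 50% chance of a collision after ~19k* hashes. Whether this is acceptable to you depends on how many records you insert; with the roughly 50 million records mentioned in the question, collisions among 7-character prefixes are practically certain.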

* The number of inserts needed for a 50% chance of a hash collision, for a hash of N hexadecimal characters, is approximately 0.5 + sqrt(0.25 - 2 x ln(0.5) x 16^N)
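To try the footnote's formula for other lengths, here is a small Python sketch. The function names are my own. The second function uses the usual birthday approximation p ≈ 1 - exp(-n(n-1) / (2 x 16^N)), which the answer does not state explicitly but from which the formula above follows when solved for n.

```python
import math


def inserts_for_collision(hex_chars: int, p: float = 0.5) -> float:
    """Approximate number of inserts before a collision has probability p."""
    space = 16 ** hex_chars
    # Same expression as the footnote, with ln(0.5) generalised to ln(1 - p).
    return 0.5 + math.sqrt(0.25 - 2 * math.log(1 - p) * space)


def collision_probability(n: int, hex_chars: int) -> float:
    """Approximate chance of at least one collision among n uniform hashes."""
    space = 16 ** hex_chars
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for length in (5, 7, 10, 16):
        n50 = inserts_for_collision(length)
        p = collision_probability(50_000_000, length)
        print(f"{length:2d} hex chars: 50% collision after ~{n50:,.0f} inserts; "
              f"P(collision) for 50M records = {p:.6f}")
```

For N = 7 this gives a value consistent with the ~19k quoted above. For 50 million records the collision probability only becomes small at considerably longer prefixes, which is why the "ask for more characters" approach or a reversible scheme is worth considering.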

+5