My problem:
I am looking for a way to represent a person's name and address as an encoded identifier. The identifier should contain only alphanumeric characters, be collision-resistant, and use as few characters as possible. My first thought was to use a cryptographic hash function like MD5 or SHA1, but this seems like overkill (security is not important, and it doesn't have to be one-way), and I would rather find something that produces a shorter identifier. Does anyone know of an existing algorithm that suits this problem?
In other words, what is the best way to implement the following function so that the return value is always the same for the same input, a collision is unlikely, and the identifier is less than 20 characters long?
    >>> make_fake_id(fname='Oscar', lname='Grouch', stnum='1', stname='Sesame', zip='12345')
    N1743123734
Application context (for those interested):
This will be used for a record linkage application. Given a name and address as input, we search a very large database for the best match and return the database identifier and other data (how we do this is not important here). If there is no match, I need to generate a pseudo identifier derived from the search input (the name and address data). Each search entry should result in an output record with either a real identifier (the actual database identifier obtained from the match/link) or a generated pseudo identifier. The pseudo identifier will have a character prefix (for example, N) to distinguish it from a real id.
, "" MD5 SHA1, , . , . , , , , , .
Here is one approach: take the hash, base64-encode it, strip the non-alphanumeric characters, and truncate.
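    import base64
    import hashlib
    import re

    N_HASH_CHARS = 11

    def digest(name, address):
        # Hash the combined fields, then base64-encode the raw digest.
        raw = hashlib.md5((name + "|" + address).encode("utf-8")).digest()
        b64 = base64.b64encode(raw).decode("ascii")
        # Drop '+', '/' and '=' so only alphanumerics remain, then truncate.
        alnum_hash = re.sub(r'[^a-zA-Z0-9]', "", b64)
        return alnum_hash[:N_HASH_CHARS]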
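How many bits of information is that? Each alphanumeric character carries about 5.95 bits (log(62, 2)). 11 characters give about 65.5 bits, so by the birthday paradox you should expect to start seeing collisions at around 2 ** 32.7 items (about 7 billion).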
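The arithmetic can be checked with a few lines of Python. This is only a rough sketch: it assumes each character is drawn uniformly from the 62 alphanumerics, which is only approximately true for base64 output with characters stripped out. The function names `id_bits` and `birthday_exponent` are made up for illustration.

```python
import math

ALPHABET_SIZE = 62   # a-z, A-Z, 0-9
N_HASH_CHARS = 11

def id_bits(n_chars, alphabet_size=ALPHABET_SIZE):
    """Bits of information in n_chars symbols, assuming uniform symbols."""
    return n_chars * math.log2(alphabet_size)

def birthday_exponent(n_chars, alphabet_size=ALPHABET_SIZE):
    """Exponent e such that collisions become likely near 2**e items."""
    return id_bits(n_chars, alphabet_size) / 2

bits_per_char = math.log2(ALPHABET_SIZE)
total_bits = id_bits(N_HASH_CHARS)
items_before_collisions = 2 ** birthday_exponent(N_HASH_CHARS)

print("bits per character:", round(bits_per_char, 2))
print("bits in the identifier:", round(total_bits, 1))
print("collisions likely near (billions):", round(items_before_collisions / 1e9, 1))
```

Changing `N_HASH_CHARS` shows how quickly the safe population grows: each extra character adds roughly three bits to the birthday exponent.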
, "" ? , , , ; "AAAAA01"?
, , , , (). (, ), , , Oracle Sequence SQL Server AutoNumber ( ).
, , , , (, ..). , , ( , , ..) . .
EDIT: , , , , , ( ) . , , ( , ), " " , , , .
?
. , ? , .
, . OTOH, - :
SHA1 .
, , , , . , .
, , ( - ).
, , , , ...
, ? , , , ... - ...
You could use AAAAA01 for the first person at the first address, AAAAA02 for the second person at the first address, AAAAB07 for the seventh resident at the second address, etc.
If you have no way to generate and maintain these key-to-entity mappings, then you need to use the full street address/zip and full name, or a hash of the same, although the hash approach has a slight chance of generating duplicates.
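For illustration, the sequential scheme could be sketched as follows. This is a minimal sketch, not a definitive implementation: the class `SequentialIdAllocator`, the helper `encode_address_index`, and the in-memory dictionaries are all hypothetical stand-ins for whatever persistent store would actually hold the mappings, and the five-letter/two-digit layout is simply read off the AAAAA01 example.

```python
import string

LETTERS = string.ascii_uppercase

def encode_address_index(index, width=5):
    """Encode a non-negative integer as fixed-width base 26 (0 -> 'AAAAA')."""
    chars = []
    for _ in range(width):
        index, rem = divmod(index, 26)
        chars.append(LETTERS[rem])
    if index:
        raise ValueError("address index does not fit in the given width")
    return "".join(reversed(chars))

class SequentialIdAllocator:
    """In-memory stand-in for a persisted key-to-entity mapping (hypothetical)."""

    def __init__(self):
        self._address_codes = {}   # address -> 'AAAAA'-style code
        self._residents = {}       # address -> {name: resident number}

    def get_id(self, name, address):
        # Assign the next address code the first time an address is seen.
        if address not in self._address_codes:
            next_index = len(self._address_codes)
            self._address_codes[address] = encode_address_index(next_index)
            self._residents[address] = {}
        residents = self._residents[address]
        # Assign the next resident number the first time a name is seen there.
        if name not in residents:
            if len(residents) >= 99:
                raise ValueError("more than 99 residents at one address")
            residents[name] = len(residents) + 1
        return "%s%02d" % (self._address_codes[address], residents[name])

# Example with made-up names and addresses.
alloc = SequentialIdAllocator()
first = alloc.get_id("Oscar Grouch", "1 Sesame 12345")
second = alloc.get_id("Second Resident", "1 Sesame 12345")
other = alloc.get_id("Third Person", "2 Sesame 12345")
print(first, second, other)
```

Unlike a hash, this cannot collide, but the same input only maps to the same identifier for as long as the mapping tables are kept.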