String to Number and back algorithm

This is hard (for me), so I hope people can help. I have some text, and I need to convert it to a number. The number must be unique, since the text is unique.

For example, the word "kitten" could produce 12432, but only the word "kitten" may produce this number. The text can be anything, and each text must map to its own number.

One constraint: the result must fit in an unsigned 32-bit integer, which means the largest possible number is 4294967295 (2147483647 is the limit for a signed 32-bit integer). I don't mind if there is a limit on text length, but I hope it can be as long as possible.


My attempt: the characters are the letters A-Z and the digits 0-9, so each character can be given a number from 1 to 36. But if A = 1 and B = 2, the text AB gives 1 + 2 = 3, and the text BA gives the same result, so simply adding the values does not work.

Any ideas that point me in the right direction, or is it impossible?

5 answers

Your idea is basically sound; it just needs a little more development.

Let f(c) be a function that converts the character c to a unique number in the range [0..M-1]. Then you can compute the number for the whole string like this:

 f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... + f(s[n])*M^n 

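It is easy to prove that this number is unique to a particular string, and that you can recover the string from it by repeatedly dividing by M, as long as f never returns 0. If some character maps to 0 (say f('A') = 0), then "A" and "AA" both produce 0, so either map characters to [1..M-1] or restrict yourself to strings of one fixed length.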
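A minimal Python sketch of this scheme, assuming the character set is exactly A-Z followed by 0-9 (uppercase only) and that digit 0 is reserved so every character maps to [1..M-1], which makes M = 37. The names `ALPHABET`, `encode` and `decode` are illustrative, not from the answer.

```python
# Assumed character set: A-Z then 0-9, uppercase only.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
M = len(ALPHABET) + 1  # base 37: digit 0 is reserved so no character maps to 0


def f(c: str) -> int:
    """Map a character to a digit in [1..M-1]; raises ValueError for unknown characters."""
    return ALPHABET.index(c) + 1


def encode(s: str) -> int:
    """Compute f(s[0]) + f(s[1])*M + ... + f(s[n])*M^n."""
    total = 0
    for i, c in enumerate(s):
        total += f(c) * M ** i
    return total


def decode(number: int) -> str:
    """Recover the string: each remainder mod M is the next character, starting with s[0]."""
    chars = []
    while number > 0:
        number, digit = divmod(number, M)
        if digit == 0:
            raise ValueError("not a valid encoding: digit 0 is never produced by f")
        chars.append(ALPHABET[digit - 1])
    return "".join(chars)
```

Reserving 0 costs one digit value per position, but six characters still fit: the largest six-character value, for "999999", is 37^6 - 1 = 2,565,726,408, below the unsigned 32-bit limit.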

Obviously, you cannot use very long strings (up to 6 characters in your case), since 36^n grows fast: 36^6 = 2,176,782,336 fits in an unsigned 32-bit integer, but 36^7 = 78,364,164,096 does not.
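The length limit is simple arithmetic: all strings of length n need base^n distinct values, and those fit in an unsigned integer of `bits` bits when base^n <= 2^bits. A small sketch (the function name `max_length` is hypothetical) that finds the largest such n for fixed-length strings:

```python
def max_length(base: int, bits: int = 32) -> int:
    """Largest n such that all base**n fixed-length strings fit in `bits` unsigned bits."""
    n = 0
    while base ** (n + 1) <= 2 ** bits:
        n += 1
    return n
```

For example, `max_length(36)` and `max_length(26)` are both 6, while `max_length(36, 31)` (only the non-negative range of a signed 32-bit integer) drops to 5.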


Creating a dictionary that maps words to unique numbers is the best you can do.

I doubt there are more than 2^32 words in use, so running out of numbers is not the problem you are facing; the problem is that you need to map the numbers back to words.

If you only needed to map words to numbers, some hashing algorithm might do, although you would have to work a bit to make sure the one you pick does not create collisions.

Turning numbers back into words, however, is a completely different problem, and the simplest solution is to create a dictionary and map in both directions.

In other words:

    AARDUANI = 0
    AARDVARK = 1
    ...
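A minimal Python sketch of such a two-way dictionary, assuming the complete word list is known up front. The class name `WordCodec` is hypothetical; sorting the de-duplicated list gives a stable order, so the same list always produces the same numbers.

```python
class WordCodec:
    """Hypothetical two-way mapping between a fixed word list and the numbers 0, 1, 2, ..."""

    def __init__(self, words):
        # sorted(set(...)) removes duplicates and fixes an alphabetical order.
        self.by_number = sorted(set(words))
        self.by_word = {word: i for i, word in enumerate(self.by_number)}

    def encode(self, word: str) -> int:
        return self.by_word[word]  # KeyError if the word is not in the list

    def decode(self, number: int) -> str:
        return self.by_number[number]  # IndexError if the number was never assigned
```

With 2^32 possible numbers, the only practical limit is the size of the list itself.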

If you want to encode words as base-26 numbers (one digit per letter), you can store only 6 letters (26^6 = 308,915,776 fits in 32 bits, but 26^7 = 8,031,810,176 does not), not 12 and certainly not 20.

That changes only if you count actual words, and those do not follow any convenient counting rule. Then the only way is to put all the words in one long list and assign numbers starting from the beginning.


Imagine you wanted to store strings made only of the characters "0-9" as a number (which amounts to reading the string as a decimal number). What would you do?

    Char  9 8 7 6 5 4 3 2 1 0
    Str   0 5 2 1 2 5 4 1 2 6
    Num = 6 * 10^0 + 2 * 10^1 + 1 * 10^2 + ...

Apply the same thing to your characters.

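    Char  5 4 3 2 1 0
    Str   A B C D E F
    L = 36
    C(i) transforms a character to a number: C('0') = 0, ..., C('9') = 9, C('A') = 10, C('B') = 11, ...
    Num = C(F) * L^0 + C(E) * L^1 + ...

As with decimal numbers, leading characters that map to 0 disappear ("0A" and "A" give the same number), so either use strings of a fixed length or store the length as well.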
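A Python sketch of the same positional scheme, assuming the digit order '0'-'9' then 'A'-'Z' from the answer (uppercase only). The function names are hypothetical. Decoding takes an explicit `length` because leading '0' characters cannot be recovered from the number alone.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # C('0')=0 ... C('9')=9, C('A')=10 ... C('Z')=35
L = 36


def to_number(s: str) -> int:
    """Horner's rule: equivalent to C(last) * L^0 + C(second to last) * L^1 + ..."""
    num = 0
    for ch in s:
        num = num * L + DIGITS.index(ch)
    return num


def to_string(num: int, length: int) -> str:
    """Inverse of to_number for strings of a known length (restores leading '0's)."""
    chars = []
    for _ in range(length):
        num, d = divmod(num, L)
        chars.append(DIGITS[d])
    if num:
        raise ValueError("number does not fit in the given length")
    return "".join(reversed(chars))
```

Python's built-in `int(s, 36)` uses the same digit values (and also accepts lowercase), so `to_number(s) == int(s, 36)` for these strings.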

If the text is correctly written in some language, you can assign a number to each word. However, you would need to account for all possible plurals, place names, people's names, and so on, which is usually not feasible. What kind of text are we talking about? Usually there will be some existing words that cannot be encoded in 32 bits at all unless you know about them in advance.

Can you build the list of words as you go? Give the first word you see number 1, the second number 2, and so on, checking each time whether the word is already in the list or needs a new number. Then save the resulting dictionary. This is probably the only workable solution if you need a 100% reliable, reversible mapping from numbers back to the original words for new, unknown text that does not follow any known pattern.
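A minimal Python sketch of numbering words as they appear, assuming words are already split out of the text and that JSON is an acceptable format for the saved dictionary. All function names here are hypothetical.

```python
import json


def number_words(words, dictionary=None):
    """Give each new word the next number (1, 2, 3, ...); reuse the number of a word already seen."""
    if dictionary is None:
        dictionary = {}
    numbers = []
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        numbers.append(dictionary[word])
    return numbers, dictionary


def save_dictionary(dictionary, path):
    """Persist the word -> number mapping so later runs keep the same numbers."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dictionary, fh)


def load_dictionary(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def reverse(dictionary):
    """Invert word -> number into number -> word for decoding."""
    return {number: word for word, number in dictionary.items()}
```

Passing a previously loaded dictionary back into `number_words` continues the numbering where it left off, which is what keeps old numbers stable as new text arrives.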

With 64 bits of a good enough hash such as MD5 (which produces 128 bits, so it would be truncated), collisions are highly unlikely, but with only 32 bits no hash seems likely to be safe.
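The 32-bit versus 64-bit difference can be quantified with the standard birthday-bound approximation, assuming the hash behaves like a uniformly random function. A small sketch (the function name is hypothetical):

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that n_items random `bits`-bit hash values contain at least one collision.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2 ** bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))
```

With 32 bits, about 77,000 distinct words already give roughly a 50% chance of a collision, while a million words in a 64-bit space give a probability on the order of 10^-8.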


Just treat each character as a digit in base 36 and calculate the decimal equivalent?

So:

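    'A'   = 0
    'B'   = 1
    [...]
    'Z'   = 25
    '0'   = 26
    [...]
    '9'   = 35
    'AA'  = 36
    'AB'  = 37
    [...]
    'CAB' = 3925

Strictly, this is plain base 36 only within each length: 'AA' = 36 because all one-character strings are numbered first, which keeps "A" and "AA" distinct.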
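A Python sketch of this shortest-first numbering, assuming the character order A-Z then 0-9 from the list. It uses bijective base 36 (digits 1..36, no zero digit) shifted down by one so that 'A' is 0; the function names are hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # order assumed from the list above
B = len(ALPHABET)  # 36


def string_to_index(s: str) -> int:
    """Number all strings shortest-first: 'A'=0, ..., '9'=35, 'AA'=36, 'AB'=37, ..."""
    n = 0
    for ch in s:
        n = n * B + ALPHABET.index(ch) + 1  # bijective base-36 digit in 1..36
    return n - 1  # shift so that 'A' is 0


def index_to_string(index: int) -> str:
    """Inverse of string_to_index for index >= 0."""
    n = index + 1
    chars = []
    while n > 0:
        n, d = divmod(n - 1, B)  # standard bijective-numeration step
        chars.append(ALPHABET[d])
    return "".join(reversed(chars))
```

Every non-empty string gets its own number and every number maps back to exactly one string, so nothing is wasted on leading-zero duplicates.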