I wrote a Ruby tool that does what you need. (Sorry it's not Java, but my Java is over ten years ago. But the general idea should be carried over to Java without the hassle.)
In short, a given input string (7 bytes) is compressed into a number between 0 and 175_760_000 (28 bits). A 32-bit random number is added, making a 60-bit integer, which is encoded into a 10-character output string.
My previous math was wrong; since your input is only 28 bits and your desired result is 60 bits, which leaves 32 bits to add about two billion random permutations. I was mistaken in performing my calculations.
#!/usr/bin/ruby -w START_A = "A"[0] START_a = "a"[0] START_0 = "0"[0] CODEMASK = ((1 << 28) - 1) # turn on lower 28 bits RANDOMMASK = ((1 << 60) - 1) & ~ CODEMASK # turn on upper 32 bits def compress(line) a, b, c, h, i, j, k = line.chomp().each_char().entries() a, b, c = a[0], b[0], c[0] small_a, small_b, small_c = a - START_A, b - START_A, c - START_A letters = (small_a * 26**2) + (small_b * 26) + small_c h, i, j, k = Integer(h), Integer(i), Integer(j), Integer(k) number = letters * 10_000 + h*1000 + i*100 + j*10 + k end def uncompress(number) k = number % 10 number /= 10 j = number % 10 number /= 10 i = number % 10 number /= 10 h = number % 10 number /= 10 small_c = number % 26 number /= 26 small_b = number % 26 number /= 26 small_a = number % 26 number /= 26 if (number != 0) raise "input wasn't generated with compress()" end a, b, c = small_a + START_A, small_b + START_A, small_c + START_A [a.chr(), b.chr(), c.chr(), h.to_s(), i.to_s(), j.to_s(), k.to_s()].join() end def add_random(number) (rand(2**31) << 28) + number end def remove_random(number) [number & CODEMASK, number & RANDOMMASK] end def to_output(number) a = [] begin a << transform_out(number % 62) number /= 62 end while(number > 0) a.reverse().join() end def from_output(string) num = 0 string.each_char() do |c| num *= 62 num += transform_in(c) end num end def transform_out(small) if (small < 0 || small > 61) raise "small should be between 0 and 61, inclusive" end if (small < 26) out = START_A+small elsif (small < 52) out = START_a+(small-26) else out = START_0+(small-52) end out.chr() end def transform_in(char) if (/^[A-Za-z0-9]$/ !~ char) raise "char should be AZ, az, or 0-9, inclusive" end num = char[0] out = case num when START_A .. START_A+26 then num - START_A when START_a .. START_a+26 then (num - START_a) + 26 when START_0 .. START_0+10 then (num - START_0) + 52 end out end begin while(line = DATA.readline()) do line.chomp!() c = compress(line) a = add_random(c) output = to_output(a) input = from_output(output) new_c, r = remove_random(input) u = uncompress(new_c) printf("original input: %s\n compressed: %d\n after adding random amount: %d\n *output: %s\n *input: %s\n random amount added: %d\n after removing random amount: %d\nuncompressed: %s\n", line, c, a, output, input, r, new_c, u) end rescue EOFError => e end __END__ AAA0000 SIN1500 ABD2123 SMS3345 ZZZ9999
Running the program with the five inputs given below leads to this conclusion:
$ ./compress.rb original input: AAA0000 compressed: 0 after adding random amount: 508360097408221184 *output: liSQkzXL1G *input: 508360097408221184 random amount added: 508360097408221184 after removing random amount: 0 uncompressed: AAA0000 original input: SIN1500 compressed: 123891500 after adding random amount: 421470683267231532 *output: fIVFtX9at2 *input: 421470683267231532 random amount added: 421470683143340032 after removing random amount: 123891500 uncompressed: SIN1500 original input: ABD2123 compressed: 292123 after adding random amount: 414507907112269083 *output: emb6JfDhUH *input: 414507907112269083 random amount added: 414507907111976960 after removing random amount: 292123 uncompressed: ABD2123 original input: SMS3345 compressed: 124983345 after adding random amount: 383242064398325809 *output: cTPpccLAvn *input: 383242064398325809 random amount added: 383242064273342464 after removing random amount: 124983345 uncompressed: SMS3345 original input: ZZZ9999 compressed: 175759999 after adding random amount: 27149937199932031 *output: CAVf14tiRh *input: 27149937199932031 random amount added: 27149937024172032 after removing random amount: 175759999 uncompressed: ZZZ9999
The lines you are really looking for are original input , *output and uncompressed . Your client has the original input lines, after using to_output(add_random(compress(input))) you will get ten-digit A-Za-z0-9 output in the *output line. You pass it on to users and it is a magic token. Then, when it's time to test them, you use remove_random(from_output(user_string)) to detect both the random value added to the string and the integer that you can pass uncompress() .
One important note : Input AAA0000 is stored in clear text in the lower 28 bits. A random number is stored in clear text in the upper 32 bits. This is just an obfuscation of the original inputs, for someone it would be difficult to detect a pattern if they have two inputs and outputs. Heck, they could even correctly guess the algorithm giving only one input and output.
If you need this to be cryptographically strong, you still have some work ahead :), but it can be as simple as XORing an intermediate 60-bit number with an rc4 username or something like that .
Brief explanation
Your input lines can be interpreted as integers, but with a change : the first three digits are in base 26 , the next four digits are in base 10. The number will be less than 175_760_000 . To unambiguously store numbers between 0 and 175_760_000 , bit 28 is required. Your output lines are also a ten-digit number with each digit in the base 62. (Think base64 , but without - , / or = (to fill).) 62 possible values and ten positions give you the maximum value of 839_299_365_868_340_224 , which can be represented in bits 60 .
The input string only accepts 28 bits, your output string has 60 bits available, and it leaves 32 bits available for storing a randomly generated number. If we multiply the 32 bit number by 2^28 (the same as the left shift by 28: 1 << 28 ), then the least significant bits of 28 will be free for the original input number.
Once we have calculated the number of bits 60 , we output it to our database 62 for human consumption.
To reverse the process, you decode base number 62 to get bit 60 ; split the number of bits 60 into the lower input number 28 and the upper 32 bit random number, and then output the number of bits 28 in the original format: three basic 26 digits, followed by four basic 10 digits.
FINAL UPDATE
Yusuf, great job converting my Ruby to Java. I am very impressed, especially considering how good your version of Java is: your version is more picky. I am jealous and impressed. :)
I found two small errors that remained in your program: RANDOMMASK was accidentally initialized to 0 , because Java does not advance all operands to the final data type when executing arithmetic shift operators. Changing 1 to 1L causes the result 1L << 60 be long ; without L result 1 << 60 is int and not large enough to hold the full number.
In addition, the numbers were not properly compressed; my Ruby code parsed characters as an integer, and your Java code interpreted characters as an integer. (Yours used the value of the character, mine converted the character to an integer based on the ascii value of the character. Mine actually did not parse, because it just does the subtraction, but if it were a string, String.toInteger(character) the same, so that is very similar to parsing.)
But your uncompress logic was correct, and due to a mismatch, the result was incorrect. So I changed my code to parse digits into integers (and changed from char to int to disable the meaningless warning).
Here's the difference in what I had to change in your program for it to work:
--- Compress.java.orig 2011-03-25 16:57:47.000000000 -0700 +++ Compress.java 2011-03-25 17:09:42.000000000 -0700 @@ -1,12 +1,12 @@ -import java.util.* +import java.util.*; public class Compress { static char START_A = "A".charAt(0); static char START_a = "a".charAt(0); static char START_0 = "0".charAt(0); -static int CODEMASK = ((1 << 28) - 1);
And now the full source, just in case, which is easier :)
import java.util.*; public class Compress { static char START_A = "A".charAt(0); static char START_a = "a".charAt(0); static char START_0 = "0".charAt(0); static long CODEMASK = ((1 << 28) - 1); //turn on lower 28 bits static long RANDOMMASK = ((1L << 60) - 1) & ~ CODEMASK; //turn on upper 32 bits public static void main(String[] args) { String[] test = new String[]{ //"AAA0000", "SIN1500", "ABD2123", "SMS3345", "ZZZ9999", //"ABD2123", "ABD2123", "ABD2123", "ABD2123", "ABD2123" "ABD2123" }; for(String t : test){ long c = compress(t); long a = add_random(c); String output = to_output(a); long input = from_output(output); String[] new_c_r = remove_random(input); String u = uncompress(Long.valueOf(new_c_r[0])); System.out.println("Original input: " + t); System.out.println(" compressed: " + c); System.out.println(" after adding random amount: " + a); System.out.println(" *output: " + output); System.out.println(" *input: " + input); System.out.println(" random amount added: " + new_c_r[1]); System.out.println(" after removing random amount: " + new_c_r[0]); System.out.println(" uncompressed: " + u); System.out.println("-----------------------------------------------------------------"); } } public static long compress(String line){ //7 character char a = line.charAt(0); char b = line.charAt(1); char c = line.charAt(2); int h = line.charAt(3) - START_0; int i = line.charAt(4) - START_0; int j = line.charAt(5) - START_0; int k = line.charAt(6) - START_0; long small_a = (long) a - START_A; long small_b = (long) b - START_A; long small_c = (long) c - START_A; long letters = (small_a * 26 * 26) + (small_b * 26) + small_c; long numbers = letters * 10000 + h * 1000 + i*100 + j*10 + k; return numbers; } public static String uncompress(long number){ long k = number % 10; number /= 10; long j = number % 10; number /= 10; long i = number % 10; number /= 10; long h = number % 10; number /= 10; long small_c = number % 26; number /= 26; long small_b = number % 26; number /= 26; long small_a = number % 26; number /= 26; if (number != 0) throw new RuntimeException("input wasn't generated with compress()"); long a = small_a + START_A; long b = small_b + START_A; long c = small_c + START_A; StringBuffer result = new StringBuffer(); result.append((char) a).append((char) b).append((char) c).append(h).append(i).append(j).append(k); return result.toString(); } public static long add_random(long number){ return (((long) (Math.random()* Math.pow(2, 31))) << 28) + number; } public static String[] remove_random(long number){ return new String[]{String.valueOf(number & CODEMASK), String.valueOf(number & RANDOMMASK)}; } public static String to_output(long number){ List<Character> a = new ArrayList<Character>(); do{ a.add(transform_out(number % 62)); number /= 62; }while(number > 0); Collections.reverse(a); StringBuffer result = new StringBuffer(); for(int i=0; i<a.size(); i++){ Character s = (Character) a.get(i); result.append(s); } return result.toString(); } public static long from_output(String string){ long num = 0; for(char c : string.toCharArray()){ num *= 62; num += transform_in(c); } return num; } public static char transform_out(long small){ long out; if (small < 0 || small > 61){ throw new RuntimeException("small should be between 0 and 61, inclusive"); } if(small < 26){ out = START_A + small; }else if(small < 52){ out = START_a + (small-26); }else{ out = START_0 + (small-52); } return (char) out; } public static long transform_in(char c){ if(!String.valueOf(c).matches("[a-zA-Z0-9]")){ throw new RuntimeException("char should be AZ, az, or 0-9, inclusive"); } long num = (long) c; long out; if(num >= START_A && num <= START_A+26) out = num-START_A; else if(num >= START_a && num <= START_a+26) out = (num-START_a) + 26; else if(num >= START_0 && num <= START_0+10) out = (num-START_0) + 52; else throw new RuntimeException("Salah, bego!"); return out; }}