Using CRC32 to make a short URL for a website

I am trying to understand CRC32 in order to create a unique URL for a webpage.

If we use CRC32, what is the maximum number of URLs we can use while still avoiding duplicates?

What approximate string length would keep the checksum within 2^32?

When I tried a UUID for the URL and converted the UUID bytes to Base64, I could shorten it to 22 characters. I am wondering whether I can reduce it further.
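For reference, the 22-character figure can be reproduced with a small sketch like the one below. The function names are hypothetical, and the 32-bit variant is included only to show how short a CRC32-sized value would be in the same encoding; it says nothing about uniqueness.

```python
import base64
import struct
import uuid


def short_uuid() -> str:
    """Return a random UUID as a URL-safe Base64 string without padding."""
    raw = uuid.uuid4().bytes                     # 16 bytes = 128 bits
    encoded = base64.urlsafe_b64encode(raw)      # 24 characters, ending in "=="
    return encoded.rstrip(b"=").decode("ascii")  # padding removed: 22 characters


def short_int32(value: int) -> str:
    """Encode an unsigned 32-bit integer the same way (hypothetical helper)."""
    raw = struct.pack(">I", value)               # 4 bytes, big-endian
    encoded = base64.urlsafe_b64encode(raw)      # 8 characters, ending in "=="
    return encoded.rstrip(b"=").decode("ascii")  # padding removed: 6 characters


print(len(short_uuid()))             # length of the UUID-based identifier
print(len(short_int32(0xFFFFFFFF)))  # length of a 32-bit identifier
```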

Basically, I want to convert a URL (maximum 1024 characters) to a shortened ID.

+4
6 answers

There is no such number as "the maximum number of URLs that can be used while avoiding duplicates" for CRC32.

The problem is that CRC32 can produce duplicates, and this is not a function of the number of values that you throw at it; it is a function of what those values look like.

So you might get a collision on the second URL if you are unlucky.

You should not base your algorithm on a hash being unique; instead, generate a unique value for each URL yourself.
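To make the risk concrete, here is a minimal sketch that searches for two different URLs with the same CRC32. The `example.com` URLs are made up, and the SHA-256 digest is used only to give them random-looking paths; with such inputs a collision usually turns up after a number of URLs in the tens of thousands, far below 2^32.

```python
import hashlib
import zlib


def find_crc32_collision(limit: int = 2_000_000):
    """Hash made-up URLs until two distinct ones share a CRC32.

    Returns (first_url, second_url, crc) or None if no collision
    was found within `limit` attempts.
    """
    seen = {}  # CRC32 value -> URL that produced it
    for i in range(limit):
        # Hypothetical URL; the digest only makes the path look random.
        path = hashlib.sha256(str(i).encode()).hexdigest()
        url = f"http://example.com/{path}"
        crc = zlib.crc32(url.encode())
        if crc in seen:
            return seen[crc], url, crc
        seen[crc] = url
    return None


result = find_crc32_collision()
if result is not None:
    first, second, crc = result
    print(first)
    print(second)
    print(f"both have CRC32 {crc:#010x}")
```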

+6

If you already store the full URL in a database table, an integer identifier is quite short, and it can be made shorter still by converting it to base 16, 64 or 85. If you can use a UUID, you can use an integer instead, and given how much shorter it is, I don't see what benefit a UUID would provide in your lookup table.
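As an illustration of the base conversion, the sketch below encodes a non-negative integer ID with a 64-symbol URL-safe alphabet. The alphabet and the function names are assumptions made for this example; any fixed set of distinct characters would work the same way.

```python
import string

# Assumed alphabet: 64 URL-safe symbols. The order is arbitrary but must stay fixed.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-_"
BASE = len(ALPHABET)


def encode_id(n: int) -> str:
    """Convert a non-negative integer ID to a short string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_id(s: str) -> int:
    """Convert a string produced by encode_id back to the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on unknown symbols
    return n


print(encode_id(2**32 - 1))  # any 32-bit ID fits in 6 characters
```

Because the ID comes from the table rather than from the URL's content, two different URLs can never receive the same short string.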

+4

CRC32 stands for 32-bit cyclic redundancy check: an input of arbitrary length is reduced to a 32-bit checksum. Checksum functions are not injective, which means that multiple input values map to the same output value. Thus, you cannot invert the function.

+1

The proper way to create a short URL is to store the complete URL in a database and publish something that maps to the row index. A compact option is a Base64 string identifier, for example. Or you can use a UID as the primary key and expose that.
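A minimal sketch of that lookup-table approach, using SQLite from the Python standard library, could look like this. The table name, column names and function names are all hypothetical, and the row ID is exposed in hexadecimal only to keep the example short; a Base64-style encoding would fit equally well.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) urls table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL UNIQUE)"
    )
    return conn


def shorten(conn: sqlite3.Connection, url: str) -> str:
    """Store the URL if it is new and return its row ID as a hex key."""
    conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    conn.commit()
    row = conn.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    return format(row[0], "x")


def expand(conn: sqlite3.Connection, key: str):
    """Return the full URL for a hex key, or None if the key is unknown."""
    row = conn.execute(
        "SELECT url FROM urls WHERE id = ?", (int(key, 16),)
    ).fetchone()
    return row[0] if row else None


conn = open_store()
key = shorten(conn, "http://example.com/some/very/long/path?x=1")
print(key, expand(conn, key))
```

Uniqueness here comes from the primary key, not from any property of the URL text, so no collision handling is needed.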

Do not use a checksum: it is too small and very likely to produce collisions. A cryptographic hash is larger and less likely to collide, but it is still not the right approach.
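To put rough numbers on "too small", the sketch below uses the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n items and a b-bit hash. It assumes hash values are uniformly distributed, which is an idealization (CRC32 in particular makes no such promise), so treat the output as an estimate only. The function name is hypothetical.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items hash values coincide,
    assuming values are uniform over 2**hash_bits possibilities."""
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2 ** hash_bits)


for bits in (32, 128, 256):
    p = collision_probability(1_000_000, bits)
    print(f"{bits:3d}-bit hash, 1,000,000 URLs: p = {p:.3g}")
```

Under this approximation a 32-bit value reaches roughly even odds of a collision at about 77,000 URLs, while for a 128-bit or larger hash the probability is tiny but never exactly zero.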

+1

No. Even if you use MD5 or any other checksum, the URLs MAY still produce duplicates; it all depends on your luck.

So do not base your unique URL on a checksum.

0

The fastest (and possibly best!) way to solve this might be to simply use the hash of the local path and query of the given URI, as follows:

    using System;

    namespace HashSample
    {
        class Program
        {
            static void Main(string[] args)
            {
                Uri uri = new Uri(
                    "http://host.com/folder/file.jpg?code=ABC123");
                string hash = GetPathAndQueryHash(uri);
                Console.WriteLine(hash);
            }

            public static string GetPathAndQueryHash(Uri uri)
            {
                return uri.PathAndQuery.GetHashCode().ToString();
            }
        }
    }

The above assumes that the URI scheme and host remain unchanged. If they do not, GetHashCode works on any string.

For a great discussion of CRC32 hash collisions, visit: http://episteme.arstechnica.com/eve/forums/a/tpc/f/6330927813/m/821008399831

-1