Deterministic algorithm needed to generate a resource domain prefix for a given path with a uniform distribution

I need to generate a resource domain prefix from a given path and a configured number of resource domains, deterministically and with a good distribution. For example, if you pass it the path "/script/site.js", it should return "res#", where "#" is an integer from 0 up to (but not including) the configured number of resource domains.

I'm using C# 3.0.

This is what I have so far:

    var resourceDomainCount = 4;
    var hasher = System.Security.Cryptography.SHA1.Create();
    var paths = new [] {
        "/App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg",
        "/App_Themes.0.1.433.232/images/blahblah.jpg",
        "/App_Themes.0.1.433.232/images/pagebg.gif",
        "/App_Themes.0.1.433.232/site.css",
        "/script/site.js",
        "/App_Themes.0.1.433.232/images/different.jpg",
        "/App_Themes.0.1.433.232/images/shadows.jpg",
        "/Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0",
        "/Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0",
        "/Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0",
        "/Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0",
        "/Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0",
        "/Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0"
    };

    foreach (var path in paths)
    {
        var pathHash = hasher.ComputeHash(Encoding.ASCII.GetBytes(path));
        var singleByteHash = pathHash.Aggregate(0, (a, b) => a ^ b);
        var random = new Random((int)singleByteHash);
        var resourceDomainIndex = random.Next(0, resourceDomainCount);
        (resourceDomainIndex + ": " + path).Dump();
    }

This produces the following output:

3: /App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg
0: /App_Themes.0.1.433.232/images/blahblah.jpg
1: /App_Themes.0.1.433.232/images/pagebg.gif
1: /App_Themes.0.1.433.232/site.css
3: /script/site.js
1: /App_Themes.0.1.433.232/images/different.jpg
3: /App_Themes.0.1.433.232/images/shadows.jpg
1: /Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0
2: /Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0

This doesn't give the distribution I want (there is only one instance of "2").

There are thousands of input paths, they change all the time, and the URLs are generated at runtime, for example: <a href="<%= GetPath("/script/site.js") %>">Link</a>

2 answers

This is much simpler than hashing the string yourself (granted, it is already a hash) and seems to give a better distribution:

 Math.Abs(path.GetHashCode()) % resourceDomainCount 
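
Two caveats on the one-liner above: `Math.Abs(int.MinValue)` throws an `OverflowException`, and `string.GetHashCode()` is not guaranteed to return the same value across .NET versions or platforms, so the index may differ between servers. A minimal sketch that sidesteps both, assuming a hand-rolled 32-bit FNV-1a hash is acceptable here. The names `ResourceDomains`, `GetIndex` and `GetPrefix` are hypothetical; the "res" prefix comes from the question.

```csharp
using System;
using System.Text;

public static class ResourceDomains
{
    // 32-bit FNV-1a over the UTF-8 bytes of the path.
    // Swapped in for string.GetHashCode(), which is not stable across runtimes.
    private static uint Fnv1a(string text)
    {
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            unchecked
            {
                hash ^= b;
                hash *= prime;
            }
        }
        return hash;
    }

    // Maps a path to an index in [0, resourceDomainCount).
    public static int GetIndex(string path, int resourceDomainCount)
    {
        if (path == null) throw new ArgumentNullException("path");
        if (resourceDomainCount <= 0) throw new ArgumentOutOfRangeException("resourceDomainCount");

        uint hash = Fnv1a(path);

        // XOR-fold the upper half into the lower half so the low bits,
        // which decide the result of the modulo, depend on the whole hash.
        hash ^= hash >> 16;

        // Unsigned arithmetic, so there is no negative value to take Abs of.
        return (int)(hash % (uint)resourceDomainCount);
    }

    // Hypothetical helper: builds the "res#" prefix described in the question.
    public static string GetPrefix(string path, int resourceDomainCount)
    {
        return "res" + GetIndex(path, resourceDomainCount);
    }
}
```

With this, `ResourceDomains.GetPrefix("/script/site.js", 4)` yields the same prefix on every call and every machine, which is the property the question asks for. How even the spread is for a real set of paths still has to be measured.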

Your sample is far too small. Randomness behaves like statistics: it only shows itself over a long run. Try running your code for 100 iterations and see whether each value appears about the same number of times. Then try again with 10,000 iterations and note that the actual counts are closer to the expected percentage. Then run it for 1 million iterations and note that they are closer still. The counts should never be exactly equal, otherwise the result would not be random.

If the results do not follow this pattern, you may have a distribution problem.
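
The experiment described in this answer could be sketched roughly as follows. This is only an illustration: the `DistributionCheck` class and `CountBuckets` method are hypothetical names, the synthetic paths are loosely modelled on the handler URLs in the question, and the bucket function passed in `Main` is just a placeholder for whichever mapping is being evaluated.

```csharp
using System;

public static class DistributionCheck
{
    // Counts how many of sampleCount synthetic paths land in each bucket.
    public static int[] CountBuckets(Func<string, int> bucketOf, int bucketCount, int sampleCount)
    {
        var counts = new int[bucketCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Hypothetical path shape; real paths should be used where available.
            string path = "/Handlers/ImageHandler.ashx?type=l&id=" + i + "&s=g&index=0";
            counts[bucketOf(path)]++;
        }
        return counts;
    }

    public static void Main()
    {
        const int bucketCount = 4;

        // Increasing sample sizes, as suggested in the answer.
        foreach (int sampleCount in new[] { 100, 10000, 1000000 })
        {
            int[] counts = CountBuckets(
                // Placeholder mapping; masking the sign bit keeps the index non-negative.
                p => (p.GetHashCode() & int.MaxValue) % bucketCount,
                bucketCount,
                sampleCount);

            Console.Write(sampleCount + " samples:");
            foreach (int count in counts)
            {
                // Share of samples that fell into this bucket, as a percentage.
                Console.Write(" {0:P1}", (double)count / sampleCount);
            }
            Console.WriteLine();
        }
    }
}
```

For a well-behaved mapping, each share should drift toward 1 / bucketCount as the sample size grows. A bucket that stays clearly above or below that share at the larger sizes points to a problem with the mapping rather than with the sample.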
