Firefox cache hash key generation algorithm bug

There is a bug in Firefox (present even in the new betas and in Minefield releases) that prevents certain files from being cached, caused by the algorithm used to generate cache hash keys. Here is a link to the source of the function.

I want every file on my site to be cacheable. What I do not understand is why their hash function fails to create unique keys for distinct URLs. I am hoping someone can describe the flaw in pseudo-code or Java.

It would be nice to build a utility for developers that guarantees unique URLs until this bug is fixed.


EDIT: There have been some very useful answers, but I need more step-by-step help to create a utility that checks for these cache collisions. It would be great to get Java code that can reproduce the keys Firefox creates. Hence the bounty opened on this question.


EDIT 2: Here is a partially working Java port (written using Processing). Check out the tests at the bottom; the first three work as expected, while the others do not. I suspect an issue with signed/unsigned ints. Suggestions?

    //
    // the bad collision function
    // http://mxr.mozilla.org/mozilla/source/netwerk/cache/src/nsDiskCacheDevice.cpp#240
    //
    // PLDHashNumber
    // nsDiskCache::Hash(const char * key)
    // {
    //     PLDHashNumber h = 0;
    //     for (const PRUint8* s = (PRUint8*) key; *s != '\0'; ++s)
    //         h = PR_ROTATE_LEFT32(h, 4) ^ *s;
    //     return (h == 0 ? ULONG_MAX : h);
    // }

    //
    // a java port...
    //
    String getHash( String url ) {
      //get the char array for the url string
      char[] cs = getCharArray( url );

      int h = 0;
      //for (const PRUint8* s = (PRUint8*) key; *s != '\0'; ++s)
      for ( int i=0; i < cs.length; i++ ) {
        h = PR_ROTATE_LEFT32(h, 4) ^ cs[i];
      }

      //looks like the examples above return something in hex.
      //if we get matching ints, that is ok by me.
      //but for fun, lets try to hex the return vals?
      String hexVal = hex( h );
      return hexVal;
    }

    char[] getCharArray( String s ) {
      char[] cs = new char[s.length()];
      for (int i=0; i<s.length(); i++) {
        cs[i] = s.charAt(i);
      }
      return cs;
    }

    //
    // how to PR_ROTATE_LEFT32
    //
    // /*
    // ** Macros for rotate left and right. The argument 'a' must be an unsigned
    // ** 32-bit integer type such as PRUint32.
    // **
    // ** There is no rotate operation in the C Language, so the construct
    // ** (a << 4) | (a >> 28) is frequently used instead. Most compilers convert
    // ** this to a rotate instruction, but MSVC doesn't without a little help.
    // ** To get MSVC to generate a rotate instruction, we have to use the _rotl
    // ** or _rotr intrinsic and use a pragma to make it inline.
    // **
    // ** Note: MSVC in VS2005 will do an inline rotate instruction on the above
    // ** construct.
    // */
    // ...
    // #define PR_ROTATE_LEFT32(a, bits) _rotl(a, bits)

    //return an int (32 bit). what do we do with the 'bits' parameter? ignore?
    int PR_ROTATE_LEFT32( int a, int bits ) {
      return (a << 4) | (a >> (32-bits));
    }

    //
    // examples of some colliding hashes
    // https://bugzilla.mozilla.org/show_bug.cgi?id=290032#c5
    //
    //$ ./hashit "ABA/xxx.aba"
    //8ffac222
    //$ ./hashit "XyZ/xxx.xYz"
    //8ffac222
    //$ ./hashit "CSS/xxx.css"
    //8ffac222
    //$ ./hashit "JPG/xxx.jpg"
    //8ffac222
    //
    //$ ./hashit modules_newsfeeds/MenuBar/MenuBar.css
    //15c23729
    //$ ./hashit modules_newsfeeds/ListBar/ListBar.css
    //15c23729
    //$ ./hashit modules_newsfeeds/MenuBar/MenuBar.js
    //a15c23e5
    //$ ./hashit modules_newsfeeds/ListBar/ListBar.js
    //a15c23e5

    //
    // our attempt at porting this algorithm to java...
    //
    void setup( ) {
      String a = "ABA/xxx.aba";
      String b = "CSS/xxx.css";
      String c = "CSS/xxx.css";
      String d = "JPG/xxx.jpg";

      println( getHash(a) ); //yes 8ffac222
      println( getHash(b) ); //yes 8ffac222
      println( getHash(c) ); //yes 8ffac222
      println( getHash(d) ); //no [??] FFFFFF98, not 8ffac222

      println( "-----" );

      String e = "modules_newsfeeds/MenuBar/MenuBar.css";
      String f = "modules_newsfeeds/ListBar/ListBar.css";

      println( getHash(e) ); //no [??] FFFFFF8C, not 15c23729
      println( getHash(f) ); //no [??] FFFFFF8C, not 15c23729

      println( "-----" );

      String g = "modules_newsfeeds/MenuBar/MenuBar.js";
      String h = "modules_newsfeeds/ListBar/ListBar.js";

      println( getHash(g) ); //yes [??] FFFFFF8C, not a15c23e5
      println( getHash(h) ); //yes [??] FFFFFF8C, not a15c23e5
    }
java c++ algorithm firefox hash
6 answers

Here's how the algorithm works:

    initialize hash to 0
    for each byte:
        shift hash 4 bits to the left (with rotate)
        hash = hash XOR character

visually (each byte lands 4 bits further along and wraps around; shown in a 20-bit field for illustration, while the real hash is 32-bit):

    00110000              = '0'
        00110001          = '1'
            00110010      = '2'
                00110011  = '3'
    0100            0011  = '4'
    00110101              = '5'
    ====================
    01000110001000010000

    (and then this will be 'rotated' so that it lines up with the end, giving:)

    00100001000001000110

This means that if you have strings of the same length that are mostly identical, then in at least one position the low 4 bits of a character XORed with the high 4 bits of the following character must be unique across the strings. On top of this, the way the 32-bit number is squeezed into the table may be weaker still, so what really has to be unique is that low-4-XOR-high-4 value at a particular position in the string (mod 8 characters).
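
Since the question asks for Java code that reproduces Firefox's keys, here is a minimal sketch of the algorithm described above (class and method names are mine). It addresses the signed/unsigned problem suspected in EDIT 2: Java's >> sign-extends, so the rotate needs the unsigned >>>, the rotate must actually use its bits argument, and each character must be masked to an unsigned byte:

    public class FFHashPort {
        // Unsigned 32-bit rotate. Java's >> sign-extends, so the unsigned
        // shift >>> is required, and the shift amount must use 'bits'
        // (the port in the question hardcodes 4 on the left shift).
        static int rotateLeft32(int a, int bits) {
            return (a << bits) | (a >>> (32 - bits));
        }

        // Port of nsDiskCache::Hash. The & 0xFF treats each character as an
        // unsigned byte, matching the PRUint8* walk in the C++ original.
        static int hash(String key) {
            int h = 0;
            for (int i = 0; i < key.length(); i++) {
                h = rotateLeft32(h, 4) ^ (key.charAt(i) & 0xFF);
            }
            return h == 0 ? 0xFFFFFFFF : h; // ULONG_MAX stand-in for zero
        }

        public static void main(String[] args) {
            System.out.printf("%08x%n", hash("ABA/xxx.aba")); // 8ffac222
            System.out.printf("%08x%n", hash("JPG/xxx.jpg")); // 8ffac222
            System.out.printf("%08x%n", hash("modules_newsfeeds/MenuBar/MenuBar.css")); // 15c23729
            System.out.printf("%08x%n", hash("modules_newsfeeds/ListBar/ListBar.css")); // 15c23729
        }
    }

The expected values in the comments are the colliding hashes quoted from the Bugzilla entry in the question.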


From what I understand from reading the Bugzilla entry, the bug occurs when two distinct problems combine:

  • Their hash algorithm generates collisions for URLs that are "similar enough". From the bug reports, "similar enough" seems to mean URLs that are the same at every 4 characters (or maybe 8), and
  • their logic for handling hash collisions fails because the previous URL with the same hash value has not yet been flushed to disk.

Basically, if you have a single page containing two very similar URLs, this can happen in some versions of Firefox. I would not expect it to happen across different pages, since FF then has time to flush the entries to disk, avoiding the timing problem.

So, if you have several resources (scripts, images, etc.) loaded from the same page, make sure they differ completely somewhere within a span of 9 characters. One way to ensure this is to append a query string (which you ignore) containing a random bit of data, for example:
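
A small Java helper along these lines could generate such URLs (a hypothetical sketch: the nocache parameter name and the example.com URL are made up, and the server simply never reads the parameter):

    import java.util.Random;

    public class CacheBuster {
        static final Random RND = new Random();

        // Appends a random, server-ignored query parameter so that otherwise
        // similar resource URLs no longer collide on their cache hash key.
        static String cacheBust(String url) {
            String sep = url.contains("?") ? "&" : "?";
            return url + sep + "nocache=" + Integer.toHexString(RND.nextInt());
        }

        public static void main(String[] args) {
            System.out.println(cacheBust("http://example.com/css/site.css"));
            // e.g. http://example.com/css/site.css?nocache=3ac8f1d2
        }
    }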


This bug was a major problem for my site: http://worldofsolitaire.com

I worked around it for the longest time by using a conditional rule in an .htaccess file that disabled ALL caching of images on the site for Firefox users. It was a horrible thing to have to do, but at the time I could not track down the bug within Firefox, and a slightly slower site was better than one showing duplicate/corrupted images.

When I read in the linked bug report that it was fixed in recent Firefox versions, I changed the conditional on April 19th, 2009 (yesterday) to disable caching only for Firefox 2 users.

A few hours later I had received over 10 e-mails from (confirmed) Firefox 3 users telling me they were seeing duplicate images. So this issue is STILL a problem in Firefox 3.

I decided to create a simple Linux test program that would allow me to check URLs to see whether they generate the same cache hash keys.

To compile on any Linux system: g++ -o ffgenhash ffgenhash.cpp

Here is the code (save it as ffgenhash.cpp):

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define ULONG_MAX 0xFFFFFFFF
    #define PR_ROTATE_LEFT32(a, bits) (((a) << (bits)) | ((a) >> (32 - (bits))))

    unsigned long ffgenhash(const char * key)
    {
        unsigned long h = 0;
        for (const unsigned char * s = (unsigned char *) key; *s != '\0'; ++s)
        {
            h = PR_ROTATE_LEFT32(h, 4) ^ *s;
        }
        return (h == 0 ? ULONG_MAX : h);
    }

    int main(int argc, char ** argv)
    {
        printf("%d\n", ffgenhash(argv[1]));
        return 0;
    }

As you can see, here are two real URLs that generate the same cache hash key:

 ./ffgenhash "http://worldofsolitaire.com/decks/paris/5/12c.png" 1087949033 ./ffgenhash "http://worldofsolitaire.com/decks/paris/5/13s.png" 1087949033 

Since I preload these images in a Javascript loop, trying some sort of empty <script> tag workaround is not possible here.

Really, I believe the only true solution is to modify the URLs for Firefox users in some way to create a unique cache hash key. That is the approach I will use.

On a side note, I am tempted to create a Firebug add-on that checks all resources loaded by a site and raises a big error if two resources on the site share a hash key, so the developer knows about it. It would be great to run it against sites like Google Maps, as I have seen strange things with those images over the past few years :)
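
As a starting point for such a checker, here is a minimal command-line sketch (names are mine, and this is only the hash-checking core, not a Firebug add-on) that reports any two URLs sharing a Firefox cache hash key:

    import java.util.HashMap;
    import java.util.Map;

    public class CollisionChecker {
        static int rotateLeft32(int a, int bits) {
            return (a << bits) | (a >>> (32 - bits));
        }

        // Firefox disk cache hash (corrected Java port).
        static int ffHash(String key) {
            int h = 0;
            for (int i = 0; i < key.length(); i++) {
                h = rotateLeft32(h, 4) ^ (key.charAt(i) & 0xFF);
            }
            return h == 0 ? 0xFFFFFFFF : h;
        }

        public static void main(String[] args) {
            // Pass the resource URLs of a page as command-line arguments.
            Map<Integer, String> seen = new HashMap<Integer, String>();
            for (String url : args) {
                String previous = seen.put(ffHash(url), url);
                if (previous != null && !previous.equals(url)) {
                    System.out.printf("COLLISION %08x: %s <-> %s%n",
                                      ffHash(url), previous, url);
                }
            }
        }
    }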


First, you cannot map all strings to integers uniquely: there are obviously more strings than (fixed-size) integers, so there must be collisions. You can have a hash table that stores your whole data set (for example, all your files) without clashes, but to get that you need to change the hash table code, not the hash function.

Second, I see a problem with the hash function you posted, in this part:

 PR_ROTATE_LEFT32(h, 4) 

If this actually rotates h (I did not check that), a rotation by 4 means that strings in which two 8-byte parts are swapped (assuming 32-bit hashes), e.g. xxxxxxxxyyyyyyyy vs. yyyyyyyyxxxxxxxx, get equal hashes. If you change the rotation to something relatively prime to the hash size (e.g. 5), this only happens for swapped parts of length 32.
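
A quick Java sketch demonstrates the swap property (helper names are mine; the hash is the corrected port of Mozilla's function):

    public class SwapCollision {
        static int rotateLeft32(int a, int bits) {
            return (a << bits) | (a >>> (32 - bits));
        }

        static int ffHash(String key) {
            int h = 0;
            for (int i = 0; i < key.length(); i++) {
                h = rotateLeft32(h, 4) ^ (key.charAt(i) & 0xFF);
            }
            return h == 0 ? 0xFFFFFFFF : h;
        }

        public static void main(String[] args) {
            // Eight rotations of 4 bits make a full 32-bit turn, so swapping
            // two aligned 8-character blocks leaves the hash unchanged.
            System.out.printf("%08x%n", ffHash("xxxxxxxxyyyyyyyy"));
            System.out.printf("%08x%n", ffHash("yyyyyyyyxxxxxxxx")); // same value
        }
    }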


Here is a modified version of Sembiance's hash generator that works correctly even when compiled on a 64-bit platform. On LP64 systems an unsigned long is 64 bits wide, so the original's rotate and zero-check operate at the wrong width; using unsigned int keeps the arithmetic 32-bit, and %u matches the type passed to printf:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define ULONG_MAX 0xFFFFFFFF
    #define PR_ROTATE_LEFT32(a, bits) (((a) << (bits)) | ((a) >> (32 - (bits))))

    unsigned int ffgenhash(const char * key)
    {
        unsigned int h = 0;
        for (const unsigned char * s = (unsigned char *) key; *s != '\0'; ++s)
        {
            h = PR_ROTATE_LEFT32(h, 4) ^ *s;
        }
        return (h == 0 ? ULONG_MAX : h);
    }

    int main(int argc, char ** argv)
    {
        printf("%u\n", ffgenhash(argv[1]));
        return 0;
    }

You are apparently mistaken about what the real bug is. Sure, there are hash collisions caused by the poor choice of hash algorithm, but even hash(x) = 1 would not cause the problems described: it would merely turn an O(1) lookup into an O(N) linked-list search through the first bucket.

The real problem is that Firefox does not handle hash collisions. It therefore requires a perfect hash over all URLs, and all URLs are, unfortunately, out of your control.
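
To illustrate the point, here is a hypothetical sketch (not Firefox's actual cache code) of a bucketed lookup with a deliberately constant hash: it is slow, but because each bucket compares the full key, it never confuses two URLs:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BucketDemo {
        // Worst-case hash: every key collides into one bucket.
        static int badHash(String key) { return 1; }

        static Map<Integer, List<String[]>> table =
            new HashMap<Integer, List<String[]>>();

        static void put(String key, String value) {
            List<String[]> bucket = table.get(badHash(key));
            if (bucket == null) {
                bucket = new ArrayList<String[]>();
                table.put(badHash(key), bucket);
            }
            bucket.add(new String[] { key, value });
        }

        static String get(String key) {
            List<String[]> bucket = table.get(badHash(key));
            if (bucket == null) return null;
            for (String[] entry : bucket)       // O(N) walk, but correct
                if (entry[0].equals(key)) return entry[1];
            return null;
        }

        public static void main(String[] args) {
            put("ABA/xxx.aba", "contents of ABA/xxx.aba");
            put("JPG/xxx.jpg", "contents of JPG/xxx.jpg");
            System.out.println(get("JPG/xxx.jpg")); // right file despite collision
        }
    }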

