Why do URI-encoded anchors ('#') cause a 404, and how can I handle it in JS?

prettyPhoto uses hashtags, but if they get encoded (to %23), most browsers end up with a 404 error. This has been discussed before:

You get a 404 error because the #callback part is not part of the URL. It is a bookmark used by the browser, and it is never sent in the request to the server. If you encode the hash, it becomes part of the file name instead.

  • Why does the hash become part of the file name just because it is URI-encoded? Isn't that a mistake?

  • I ask because prettyPhoto uses hashtags and suffers from the same problem. I think adding '?' before the hash is the most elegant solution, I am just a little lost on how to do this in the existing code:

      function getHashtag() {
        var url = location.href;
        // Everything after the '#' of '#gallery', decoded; false when there is no gallery hash
        var hashtag = (url.indexOf('#gallery') !== -1)
          ? decodeURI(url.substring(url.indexOf('#gallery') + 1, url.length))
          : false;
        return hashtag;
      }
      function setHashtag() {
        // theRel and rel_index are defined outside this snippet
        if (typeof theRel == 'undefined') return;
        location.hash = theRel + '/' + rel_index + '/';
      }
      function clearHashtag() {
        if (location.href.indexOf('#gallery') !== -1) location.hash = "";
      }
  • Any other suggestions? I will consider customizing my 404 page, but that seems more like treating the symptom than preventing the problem.

Thanks!

EDIT: Since there is apparently nothing wrong with how prettyPhoto handles these hashes, I ended up adding these rules to my Apache server:

RewriteRule ^(.*).shtml(%23|#)$ /$1.shtml [R=301,NE,L]
RewriteRule ^(.*).shtml([^g]+)gallery(.+)$ /$1.shtml#gallery$3 [R=301,NE,L]

They successfully handle the cases where %23 caused problems.

+8
javascript hash webkit hashtag
2 answers
  • Why does the hash become part of the file name just because it is URI-encoded? Isn't that a mistake?

If you point the browser to http://example.com/index.html#title , the browser interprets this as a request for the index.html file from the example.com server. Once the request completes, the browser looks in the document for the anchor element named "title" (i.e. <a name="title">My title</a>).

If you point it to http://example.com/index.html%23title , the browser requests the file index.html%23title from example.com , which probably does not exist on the server, so you get a 404. See the difference?
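
The difference can be checked with the standard URL parser that modern browsers and Node.js provide. This is only a minimal sketch using the two example addresses above; the variable names are made up for illustration.

```javascript
// Parse both example addresses with the standard URL API.
const withFragment = new URL("http://example.com/index.html#title");
const withEncodedHash = new URL("http://example.com/index.html%23title");

// A literal "#" starts the fragment, which stays in the browser.
console.log(withFragment.pathname); // "/index.html"
console.log(withFragment.hash);     // "#title"

// "%23" is just an escaped character inside the path, so the
// whole string is what the server is asked for.
console.log(withEncodedHash.pathname); // "/index.html%23title"
console.log(withEncodedHash.hash);     // ""
```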

And no, this is not a mistake. It is part of the URI standard published in 1998, RFC 2396 (since superseded by RFC 3986, which keeps the same rule). Quote:

The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4).

As for 2 and 3, your code example does not have enough context to tell what you are trying to do. How are you calling your code? What are you trying to do with prettyPhoto that doesn't work? Are you trying to redirect to a specific photo or gallery from a user click or some other JavaScript event? Are you trying to open a gallery when someone visits a particular page?

I checked the related question about Twitter / OAuth, but I don't see how it relates to the code you provided. I started digging into prettyPhoto too, but I don't see how your code relates to that either.

Instead of changing your 404 page, perhaps what you need is a handler, in code or on the server, that catches not-found requests with %23 in them and redirects the user to the decoded URL. This may have some flaws, but it would be pretty elegant if you receive incoming requests from other sources that you cannot control. What is your server environment? (language, server technology, who the machine belongs to, etc.)

I would be happy to update my answer with a solution or workaround for you.

+7

Answer #1)

It becomes part of the URL because it is no longer a token that the browser / server / etc. knows how to parse.

I mean that "?" plays a significant role in URLs - the server knows to treat what comes before it differently from what comes after. The browser does not need to care about what is or is not dynamic in the URI - all of it is significant (although JavaScript splits the values up in the location object).

The browser will not send "#......" to the server, since the hash has a special meaning for the browser.

However, if you escape that hash in JavaScript, the browser does not hesitate to send the escaped string to the server as a literal value.
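
As a small illustration of which built-in functions do the escaping: encodeURIComponent turns "#" into "%23", while encodeURI leaves it alone because it treats "#" as a URI delimiter. The fragment value below is a made-up example in the style of the question's gallery hashes.

```javascript
// Hypothetical fragment value, modelled on the "#gallery" hashes in the question.
const fragment = "#gallery/1/";

// encodeURI keeps reserved URI characters such as "#" and "/".
const keptAsFragment = encodeURI(fragment);

// encodeURIComponent escapes them, so "#" becomes "%23".
const escapedLiteral = encodeURIComponent(fragment);

console.log(keptAsFragment); // "#gallery/1/"
console.log(escapedLiteral); // "%23gallery%2F1%2F"
```

Appending the second form to an address is what turns the bookmark into part of the requested path.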

Why wouldn't it? If your query legitimately needed a hash symbol (you make a POST request to a Facebook wall and you send a phone number), you would otherwise be screwed. Or you do a GET search for some number on 411.com or the like, and they really did not think their application through.

The problem is that the server is not going to understand that the escaped value should be kept separate from the URL if it occurs in the actual path.

It has to accept escaped characters, otherwise spaces (%20) and other everyday characters that are perfectly valid in file names / paths / queries / values would cause problems.

So if you request:

 //mysite.gov.on.ca/path/to/file.extension%23action%3Dfullscreen 

you will most certainly get a 404.

There are a few things you could do, I'm sure. The first would be, in Apache or whatever you use, to write a regex that matches any URL up to the first "%23", assuming there is no "?" before it.
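
A rough sketch of such a pattern, written as a JavaScript regular expression rather than an Apache rule so that it can be tried directly. The helper name decodeFirstEncodedHash and the sample paths are hypothetical, and the sketch assumes the path is still percent-encoded when it is inspected.

```javascript
// Matches everything up to the first "%23", but only when no "?"
// (and no real "#") appears before it.
const ENCODED_HASH = /^([^?#]*?)%23(.*)$/;

// Hypothetical helper: returns the URL with the first "%23" turned
// back into "#", or null when the pattern does not apply.
function decodeFirstEncodedHash(url) {
  const match = ENCODED_HASH.exec(url);
  if (match === null) {
    return null;
  }
  return match[1] + "#" + match[2];
}

console.log(decodeFirstEncodedHash("/photos.shtml%23gallery/1/")); // "/photos.shtml#gallery/1/"
console.log(decodeFirstEncodedHash("/search?q=%23tag"));           // null
```

Only the first "%23" is touched; any other escaped characters are left exactly as they arrived.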

A less mind-numbing approach might be to find out whether there is a way to keep the "#" characters the plugin uses from being escaped in the first place.

Google, for instance, uses a "hashbang" strategy ("#!"), where it asks that URLs be submitted in that form so that it knows whether to encode them.

Another option may be checking for the "#" character with url.indexOf("#"), splitting the URL at the hash, and sending only the valid part.
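
That last option could look roughly like this. splitAtHash is a hypothetical helper name, and the address in the usage lines is made up.

```javascript
// Hypothetical helper: split a URL at the first "#" so that only the
// part the server understands is sent.
function splitAtHash(url) {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    // No fragment present: the whole URL is the requestable part.
    return { base: url, fragment: "" };
  }
  return {
    base: url.substring(0, hashIndex),
    fragment: url.substring(hashIndex + 1),
  };
}

const parts = splitAtHash("http://example.com/page.shtml#gallery/1/");
console.log(parts.base);     // "http://example.com/page.shtml"
console.log(parts.fragment); // "gallery/1/"
```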

In the end, it comes down to what you are trying to accomplish - I can point out why this is a problem, but how best to make it a non-problem depends on what you are trying to do, how you are trying to do it, and what is allowed in the context you are working in.

+2