Why is there a URL encoding for the ASCII character set

Question

URLs can only be sent over the Internet using the ASCII character set.

Why is there a URL encoding for ASCII characters such as a, b, c when they can be sent over the Internet without any URL encoding?

For example: why encode 'a' when it can be sent as 'a'?

What are the possible reasons for encoding ASCII characters? The only reason I can think of is hackers trying to make their URLs as unreadable as possible in order to carry out XSS attacks.

+3

security url encoding ascii

Computernerd Dec 31 '14 at 8:40

4 answers

URL encoding exists for the entire ASCII range because it was easier to define an encoding that works for all characters than to define one that works only for the set of characters with special meaning.

+3

Mark Dec 31 '14 at 8:46

URL encoding allows characters that have a special meaning in a URL to be included in a segment without their special meaning. There are many examples, but the most common ones to encode are "?", "=" and "&".

+1

Rowland Shaw Dec 31 '14 at 8:43

URL encoding was designed so that it can encode any ASCII character.

So while = is encoded as %3d, ? as %3f, and & as %26, it makes sense that a is encoded as %61 and b as %62, since the hexadecimal number after the % is the ASCII code of that character.
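The mapping from character to hex code can be checked with a few lines of Python. This is only a minimal sketch; the helper name percent_encode_char is made up for illustration and is not part of any library.

```python
def percent_encode_char(ch: str) -> str:
    """Return '%' followed by the two-digit hex ASCII code of one character."""
    code = ord(ch)
    if code > 0x7F:
        # Non-ASCII characters need a byte encoding first; out of scope here.
        raise ValueError("only ASCII characters are handled in this sketch")
    return f"%{code:02x}"


# The same rule covers reserved characters and plain letters alike.
for ch in "=?&ab":
    print(ch, percent_encode_char(ch))
```

The loop prints %3d, %3f, %26, %61 and %62, matching the values above.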

+1

SilverlightFox Dec 31 '14 at 17:31

unor · Accepted Answer

STD 66, Percent-Encoding:

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

So percent-encoding is a kind of escape mechanism: some characters have a special meaning in URI components (→ they are reserved). If you want to use such a character without its special meaning, you percent-encode it.
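As a rough illustration of this escape mechanism, the sketch below uses Python's standard urllib.parse module. The parameter name q and the value a&b=c are hypothetical and chosen only to show what happens when reserved characters appear inside a query value.

```python
from urllib.parse import parse_qs, quote

value = "a&b=c"  # hypothetical value containing the reserved characters & and =

# Unescaped: & and = keep their special meaning and split the query apart.
raw_query = "q=" + value

# Escaped: safe="" makes quote() percent-encode every reserved character.
escaped_query = "q=" + quote(value, safe="")

print(parse_qs(raw_query))      # two parameters: q and b
print(escaped_query)            # q=a%26b%3Dc
print(parse_qs(escaped_query))  # one parameter q holding the full value
```

Without escaping, the value is misread as two separate parameters; with escaping, it survives the round trip intact.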

Unreserved characters (e.g. a, b, c, ...) can always be used directly, but it is also allowed to percent-encode them. Such URIs are equivalent:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.
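The equivalence rule quoted above suggests a simple normalization: decode any percent-encoded octet that stands for an unreserved character and leave everything else escaped. The following is a minimal sketch under that reading; the function name normalize_unreserved and the example.com URI are illustrative assumptions, and a real normalizer would do more than this.

```python
import re
import string

# Unreserved characters per STD 66: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_unreserved(uri: str) -> str:
    """Decode percent-encoded unreserved characters; keep other escapes."""
    def decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Reserved or non-ASCII octets stay escaped (hex digits uppercased).
        return match.group(0).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", decode, uri)


a = normalize_unreserved("http://example.com/%61%62c?x=%26")
b = normalize_unreserved("http://example.com/abc?x=%26")
print(a == b)
```

Both inputs normalize to the same string, while the escaped & (%26) stays escaped because decoding it would change the meaning of the query.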

Why is it allowed to percent-encode unreserved characters at all? The obsolete RFC 2396 says:

Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.

I cannot think of an example for such a “context,” but this sentence suggests that there may be some.

In addition, maybe some people/implementations simply percent-encode everything (except for delimiters etc.), so that they don't need to check whether (and which) characters need percent-encoding in the corresponding component.
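Such an "encode everything" approach could look roughly like the sketch below. It is a hypothetical illustration, not taken from any particular implementation; the name encode_all and the choice of UTF-8 are assumptions.

```python
from urllib.parse import unquote


def encode_all(component: str) -> str:
    """Percent-encode every byte of a component, without any per-character check."""
    return "".join(f"%{byte:02X}" for byte in component.encode("utf-8"))


encoded = encode_all("a b&c")
print(encoded)
print(unquote(encoded))
```

The encoder never has to know which characters are reserved in which component, and a standard decoder such as unquote still recovers the original text.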
