What characters are allowed in the HTML Name attribute inside the input tag?

I have a PHP script that generates <input> elements dynamically, so I was wondering whether I need to filter any characters in the name attribute.

I know that a name must begin with a letter, but I do not know any other rules. I believe square brackets should be allowed, as PHP uses them to create arrays from form data. What about parentheses? Spaces?

+72
html html-input html-form web-standards
Aug 6 2018-10-06
5 answers

The only real restriction on which characters can appear in form control names applies when a form is submitted using GET:

"The "get" method restricts form data set values to ASCII characters." (HTML 4 specification)

There is a good thread on this here.

+25
Aug 6 2018-10-06

Note that not all characters in the name attribute of a form field are submitted unchanged (even when using POST)!

Leading and trailing whitespace characters are trimmed, and inner whitespace characters as well as the character . are replaced by _. (Tested in Chrome 23, Firefox 13, and Internet Explorer 9, all on Win7.)

+45
Dec 13 '12 at 11:15

Any character that you can include in an [X]HTML file can be placed in <input name>. As Allain noted in a comment, <input name> is defined as containing CDATA, so the only things you cannot put there are the control codes and invalid code points that the underlying standard (SGML or XML) disallows.
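Since the question is about generating <input> from a script, it may help to see what "any character is allowed" means for the generated markup: the name still has to be escaped as an attribute value. A minimal sketch in Python (standing in for the PHP script; `render_input` and the sample name are made up for illustration):

```python
import html


def render_input(name: str) -> str:
    """Return an <input> tag with the name escaped as an attribute value."""
    # quote=True also escapes " and ', so the name cannot end the attribute early.
    return f'<input name="{html.escape(name, quote=True)}">'


tag = render_input('price (€) "net" & tax')
print(tag)  # <input name="price (€) &quot;net&quot; &amp; tax">
```

Parentheses, spaces and non-ASCII characters pass through untouched; only the characters that have meaning in HTML markup are replaced by entities.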

Allain quotes W3 from the HTML4 specification:

Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set.

However, in practice this is not really true.

The theory is that application/x-www-form-urlencoded data has no mechanism for specifying an encoding for the form's names or values, so using non-ASCII characters in either is "not specified" as working, and you should use POSTed multipart/form-data instead.

Unfortunately, in the real world, no browser specifies an encoding for fields, even though it theoretically could, in the subpart headers of a multipart/form-data POST request body. (I believe Mozilla tried to implement it once, but backed out because it broke servers.)

And no browser implements the surprisingly complex and ugly RFC2231 standard, which would be needed to insert encoded non-ASCII field names into the multipart subpart headers. In any case, the HTML specification that defines multipart/form-data does not directly say that RFC2231 should be used, and, again, it would break servers if you tried.

So the reality of the situation is that there is no way to know which encoding is used for the names and values in a form submission, regardless of what type of form it is. What browsers do with field names and values that contain non-ASCII characters is the same for GET and for both kinds of POST form: they encode them using the encoding of the page that contains the form. Non-ASCII GET names are no more broken than everything else.
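To make the ambiguity concrete, here is a small sketch that uses Python's `urllib.parse.urlencode` as a stand-in for the browser's serialization step. The field name `café` and the two page encodings are invented examples, and real browsers do not run this code:

```python
from urllib.parse import urlencode

# Hypothetical form field whose name contains a non-ASCII character.
fields = {"café": "1"}

# The same field serialized as if the page were UTF-8 or ISO-8859-1.
as_utf8 = urlencode(fields, encoding="utf-8")
as_latin1 = urlencode(fields, encoding="iso-8859-1")

print(as_utf8)    # caf%C3%A9=1
print(as_latin1)  # caf%E9=1
```

Nothing in either string says which encoding produced it, so the server has to know (or guess) the encoding of the page the form came from.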

DLH:

So, does the name have a different data type than for the other elements?

In fact, the only element whose name attribute is not CDATA is <meta>. See the HTML4 attribute index for all the uses of name; it is an overloaded attribute name that has many different meanings on different elements. This is generally considered a bad thing.

However, these days you would typically avoid name except on form fields (where it is the control name) and param (where it is a plugin-dependent parameter identifier). That leaves only two meanings to deal with. Avoid the old-school use of name to identify elements such as <form> or <a> on the page (use id instead).

+35
Aug 6 2018-10-06T00:

Although Allain's comment answered the OP's direct question, and bobince provided some brilliant detailed information, I believe many people come here looking for the answer to a more specific question: "Can I use the dot character in a form input's name attribute?"

Since this thread came up as the first result when I searched for this, I figured I might as well share what I found.

First, Matthias claimed that:

. replaced by _

This is not true. I don't know whether browsers really performed such an operation back in 2013, though I doubt it. Browsers send dot characters as they are (I am talking about POST data)! You can check it in the developer tools of any decent browser.
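For readers without a browser at hand, the wire format can be approximated with Python's `urllib.parse`. This is only a sketch of application/x-www-form-urlencoded serialization with made-up field names, not a capture of real browser traffic:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical field names: a dot, an inner space, and PHP-style brackets.
fields = [("foo.bar", "1"), ("test this", "2"), ("foo[foo.bar]", "3")]

# Serialize the way an application/x-www-form-urlencoded body is written.
body = urlencode(fields)
print(body)  # foo.bar=1&test+this=2&foo%5Bfoo.bar%5D=3

# A parser that does no renaming recovers the names unchanged.
received = parse_qsl(body)
print(received == fields)  # True
```

The dot travels as a literal dot, so any renaming has to happen after the request reaches the server.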

Also note a short comment by abluejelly, which many people have probably missed:

I would like to point out that this is a server-specific thing, not a browser thing. Tested on Win7 FF3/3.5/31, IE5/7/8/9/10/Edge, Chrome39, and Safari Windows 5, and they all sent "    test this.stuff" (four leading spaces) as the name in the POST to the ASP.NET dev server bundled with VS2012.

I checked it with the Apache HTTP server (v2.4.25), and indeed, an input name such as "foo.bar" is changed to "foo_bar". But in a name like "foo[foo.bar]" the dot is not replaced by _!
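Both observations are consistent with PHP's documented behavior of converting dots and spaces in incoming variable names to underscores, which suggests the renaming happens in the server-side language layer rather than in Apache itself. A rough sketch of that rule, assuming that leading spaces are dropped and only the part before the first [ is rewritten (`mangle_name` is a hypothetical helper, not PHP's actual implementation):

```python
def mangle_name(name: str) -> str:
    """Approximate the renaming described above (simplified, not PHP's code)."""
    # Assumption: leading spaces are ignored.
    name = name.lstrip(" ")
    # Assumption: only the part before the first "[" is rewritten.
    head, bracket, tail = name.partition("[")
    head = head.replace(".", "_").replace(" ", "_")
    return head + bracket + tail


print(mangle_name("foo.bar"))       # foo_bar
print(mangle_name("foo[foo.bar]"))  # foo[foo.bar]
```

If this is what the server does, "foo.bar" and "foo_bar" end up under the same key, which is one more reason to avoid dots in names.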

My conclusion: you can use dots, but I would not, as this can lead to unexpected behavior depending on the server software used.

+2
Mar 19 '17 at 20:05

Do you mean the id and name attributes of the HTML input tag?

If this is the case, I would be tempted to restrict (or convert) the characters allowed in "input" names to only a-z (A-Z), 0-9 and a limited range of punctuation (".", ",", etc.), if only to limit the potential for XSS exploits and the like.
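A whitelist like this is easy to express as a regular expression. The sketch below is in Python rather than PHP, and the exact character set is an assumption: it requires a leading letter and allows letters, digits, "_", "-" and square brackets, leaving out the dot because of the server-side renaming discussed in other answers. `is_safe_name` is a hypothetical helper:

```python
import re

# Assumed whitelist: a leading letter, then letters, digits, "_", "-", "[" or "]".
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_\-\[\]]*")


def is_safe_name(name: str) -> bool:
    """Return True if the whole name consists of whitelisted characters."""
    return SAFE_NAME.fullmatch(name) is not None


print(is_safe_name("custom_1"))               # True
print(is_safe_name("items[0]"))               # True
print(is_safe_name('x" onfocus="alert(1)'))   # False
```

Names that fail the check can be rejected outright or replaced with generated ones.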

Also, why let the user control any aspect of the input tag at all? (From a validation perspective, might it not be easier to keep the input names as "custom_1", "custom_2", etc., and then map them as required?)

0
Aug 6 2018-10-06