HTML5 input pattern against non-Latin letters

I want to pre-validate a form input with the new HTML5 pattern attribute. My data is a domain name, so the built-in validation of <input type="url"> does not apply.

The problem is that I cannot simply use A-Za-z, because of those damned IDNs (internationalized domain names).

So the question is: is there a way to use <input pattern=""> to validate non-English letters?

I tried \w, of course, but it only matches Latin letters...

Maybe someone has a set of \xNN-\xNN ranges that is guaranteed to cover all Unicode alphabetic characters, or knows some other way?

edit: "This question may already have an answer here:" - no, no answer.

+6
3 answers

Based on my testing, the HTML5 pattern attribute supports Unicode code points in the same way that JavaScript does, with the same limitations:

  • It only supports \u notation for Unicode code points, so \u00a1 will match '¡' (&iexcl;).
  • Since these escapes denote characters, you can use them in character ranges, for example [\u00a1-\uffff].
  • . will also match Unicode characters.
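To make the range idea concrete, here is a minimal sketch of a domain-like pattern built from \u ranges. The names CHARS, LABEL, DOMAIN_PATTERN and matchesPattern are hypothetical, and the range U+00A1 to U+FFFF is only an assumption about what "non-Latin" should cover (it also admits punctuation such as '¡'). The helper imitates how browsers anchor a pattern value as ^(?:...)$. The hyphen is written as \- because browsers that compile the attribute in a Unicode mode can reject an unescaped hyphen inside a character class.

```javascript
// Hypothetical character set: ASCII letters and digits plus U+00A1..U+FFFF.
const CHARS = "a-zA-Z0-9\\u00a1-\\uffff";

// One label: starts and ends with an allowed character, hyphens only inside.
const LABEL = `[${CHARS}](?:[${CHARS}\\-]*[${CHARS}])?`;

// Two or more labels separated by dots.
const DOMAIN_PATTERN = `${LABEL}(?:\\.${LABEL})+`;

// Imitates the implicit anchoring a browser applies to a pattern attribute.
function matchesPattern(pattern, value) {
  return new RegExp(`^(?:${pattern})$`, "u").test(value);
}

console.log(matchesPattern(DOMAIN_PATTERN, "пример.рф"));
console.log(matchesPattern(DOMAIN_PATTERN, "exa mple.com"));
```

The string in DOMAIN_PATTERN is what would go into <input pattern="...">. It is deliberately generous and is not a full IDN validator.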

You don’t actually say how you want to pre-validate, so I can’t help you any further, but by looking up the Unicode character codes you should be able to work out what you need in your regular expression.

Keep in mind that pattern matching through the attribute is pretty dumb in general and not universally supported. I recommend progressive enhancement with some JavaScript on top of the pattern value (you can even reuse the regex more or less).
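A rough sketch of what that enhancement could look like, reusing the value of the pattern attribute. The function names (compilePattern, isValidForPattern, enhance) and the error message are hypothetical. The "u" flag is an assumption about how the browser compiles the attribute. The DOM wiring only runs where a document exists.

```javascript
// Compile a pattern value with the same implicit anchoring browsers use.
// Returns null for an invalid regex, which browsers simply ignore.
function compilePattern(pattern) {
  try {
    return new RegExp(`^(?:${pattern})$`, "u");
  } catch (err) {
    return null;
  }
}

// Empty values are left to the `required` attribute, as in native validation.
function isValidForPattern(pattern, value) {
  const re = compilePattern(pattern);
  if (re === null || value === "") return true;
  return re.test(value);
}

// Attach a custom validity message to one input that has a pattern.
function enhance(input) {
  input.addEventListener("input", () => {
    const ok = isValidForPattern(input.pattern, input.value);
    input.setCustomValidity(ok ? "" : "Please enter a valid domain name.");
  });
}

if (typeof document !== "undefined") {
  document.querySelectorAll("input[pattern]").forEach(enhance);
}
```

Because the script reads the attribute instead of hard-coding the regex, browsers without pattern support get the same check, and browsers with support get a friendlier message.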

As always, never trust user input. It doesn’t take a genius to send a request straight to your form endpoint with more or less any data they like. Your server-side check must necessarily be stricter. Your client-side check can be more generous, depending on whether false positives or false negatives are more problematic for your use case.
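As an illustration of a stricter server-side check, here is a sketch that assumes a Node.js backend (the thread does not say what the server runs). It uses url.domainToASCII from Node's standard library to convert an IDN to its Punycode form and then applies ordinary ASCII hostname rules. The name normalizeDomain and the exact rules (at least two labels, 63 characters per label, 253 in total) are my assumptions.

```javascript
const { domainToASCII } = require("url");

// Strict ASCII label: letters, digits and hyphens, no hyphen at either end.
const ASCII_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/;

// Returns the normalized ASCII (Punycode) domain, or null if it is rejected.
function normalizeDomain(input) {
  // domainToASCII returns an empty string for domains it considers invalid.
  const ascii = domainToASCII(String(input).trim()).toLowerCase();
  if (ascii === "" || ascii.length > 253) return null;

  const labels = ascii.split(".");
  if (labels.length < 2) return null;
  return labels.every((label) => ASCII_LABEL.test(label)) ? ascii : null;
}

console.log(normalizeDomain("пример.рф"));
console.log(normalizeDomain("exa mple.com"));
```

Validating the converted form avoids having to enumerate Unicode ranges on the server at all.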

+2

I know that this is not what you want to hear, but ...

The HTML5 pattern attribute is really there for the user rather than for the programmer. So, given the unfortunate limitations of pattern, your best option is a “loose” pattern: one that gives no false negatives but lets a few false positives through. When I ran into this problem, I found it best to build the pattern from a blacklist plus a couple of minimum requirements. Hopefully that is possible in your case.
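One possible reading of "blacklist plus minimum requirements", as a sketch only: forbid a handful of characters that can never appear in a domain name, and require at least two non-empty labels separated by a dot. The blacklist contents and the names FORBIDDEN, LOOSE_LABEL, LOOSE_DOMAIN and matchesLoose are hypothetical choices, not something the answer specifies.

```javascript
// Blacklisted characters: whitespace, slash, colon, at sign, question mark, hash.
const FORBIDDEN = "\\s\\/:@?#";

// A label is one or more characters that are neither blacklisted nor a dot.
const LOOSE_LABEL = `[^${FORBIDDEN}.]+`;

// Minimum requirement: at least two labels separated by dots.
const LOOSE_DOMAIN = `${LOOSE_LABEL}(?:\\.${LOOSE_LABEL})+`;

// Imitates the implicit anchoring of the pattern attribute.
function matchesLoose(value) {
  return new RegExp(`^(?:${LOOSE_DOMAIN})$`, "u").test(value);
}

console.log(matchesLoose("пример.рф"));
console.log(matchesLoose("http://example.com"));
```

This accepts nonsense such as "¡¡.¡¡", which is the intended trade-off: no valid domain is refused on the client, and the server makes the final decision.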

0

To match any letter from any language, use \p{L}. It may help you.
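A caveat and a sketch: in JavaScript, \p{L} is only a Unicode property escape when the regex is compiled in a Unicode mode (the u or v flag). As far as I know, current browsers compile the pattern attribute that way, while older ones do not, so treat this as an assumption to test in your target browsers. The names LETTER_LABEL, LETTER_DOMAIN and matchesWithFlags are hypothetical.

```javascript
// One label: letters (\p{L}) and digits (\p{N}), hyphens only inside.
const LETTER_LABEL = "[\\p{L}\\p{N}](?:[\\p{L}\\p{N}\\-]*[\\p{L}\\p{N}])?";

// Two or more labels separated by dots.
const LETTER_DOMAIN = `${LETTER_LABEL}(?:\\.${LETTER_LABEL})+`;

// Compile with explicit flags so both modes can be compared.
function matchesWithFlags(pattern, value, flags) {
  return new RegExp(`^(?:${pattern})$`, flags).test(value);
}

// With the u flag, \p{L} matches letters from any script.
console.log(matchesWithFlags(LETTER_DOMAIN, "пример.рф", "u"));

// Without it, \p{L} is read as ordinary characters and the match fails.
console.log(matchesWithFlags(LETTER_DOMAIN, "пример.рф", ""));
```

Scripts that rely on combining marks would also need \p{M} in the character class, since \p{L} covers letters only.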

-1
