How to validate form input using HTML5 input validation

I tried to find a complete list of patterns for validating input with HTML5 form validation for the different types, in particular url, email, tel, etc., but I could not find one. Currently, the built-in versions of these input checks are far from perfect (and tel doesn't even check whether what you enter is a phone number). So I was wondering: what patterns can I use to verify that the user enters the correct format in these inputs?

Here are some examples of cases where the default check accepts input that should be rejected:

type="email"

This field allows addresses that have invalid domains after the @, and allows addresses to start or end with a dash or period, which is also unacceptable. Thus, .example-@x is considered valid.

type="url"

This input basically allows you to enter anything starting with http:// (Chrome), followed by anything except a few special characters, such as those that have a function in URLs (\, @, #, ~, etc.). In FF, all that is checked is whether it starts with http: followed by anything except : (even just http: is allowed in FF). IE does the same as FF, except that it does not prohibit http:: .

For example, http://. is allowed in all three, and so is http://,

type="tel"

There is currently no built-in verification of phone numbers in any of the main browsers (it works 100% the same as type="text", except that it hints to mobile browsers which keyboard to display).




So, since browsers do not show consistent behavior in each of these cases, and since the behavior they show is very simple with a lot of false positives, what can I do to validate my HTML forms (still using HTML5 input validation)?




PS: I am posting this because a complete list of form validation patterns would be useful to me, so I figured it could be useful to others as well (and, of course, others can post their solutions too).

html html5 regex forms
Nov 18 '14 at 17:49
1 answer

These patterns are not necessarily simple, but here is what I think works best in any situation. Keep in mind that (quite recently) Internationalized Domain Names (IDNs) have become available as well. With that, an almost unlimited set of characters is allowed in URLs (there are still many characters that are not allowed in domain names, but the list of allowed characters is so large, and will change so often for the different top-level domains, that it is not easy to keep up with them). If you want to support internationalized domain names, you should use the second URL pattern; otherwise use the first.

TL;DR:

Here's a live demo to see the following patterns in action. Scroll down for an explanation and analysis of each pattern.

URL

 https?:\/\/(?![^\/]{253}[^\/])((?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15}|((1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5]))\.){3}(1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5])))(\/.*)?
 https?:\/\/(?!.{253}.+$)((?!-.*|.*-\.)([^ !-,\.\/:-@\[-`{-~]{1,63}\.)+([^ !-\/:-@\[-`{-~]{2,15}|xn--[a-zA-Z0-9]{4,30})|(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9]))(\/.*)?

Email addresses

 (?!(^[.-].*|[^@]*[.-]@|.*\.{2,}.*)|^.{254}.)([a-zA-Z0-9!#$%&'*+\/=?^_`{|}~.-]+@)(?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15} 

Phone numbers

 ((\+|00)?[1-9]{2}|0)[1-9]( ?[0-9]){8}
 ((\+|00)?[1-9]{2}|0)[1-9]([0-9]){8}

Western-style names

 ([A-ZΆ-ΫÀ-ÖØ-Þ][A-ZΆ-ΫÀ-ÖØ-Þa-zά-ώß-öø-ÿ]{1,19} ?){1,10} 



URL without IDN support

 https?:\/\/(?![^\/]{253}[^\/])((?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15}|((1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5]))\.){3}(1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5])))(\/.*)? 

Explanation:

  • Domain names
    • URLs should always start with http:// or https:// since we don't need links to other protocols.
    • Domain names must not begin or end with -.
    • Each part of a domain name can contain no more than 63 characters (so a maximum of 63 characters between two dots), and the total length (including dots) cannot exceed 253 (or 255? Be safe and bet on 253) characters [1].
    • Non-IDNs can only contain Latin letters, the digits 0 through 9, and dashes.
    • Non-IDN top-level domains contain only letters of the Latin alphabet [2].
    • I set an arbitrary limit of 15 letters, as there are currently no domains that exceed 13 characters (".international"), which most likely will not change in the near future.
  • IP addresses
    • Special cases, such as 0.0.0.0, 127.0.0.1, etc., are not checked for.
    • IP addresses with zero-padded numbers in them are not allowed (for example, 01.1.1.1) [4].
    • Each number in an IP address can only go from 0 to 255. 256 is not allowed.

Please note that the default http:.* pattern built into modern browsers will always be applied, so even if you remove https?:// from the beginning of this pattern, the default check will still be enforced. Use type="text" to avoid this.
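
To try the non-IDN pattern outside a form, here is a minimal sketch in JavaScript. It assumes the browser matches the pattern attribute against the whole value, which the hypothetical helper matchesUrlPattern imitates by wrapping the pattern in ^(?: ... )$. Browsers also compile the attribute with a Unicode flag, which this sketch leaves out, so treat it as an approximation rather than an exact copy of what the browser does.

```javascript
// Non-IDN URL pattern from above, written as a regex literal so that
// no extra string escaping is needed.
const urlPattern = /https?:\/\/(?![^\/]{253}[^\/])((?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15}|((1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5]))\.){3}(1[0-9]{2}|[1-9]?[0-9]|2([0-4][0-9]|5[0-5])))(\/.*)?/;

// Hypothetical helper (not a browser API): anchors the pattern on both
// sides, the way the pattern attribute is assumed to be applied.
function matchesUrlPattern(value) {
  return new RegExp("^(?:" + urlPattern.source + ")$").test(value);
}

console.log(matchesUrlPattern("http://example.com"));               // true
console.log(matchesUrlPattern("https://sub.example.org/path?x=1")); // true
console.log(matchesUrlPattern("http://192.168.0.1"));               // true
console.log(matchesUrlPattern("http://."));                         // false: empty domain part
console.log(matchesUrlPattern("http://-example.com"));              // false: leading dash
console.log(matchesUrlPattern("http://example-.com"));              // false: trailing dash
console.log(matchesUrlPattern("http://256.1.1.1"));                 // false: 256 is out of range
console.log(matchesUrlPattern("http://01.1.1.1"));                  // false: zero-padded number
```

Note that http://. , which the browsers accept by default, is rejected here.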

IDN-enabled URLs

 https?:\/\/(?!.{253}.+$)((?!-.*|.*-\.)([^ !-,\.\/:-@\[-`{-~]{1,63}\.)+([^ !-\/:-@\[-`{-~]{2,15}|xn--[a-zA-Z0-9]{4,30})|(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9]))(\/.*)? 

Explanation:

Since a huge number of characters is allowed in IDNs, it is almost impossible to list every possible combination in an HTML attribute (you would end up with a huge pattern, so in this case it is much better to validate with some method other than a regular expression) [5].

  • Prohibited characters in domain names: !"#$%&'()*+, ./ :;<=>?@ [\]^_` {|}~ (the period is excepted, since it serves as the domain separator).
    • These correspond to the ranges [!-,] [\.\/] [:-@] [\[-`] [{-~] .
  • All other characters are allowed in this input field.
  • TLDs may contain the same characters, up to an arbitrary limit of 15 characters (as with the non-IDN URLs).
  • Alternatively, TLDs may be in the xn--* format, where * is an encoded version of the real TLD. This encoding uses roughly 2 Latin letters or digits per source character, so the arbitrary limit is doubled to 30 here.
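
The rules above can be tried out with a short JavaScript sketch. As before, the helper matchesIdnUrlPattern is a hypothetical name, and wrapping the pattern in ^(?: ... )$ is an assumption about how the pattern attribute is matched against the whole value; the browser's Unicode flag is left out.

```javascript
// IDN-enabled URL pattern from above, as a regex literal.
const idnUrlPattern = /https?:\/\/(?!.{253}.+$)((?!-.*|.*-\.)([^ !-,\.\/:-@\[-`{-~]{1,63}\.)+([^ !-\/:-@\[-`{-~]{2,15}|xn--[a-zA-Z0-9]{4,30})|(([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9])\.){3}([01]?[0-9]{2}|2([0-4][0-9]|5[0-5])|[0-9]))(\/.*)?/;

// Hypothetical helper (not a browser API): whole-value match.
function matchesIdnUrlPattern(value) {
  return new RegExp("^(?:" + idnUrlPattern.source + ")$").test(value);
}

console.log(matchesIdnUrlPattern("http://münchen.de"));       // true: non-ASCII letter in the domain
console.log(matchesIdnUrlPattern("http://example.xn--p1ai")); // true: xn-- encoded TLD
console.log(matchesIdnUrlPattern("http://example.com/page")); // true: plain ASCII still works
console.log(matchesIdnUrlPattern("http://exa_mple.com"));     // false: _ is a prohibited character
console.log(matchesIdnUrlPattern("http://exa mple.com"));     // false: space is prohibited
console.log(matchesIdnUrlPattern("http://-example.com"));     // false: leading dash
```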

Email Addresses

 (?!(^[.-].*|[^@]*[.-]@|.*\.{2,}.*)|^.{254}.)([a-zA-Z0-9!#$%&'*+\/=?^_`{|}~.-]+@)(?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15} 

Explanation:

Email addresses require much more than this pattern to be validated with 100% reliability, but it covers nearly 100% of them. A 100% complete pattern exists, but it relies on PCRE-only (PHP) regular expression features, so it will not work in HTML forms.

  • E-mail addresses can contain only letters of the Latin alphabet, the digits 0-9 and the characters in !#$%&'*+\/=?^_`{|}~.- [6].
  • Accented characters are not universally supported [7], but post a comment if you need them and I could write a version that complies with RFC 6530.
  • The local part (before the @) can contain only 63 characters, and the whole address can consist of only 254 characters [8].
  • Addresses may not start or end with a - or a . , and no two dots may appear consecutively [8].
  • The domain may not be an IP address [9].
    • Other than that, I included only the non-IDN domain part of the URL pattern. IDNs are also valid in e-mail addresses, so they will lead to false negatives.
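
A quick way to see these rules at work is the following JavaScript sketch. The helper matchesEmailPattern is a hypothetical name, and the ^(?: ... )$ wrapper is an assumption that mirrors how the pattern attribute is matched against the whole value (without the browser's Unicode flag).

```javascript
// Email pattern from above, as a regex literal.
const emailPattern = /(?!(^[.-].*|[^@]*[.-]@|.*\.{2,}.*)|^.{254}.)([a-zA-Z0-9!#$%&'*+\/=?^_`{|}~.-]+@)(?!-.*|.*-\.)([a-zA-Z0-9-]{1,63}\.)+[a-zA-Z]{2,15}/;

// Hypothetical helper (not a browser API): whole-value match.
function matchesEmailPattern(value) {
  return new RegExp("^(?:" + emailPattern.source + ")$").test(value);
}

console.log(matchesEmailPattern("john.doe@example.com"));  // true
console.log(matchesEmailPattern(".example-@x"));           // false: the example from the question
console.log(matchesEmailPattern("john-@example.com"));     // false: local part ends with a dash
console.log(matchesEmailPattern("john..doe@example.com")); // false: two consecutive dots
console.log(matchesEmailPattern("john@-example.com"));     // false: domain starts with a dash
console.log(matchesEmailPattern("john@192.168.0.1"));      // false: IP address as domain
```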

Phone numbers

 ((\+|00)?[1-9]{2}|0)[1-9]( ?[0-9]){8}
 ((\+|00)?[1-9]{2}|0)[1-9]([0-9]){8}

Explanation:

  • Phone numbers must begin with one of the following, where [CTRY] stands for the country code and X stands for the first non-zero digit (for example, 6 in mobile numbers):
    • 00[CTRY]X
    • +[CTRY]X
    • 0X
    • [CTRY]X (This is not officially correct syntax, but Chrome Autofill seems to like it for some reason.)
  • Spaces are allowed between the digits (see the second pattern for a version without spaces), except before the non-zero X, as defined above.
  • Phone numbers must be exactly 9 digits long, not counting the part before the first non-zero X, as indicated above.

This regular expression is for 10-digit phone numbers only. Since the length of phone numbers may vary from country to country, it is better to use a less strict version of this pattern or to adapt it to the countries you need. In other words, this pattern is best treated as a starting point for your own pattern.
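
Both phone patterns can be compared side by side with a small JavaScript sketch. The helper matchesWholeValue is a hypothetical name, and the ^(?: ... )$ wrapper is an assumption that mirrors how the pattern attribute is matched against the whole value. The sample numbers are made up.

```javascript
// The two phone patterns from above: with and without optional spaces.
const phonePattern = /((\+|00)?[1-9]{2}|0)[1-9]( ?[0-9]){8}/;
const strictPhonePattern = /((\+|00)?[1-9]{2}|0)[1-9]([0-9]){8}/;

// Hypothetical helper (not a browser API): whole-value match.
function matchesWholeValue(pattern, value) {
  return new RegExp("^(?:" + pattern.source + ")$").test(value);
}

console.log(matchesWholeValue(phonePattern, "0612345678"));           // true: 0X form
console.log(matchesWholeValue(phonePattern, "+31612345678"));         // true: +[CTRY]X form
console.log(matchesWholeValue(phonePattern, "0031612345678"));        // true: 00[CTRY]X form
console.log(matchesWholeValue(phonePattern, "31612345678"));          // true: [CTRY]X form
console.log(matchesWholeValue(phonePattern, "06 12 34 56 78"));       // true: spaces between digits
console.log(matchesWholeValue(strictPhonePattern, "06 12 34 56 78")); // false: no spaces allowed
console.log(matchesWholeValue(phonePattern, "0 612345678"));          // false: space before X
console.log(matchesWholeValue(phonePattern, "061234567"));            // false: one digit short
```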

Optional: Western-style names

 ([A-ZΆ-ΫÀ-ÖØ-Þ][A-ZΆ-ΫÀ-ÖØ-Þa-zά-ώß-öø-ÿ]{1,19} ?){1,10} 

Yes, I know, I am very Western-oriented, but this might come in handy as well, since it can be difficult to get right, and if you are building a site for Western users it will always work (Asian names have a representation in this format too).

  • All names must start with a capital letter.
  • Uppercase letters may appear in the middle of names (e.g. John McDoe).
  • Names must be at least 2 letters long.
  • I set an arbitrary maximum of 10 names (these people probably won't mind), each of which can be no more than 20 letters long (the length of "Werbenjagermanjensen", who happens to be #1).
  • Latin and Greek letters are allowed, including all accented Latin and Greek letters ( list ) and the Icelandic letters ( ÐÞ ðþ ):
    • A-Z matches all uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
    • Ά-Ϋ matches all uppercase Greek letters, including the accented ones: Ά·ΈΉΊ΋Ό΍ΎΏΐ ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ΢ΣΤΥΦΧΨΩ ΪΫ .
    • À-ÖØ-Þ matches all uppercase accented Latin letters, as well as Ð and Þ: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ . Between them there is also the symbol × (between Ö and Ø ), which is left out this way.
    • a-z matches all lowercase Latin letters: abcdefghijklmnopqrstuvwxyz
    • ά-ώ matches all lowercase Greek letters, including the accented ones: άέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ
    • ß-öø-ÿ matches all lowercase accented Latin letters, as well as ß, ð and þ: ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ . Between them there is also the symbol ÷ (between ö and ø ), which is left out this way.
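
The name rules can be checked with one more JavaScript sketch. The helper matchesNamePattern is a hypothetical name, and the ^(?: ... )$ wrapper is an assumption that mirrors how the pattern attribute is matched against the whole value. The sketch also assumes the input uses precomposed accented characters; letters typed as a base letter plus a combining accent fall outside the ranges.

```javascript
// Name pattern from above, as a regex literal.
const namePattern = /([A-ZΆ-ΫÀ-ÖØ-Þ][A-ZΆ-ΫÀ-ÖØ-Þa-zά-ώß-öø-ÿ]{1,19} ?){1,10}/;

// Hypothetical helper (not a browser API): whole-value match.
function matchesNamePattern(value) {
  return new RegExp("^(?:" + namePattern.source + ")$").test(value);
}

console.log(matchesNamePattern("John McDoe"));            // true: capital in the middle of a name
console.log(matchesNamePattern("Ólafur Þórsson"));        // true: accented and Icelandic letters
console.log(matchesNamePattern("Werbenjagermanjensen"));  // true: exactly 20 letters
console.log(matchesNamePattern("john Doe"));              // false: no leading capital
console.log(matchesNamePattern("J Doe"));                 // false: first name is only 1 letter
console.log(matchesNamePattern("Abcdefghijklmnopqrstu")); // false: 21 letters
```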

References

Nov 18 '14 at 17:49