Validating an HTML ID with RE

Question

Problem statement:

The value of the identifier must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

I made a regular expression.

The code:

    >>> import re
    >>> id_value1 = "custom-title1"
    >>> id_value2 = "1-custom-title"
    >>> pattern = "[A-Za-z][\-A-Za-z0-9_:\.]*"

Code for a valid ID value:

    >>> flag = False
    >>> try:
    ...     if re.findall(pattern, id_value1)[0] == id_value1:
    ...         flag = True
    ... except IndexError:
    ...     pass
    ...
    >>> print flag
    True

Code for an invalid ID value:

    >>> flag = False
    >>> try:
    ...     if re.findall(pattern, id_value2)[0] == id_value2:
    ...         flag = True
    ... except IndexError:
    ...     pass
    ...
    >>> print flag
    False

Code for the IndexError case:

    >>> try:
    ...     if re.findall(pattern, "")[0] == "":
    ...         print "In "
    ... except IndexError:
    ...     print "Exception Index Error"
    ...
    Exception Index Error
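The three cases above depend on what `re.findall` returns for each input, so it may help to look at the raw results. The following is a minimal sketch, not part of the original code: the helper name `find_candidates` is hypothetical, and the pattern is written as a raw string with the same character classes as above.

```python
import re

# Same character classes as the pattern in the question, written as a raw string.
PATTERN = r"[A-Za-z][-A-Za-z0-9_:.]*"


def find_candidates(value):
    """Return every non-overlapping substring of value that matches PATTERN.

    Hypothetical helper, used only to illustrate what findall returns.
    """
    return re.findall(PATTERN, value)


print(find_candidates("custom-title1"))   # ['custom-title1']
print(find_candidates("1-custom-title"))  # ['custom-title']
print(find_candidates(""))                # []
```

For the invalid value, `findall` skips the leading "1-" and still finds a substring, which is why the result has to be compared with the whole input; for the empty string the list is empty, which is where the IndexError comes from.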

I will move the code above into a single function. That function will be called more than 1000 times, so can anyone optimize the code above?

+4

python regex

Vivek sable Aug 4 '15 at 12:02


2 answers

I have a class for ID generation.

In the __init__() method, I compile the pattern.

    def __init__(self):
        self.id_validation = re.compile("[A-Za-z][\-A-Za-z0-9_:\.]*$")

Below is the function that checks the class values, since we use the class values of an HTML tag to generate the ID for that tag. Example HTML element: <div class="custom1 123-custom">

    def validateClassNames(self, element_obj):
        """
        Remove class names which are ignored in ID generation.
        Validate each class value according to the ID generation rule.
        """
        try:
            class_names = element_obj.attrib["class"]
        except:
            return False, ""

        #- Class value must start with a letter.
        class_list = [class_value for class_value in class_names.split(" ")
                      if class_value not in self.ignore_values
                      and bool(self.id_validation.match(class_value))]

        #- Return True if the class list is not empty.
        if len(class_list) != 0:
            return True, class_list

        return False, ""
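The filtering step can be tried without the surrounding class or a parsed HTML element. This is only a sketch under assumptions: the function name `valid_class_names` and its `ignore_values` parameter are hypothetical stand-ins for the method and attribute above, and the class attribute is passed in as a plain string.

```python
import re

# Compiled once at import time; same rule as id_validation above, as a raw string.
ID_RE = re.compile(r"[A-Za-z][-A-Za-z0-9_:.]*$")


def valid_class_names(class_attr, ignore_values=()):
    """Split a class attribute value and keep the names usable as IDs.

    Hypothetical stand-in for the list comprehension in validateClassNames.
    """
    return [
        name
        for name in class_attr.split()
        if name not in ignore_values and ID_RE.match(name)
    ]


print(valid_class_names("custom1 123-custom"))                          # ['custom1']
print(valid_class_names("custom1 clearfix", ignore_values={"clearfix"}))  # ['custom1']
```

The sketch uses `split()` without an argument, which differs slightly from `split(" ")`: runs of whitespace do not produce empty strings. Here the result is the same either way, because an empty string would fail the pattern check.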

+1

Vivek sable Aug 4 '15 at 13:05


Daniel Hepper · Accepted Answer · 2015-08-04T12:10:28+0000

You should anchor the pattern at the end of the string, compile it, and use re.match() instead of re.findall():

    import re

    id_value1 = "custom-title1"
    id_value2 = "1-custom-title"

    pattern = "[A-Za-z][\-A-Za-z0-9_:\.]*$"
    compiled = re.compile(pattern)  # compile once, reuse on every call

    def validate(id):
        # match() anchors at the start; "$" anchors at the end.
        return bool(compiled.match(id))

    print validate(id_value1)  # True
    print validate(id_value2)  # False
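One detail of this approach that may matter in practice: in Python's `re` module, `$` also matches just before a newline at the end of the string, so a value with a trailing newline passes the check. A possible variant, assuming Python 3.4 or later where `fullmatch` is available, is sketched below; the names `validate_dollar` and `validate_full` are hypothetical.

```python
import re

# Same character classes as the pattern above, written as a raw string.
ID_CHARS = r"[A-Za-z][-A-Za-z0-9_:.]*"

dollar_re = re.compile(ID_CHARS + r"$")
full_re = re.compile(ID_CHARS)


def validate_dollar(value):
    """match() plus "$", as in the answer above."""
    return bool(dollar_re.match(value))


def validate_full(value):
    """fullmatch() requires the whole string to match, newline included."""
    return full_re.fullmatch(value) is not None


print(validate_dollar("custom-title1"))    # True
print(validate_dollar("custom-title1\n"))  # True, "$" matches before the final newline
print(validate_full("custom-title1\n"))    # False
print(validate_full("1-custom-title"))     # False
```

On Python 2, where `fullmatch` does not exist, ending the pattern with `\Z` instead of `$` should give the same stricter behavior with `match()`.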
