Validating an HTML ID with a regular expression

Problem statement:

The ID value must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

I wrote a regular expression for this.

The code:

    >>> import re
    >>> id_value1 = "custom-title1"
    >>> id_value2 = "1-custom-title"
    >>> pattern = "[A-Za-z][\-A-Za-z0-9_:\.]*"

Code for a valid ID value:

    >>> flag = False
    >>> try:
    ...     if re.findall(pattern, id_value1)[0] == id_value1:
    ...         flag = True
    ... except IndexError:
    ...     pass
    ...
    >>> print flag
    True

Code for an invalid ID value:

    >>> flag = False
    >>> try:
    ...     if re.findall(pattern, id_value2)[0] == id_value2:
    ...         flag = True
    ... except IndexError:
    ...     pass
    ...
    >>> print flag
    False

Code for the IndexError case:

    >>> try:
    ...     if re.findall(pattern, "")[0] == "":
    ...         print "In "
    ... except IndexError:
    ...     print "Exception Index Error"
    ...
    Exception Index Error
    >>>

I will move the code above into a single function. This function will be called more than 1000 times, so can anyone optimize the code above?

+4
2 answers

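You should anchor the pattern at the end of the string with $, compile the pattern once, and use re.match() instead of re.findall(). re.match() already anchors at the start of the string, so with $ added the whole value has to match.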

    import re

    id_value1 = "custom-title1"
    id_value2 = "1-custom-title"

    # Same character rules as before, plus $ to anchor at the end.
    pattern = "[A-Za-z][\-A-Za-z0-9_:\.]*$"
    compiled = re.compile(pattern)  # compile once, reuse on every call

    def validate(id):
        return bool(compiled.match(id))

    print validate(id_value1)  # True
    print validate(id_value2)  # False
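To check that the compiled, anchored version behaves like the original findall() approach, and to get a rough idea of the speed difference, something like the sketch below might help. It is written so that it also runs on Python 3. The function names, the sample list, and the stricter \Z variant are my own additions for illustration, not part of the question or this answer. The \Z variant is there because $ also matches just before a trailing newline, so a value like "abc\n" passes the $ version.

```python
import re
import timeit

# Same pattern as in the question, written as a raw string.
PATTERN = r"[A-Za-z][\-A-Za-z0-9_:\.]*"

def validate_findall(value):
    """Approach from the question: compare the first findall() hit with the input."""
    try:
        return re.findall(PATTERN, value)[0] == value
    except IndexError:
        return False

# Approach from this answer: compile once, anchor at the end with $.
COMPILED = re.compile(PATTERN + "$")

def validate_match(value):
    return bool(COMPILED.match(value))

# Stricter variant (an assumption, not in the answer): \Z matches only at the
# very end of the string, so a trailing newline is rejected.
STRICT = re.compile(PATTERN + r"\Z")

def validate_strict(value):
    return bool(STRICT.match(value))

# Hypothetical sample values; all three functions agree on these.
samples = ["custom-title1", "1-custom-title", "", "a", "a b", "x:y.z_1-"]
for sample in samples:
    assert validate_findall(sample) == validate_match(sample) == validate_strict(sample)

# The one place where $ and \Z differ: a trailing newline.
print(validate_match("abc\n"), validate_strict("abc\n"))

# Rough timing; the numbers depend on the machine and Python version.
for name, func in [("findall", validate_findall), ("compiled match", validate_match)]:
    seconds = timeit.timeit(lambda: func("custom-title1"), number=1000)
    print("%s: %.6f s per 1000 calls" % (name, seconds))
```

Note that the re module caches recently used patterns internally, so the gain from compiling up front may be modest. Dropping the list that findall() builds and the exception handling is probably at least as relevant.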
+2

I have a class for ID generation.

In the __init__() method, I compile the pattern.

    def __init__(self):
        self.id_validation = re.compile("[A-Za-z][\-A-Za-z0-9_:\.]*$")

Below is the function that validates the class values used for ID generation; we use the class values of an HTML tag to generate the ID for that tag. Example HTML element: <div class="custom1 123-custom">

    def validateClassNames(self, element_obj):
        """
        Remove class names that are ignored in ID generation.
        Validate each class value according to the ID generation rule.
        """
        try:
            class_names = element_obj.attrib["class"]
        except KeyError:
            # The element has no class attribute.
            return False, ""

        #- Keep only class values that pass the ID validation.
        class_list = [class_value for class_value in class_names.split(" ")
                      if class_value not in self.ignore_values
                      and bool(self.id_validation.match(class_value))]

        #- Return True if the class list is not empty.
        if len(class_list) != 0:
            return True, class_list
        return False, ""
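For anyone who wants to try the filtering step without the rest of my class, here is a minimal standalone sketch. It uses xml.etree.ElementTree from the standard library as a stand-in for whatever parser produces element_obj, and the function name valid_class_names and the IGNORE_VALUES set are made up for the example. It also uses split() without an argument, which differs slightly from the split(" ") above: it drops empty strings caused by repeated spaces (those would fail the pattern anyway).

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in __init__ above, written as a raw string.
ID_VALIDATION = re.compile(r"[A-Za-z][\-A-Za-z0-9_:\.]*$")

# Hypothetical ignore list; the real self.ignore_values is not shown in the answer.
IGNORE_VALUES = {"clearfix"}

def valid_class_names(element, ignore_values=IGNORE_VALUES):
    """Return the class names of `element` that are usable for ID generation."""
    class_attr = element.attrib.get("class", "")  # "" when there is no class attribute
    return [name for name in class_attr.split()
            if name not in ignore_values and ID_VALIDATION.match(name)]

# The example element from above, with one ignored class name added.
element = ET.fromstring('<div class="custom1 123-custom clearfix"/>')
print(valid_class_names(element))  # ['custom1']
```

Here "123-custom" is dropped because it starts with a digit, and "clearfix" is dropped because it is in the ignore list.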
+1
