Matching single-line JavaScript (//) comments with re

I would like to filter out (mostly single-line) comments from (mostly valid) JavaScript using the Python re module. For example:

    // this is a comment
    var x = 2 // and this is a comment too
    var url = "http://www.google.com/" // and "this" too
    url += 'but // this is not a comment' // however this one is
    url += 'this "is not a comment' + " and ' neither is this " // only this

I have now tried this for more than six months without any success. Can anyone help me out?

EDIT 1:

 foo = 'http://stackoverflow.com/' // these // are // comments // too // 

EDIT 2:

 bar = 'http://no.comments.com/' 
2 answers

My regex skills have gotten a bit rusty, so I used your question to refresh what I remember. It turned into a fairly large regex because I also wanted to filter out multi-line comments.

    import re

    reexpr = r"""
        (                               # Capture code
            "(?:\\.|[^"\\])*"           # String literal
            |
            '(?:\\.|[^'\\])*'           # String literal
            |
            (?:[^/\n"']|/[^/*\n"'])+    # Any code besides newlines or string literals
            |
            \n                          # Newline
        )|
        (/\*  (?:[^*]|\*[^/])*  \*/)    # Multi-line comment
        |
        (?://(.*)$)                     # Comment
        $"""
    rx = re.compile(reexpr, re.VERBOSE | re.MULTILINE)

This regular expression has three capture groups: group 1 captures code, group 2 a whole multi-line comment, and group 3 the text following // in a single-line comment. The following example shows how to extract them.

    code = r"""// this is a comment
    var x = 2 * 4 // and this is a comment too
    var url = "http://www.google.com/" // and "this" too
    url += 'but // this is not a comment' // however this one is
    url += 'this "is not a comment' + " and ' neither is this " // only this
    bar = 'http://no.comments.com/' // these // are // comments
    bar = 'text // string \' no // more //\\' // comments
    bar = 'http://no.comments.com/'
    bar = /var/ // comment
    /* comment 1 */
    bar = open() /* comment 2 */
    bar = open() /* comment 2b */// another comment
    bar = open( /* comment 3 */ file) // another comment
    """

    # Each match is a tuple: (code, multi-line comment, single-line comment text)
    parts = rx.findall(code)

    print('*' * 80, '\nCode:\n\n', '\n'.join([x[0] for x in parts if x[0].strip()]))
    print('*' * 80, '\nMulti line comments:\n\n', '\n'.join([x[1] for x in parts if x[1].strip()]))
    print('*' * 80, '\nOne line comments:\n\n', '\n'.join([x[2] for x in parts if x[2].strip()]))
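To actually filter the comments out, as the question asks, the same kind of pattern can be passed to re.sub with a replacement that keeps only the code group. This is a minimal sketch, not part of the original answer: strip_comments and COMMENT_RX are hypothetical names, and the pattern is repeated here only so the snippet runs on its own.

```python
import re

# Same structure as the pattern above, repeated so this snippet is self-contained.
COMMENT_RX = re.compile(r"""
    (                               # Group 1: code to keep
        "(?:\\.|[^"\\])*"           # double-quoted string literal
      | '(?:\\.|[^'\\])*'           # single-quoted string literal
      | (?:[^/\n"']|/[^/*\n"'])+    # code other than newlines or string literals
      | \n                          # newline
    )
  | (/\*(?:[^*]|\*[^/])*\*/)        # Group 2: multi-line comment
  | (?://(.*)$)                     # Group 3: text of a single-line comment
""", re.VERBOSE | re.MULTILINE)


def strip_comments(source):
    """Hypothetical helper: return source with // and /* */ comments removed."""
    # Keep a match only when the code group took part in it;
    # comment matches leave group 1 as None and are replaced by "".
    return COMMENT_RX.sub(lambda m: m.group(1) or "", source)


sample = (
    "var x = 2 // and this is a comment too\n"
    "url += 'but // this is not a comment' // however this one is\n"
    "bar = open( /* comment 3 */ file) // another comment\n"
)
print(strip_comments(sample))
```

Whitespace that preceded a removed comment is left in place, so a per-line rstrip() afterwards may be wanted.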

This might be easier to parse if you had explicit semicolons.

In any case, this works for the five example lines in the question:

    import re

    rx = re.compile(r'.*(//(.*))$')

    lines = [
        "// this is a comment",
        "var x = 2 // and this is a comment too",
        """var url = "http://www.google.com/" // and "this" too""",
        """url += 'but // this is not a comment' // however this one is""",
        """url += 'this "is not a comment' + " and ' neither is this " // only this""",
    ]

    for line in lines:
        # groups() returns (comment including //, comment text after //)
        print(rx.match(line).groups())

Output of the above:

    ('// this is a comment', ' this is a comment')
    ('// and this is a comment too', ' and this is a comment too')
    ('// and "this" too', ' and "this" too')
    ('// however this one is', ' however this one is')
    ('// only this', ' only this')
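One caveat worth checking before relying on this: the leading .* is greedy and the pattern knows nothing about quotes, so it latches onto the last // on a line, even one inside a string literal. The lines from EDIT 1 and EDIT 2 in the question show the difference. Below is a minimal sketch of a per-line alternative that steps over string literals first; LINE_RX and split_comment are hypothetical names introduced here, and the sketch does not try to handle /* */ comments or regex literals that contain //.

```python
import re

# The pattern from this answer, for comparison.
GREEDY_RX = re.compile(r'.*(//(.*))$')

# Assumed alternative: consume string literals and ordinary code first,
# then treat the first remaining // as the start of the comment.
LINE_RX = re.compile(r"""
    ^(                          # Group 1: code before any comment
        (?:
            "(?:\\.|[^"\\])*"   # double-quoted string literal
          | '(?:\\.|[^'\\])*'   # single-quoted string literal
          | [^/"'\n]            # any other character except / and quotes
          | /(?!/)              # a slash that does not start //
        )*
    )
    (//.*)?$                    # Group 2: the comment, if present
""", re.VERBOSE)


def split_comment(line):
    """Hypothetical helper: return (code, comment) for one line; comment may be None."""
    m = LINE_RX.match(line)
    if m is None:               # e.g. an unterminated string literal
        return line, None
    return m.group(1), m.group(2)


edit1 = "foo = 'http://stackoverflow.com/' // these // are // comments // too //"
edit2 = "bar = 'http://no.comments.com/'"

for line in (edit1, edit2):
    print(GREEDY_RX.match(line).groups(), split_comment(line))
```

On the EDIT 2 line the greedy pattern reports part of the URL as a comment, while the string-aware version reports no comment at all.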

I'm not sure what you plan to do with the JavaScript after removing the comments, but JSMin may help. It removes comments well enough, and a Python implementation exists.
