Regex: how to remove all non-alphanumeric characters, except the colon in a 12/24-hour timestamp?

I have a line like:

Today, 3:30pm - Group Meeting to discuss "big idea" 

How do you create a regular expression so that after parsing it returns:

 Today 3:30pm Group Meeting to discuss big idea 

I would like it to delete all non-alphanumeric characters, except those that appear in a 12 or 24 hour time stamp.

6 answers
    # this: D:DD, DD:DDam/pm 12/24 hr
    re = r':(?=..(?<!\d:\d\d))|[^a-zA-Z0-9 ](?<!:)'

The colon must be preceded by at least one digit and followed by at least two digits; then it is part of a time. All other colons are treated as ordinary text colons and removed.

How it works

    :                 // match a colon
    (?=..             // look ahead (without consuming) over the next two chars
      (?<!            // start a negative look-behind group (if it matches, the whole fails)
        \d:\d\d       // time stamp: digit, colon, two digits
      )               // end neg. look-behind
    )                 // end look-ahead
    |                 // or
    [^a-zA-Z0-9 ]     // match anything that is not a letter, a digit, or a space
    (?<!:)            // that isn't a colon

Then, when applied to this silly text:

 Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good 

... changes it to:

 Today 3:30pm Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 16:47 is also good 
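As a quick check, the pattern above can be run through Python's `re.sub`. This is only a sketch: the names `TIME_SAFE_STRIP` and `strip_keep_times` are illustrative and not part of the original answer, and runs of spaces are collapsed afterwards because deleting the ` - ` leaves two adjacent spaces.

```python
import re

# The pattern from the answer, written with re.VERBOSE so each part can be commented.
TIME_SAFE_STRIP = re.compile(r"""
    :                   # a colon ...
    (?=..               # ... with two more characters after it,
        (?<!\d:\d\d)    # where those characters do NOT complete digit, colon, digit, digit
    )
    |
    [^a-zA-Z0-9 ]       # or any character that is not a letter, digit or space
    (?<!:)              # ... provided that character was not a colon
""", re.VERBOSE)


def strip_keep_times(text):
    """Delete punctuation but keep colons that sit inside D:DD or DD:DD."""
    cleaned = TIME_SAFE_STRIP.sub("", text)
    # Removing " - " leaves a double space, so normalise whitespace.
    return " ".join(cleaned.split())


sample = ('Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 '
          '"big idea" on 03:33pm or 16:47 is also good')
print(strip_keep_times(sample))
```

One limitation worth knowing: a colon with fewer than two characters after it (for example at the very end of the input) is left in place, because the look-ahead `..` cannot match there.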

Python

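    import string
    import time

    punct = string.punctuation
    s = 'Today, 3:30pm - Group Meeting:am to discuss "big idea" by our madam'
    for item in s.split():
        try:
            # leave the token untouched if it parses as a time such as 3:30pm
            time.strptime(item, "%H:%M%p")
        except ValueError:
            # otherwise strip every punctuation character from it
            item = ''.join(c for c in item if c not in punct)
        print(item, end=' ')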

Output

    $ ./python.py
    Today 3:30pm Group Meetingam to discuss big idea by our madam

    # change to
    s='Today, 15:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good'

    $ ./python.py
    Today 15:30pm Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 1647 is also good

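NB: This method should be improved so that it only tries to parse a token as a time when necessary (by imposing conditions first), but I will leave it like this for now. Note also that 16:47 loses its colon above, because the format string requires an am/pm suffix.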
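One way the improvement hinted at in the NB could look is sketched below. It is an assumption-laden variant, not the answer author's code: the helper names `is_time` and `clean` and the `TIME_FORMATS` tuple are hypothetical, the pre-check is simply "does the token contain a colon", and `%p` matching depends on the locale providing `am`/`pm`.

```python
import string
import time

# Hypothetical list of accepted formats: 12-hour with am/pm, then plain 24-hour.
TIME_FORMATS = ("%I:%M%p", "%H:%M")

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def is_time(token):
    """Return True if the token parses under any format in TIME_FORMATS."""
    if ":" not in token:  # cheap condition: skip strptime for most tokens
        return False
    for fmt in TIME_FORMATS:
        try:
            time.strptime(token, fmt)
            return True
        except ValueError:
            continue
    return False


def clean(text):
    """Strip punctuation from every whitespace-separated token that is not a time."""
    words = []
    for token in text.split():
        if not is_time(token):
            token = token.translate(STRIP_PUNCT)
        if token:  # drop tokens that were pure punctuation, such as "-"
            words.append(token)
    return " ".join(words)


print(clean('Today, 3:30pm - Group Meeting:am to discuss "big idea" by our madam'))
print(clean('on 03:33pm or 16:47 is also good'))
```

With these two formats a bare 24-hour time such as `16:47` keeps its colon, while a mixed form such as `15:30pm` is no longer accepted as a time. A time with punctuation attached (for example `3:30pm,`) still fails to parse and is stripped, so this remains a token-level heuristic.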


I assume you want to keep spaces as well. This implementation is in Python, but the pattern is a plain character class with no Python-specific syntax, so it should be portable to PCRE and similar engines.

    import re
    x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
    re.sub(r'[^a-zA-Z0-9: ]', '', x)

Output: 'Today 3:30pm  Group Meeting to discuss big idea'

For a slightly cleaner result (no double spaces):

    import re
    x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
    tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
    re.sub(r'[ ]+', ' ', tmp)

Output: 'Today 3:30pm Group Meeting to discuss big idea'
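The two substitutions can be wrapped in one helper, sketched here with the hypothetical name `clean_simple`. The second call illustrates a property of this character-class approach that is easy to miss: it keeps every colon, not only those inside a timestamp.

```python
import re


def clean_simple(text):
    """Keep only letters, digits, colons and spaces, then squeeze repeated spaces."""
    kept = re.sub(r'[^a-zA-Z0-9: ]', '', text)
    return re.sub(r'[ ]+', ' ', kept).strip()


print(clean_simple('Today, 3:30pm - Group Meeting to discuss "big idea"'))
# The colon after "Agenda" is not part of a time, yet it survives:
print(clean_simple('Agenda: meet at 3:30pm'))
```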


You can try, in Javascript:

    var re = /(\W+(?!\d{2}[ap]m))/gi;
    var input = 'Today, 3:30pm - Group Meeting to discuss "big idea"';
    alert(input.replace(re, " "));

The correct regular expression for this is:

 '(?<!\d):|:(?!\d\d)|[^a-zA-Z0-9 :]' 
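A minimal Python sketch of how this pattern might be applied follows. The names `COLON_AWARE` and `strip_with_lookarounds` are made up for illustration, and collapsing whitespace afterwards is an added assumption about the desired output. The three alternatives delete a colon not preceded by a digit, a colon not followed by two digits, and any other character that is not a letter, digit, space, or colon.

```python
import re

# Pattern from the answer above, compiled once for reuse.
COLON_AWARE = re.compile(r'(?<!\d):|:(?!\d\d)|[^a-zA-Z0-9 :]')


def strip_with_lookarounds(text):
    """Remove punctuation, keeping only colons of the form digit, colon, two digits."""
    cleaned = COLON_AWARE.sub('', text)
    # Deleted punctuation can leave adjacent spaces, so normalise them.
    return ' '.join(cleaned.split())


sample = ('Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 '
          '"big idea" on 03:33pm or 16:47 is also good')
print(strip_with_lookarounds(sample))
print(strip_with_lookarounds('Deadline: 5:'))
```

Because each colon is tested with its own look-behind and look-ahead, a colon at the very end of the input is also removed.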

    s = "Call me darling at 3:30"
    re.sub(r'[^ \w:]', '', s)

    'Call me darling at 3:30'
