How to remove dates from a list in Python

I have a list of tokenized text (list_of_words) that looks something like this:

list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor', 'per', 'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet', ...] 

and I'm trying to remove all instances of dates and times from this list. I tried using the .remove() method, but to no avail. I tried adding wildcard patterns such as ".. / .. / ...." to the list of stop words that I filter against, but that didn't work either. Finally, I tried writing the following code:

    for line in list_of_words:
        if re.search('[0-9]{2}/[09]{2}/[0-9]{4}', line):
            list_of_words.remove(line)

but this also does not work. How can I delete everything formatted as a date or time from my list?

python regex nltk
3 answers

Description

 ^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$ 


This regex will do the following:

  • match strings that look like dates (e.g. 12/23/2016) and times (e.g. 12:34:56)
  • match the strings am and pm, which are probably part of the preceding time in the source list

Example


List Example

 08/20/2014 10:04:27 pm complete vendor per mfg/recommend 08/20/2014 10:04:27 pm complete 

List after processing

 complete vendor per mfg/recommend complete 

Python Script Example

    import re

    source_list = ['08/20/2014', '10:04:27', 'pm', 'complete', 'vendor', 'per',
                   'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complete']

    # Keep only the words that are NOT a date, a time, "am" or "pm".
    # A raw string avoids the invalid "\/" escape warning in Python 3.
    output_list = filter(
        lambda this_word: not re.match(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', this_word),
        source_list)

    for this_value in output_list:
        print(this_value)

Explanation

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        (?:                      group, but do not capture (2 times):
    ----------------------------------------------------------------------
          [0-9]{2}                 any character of: '0' to '9' (2 times)
    ----------------------------------------------------------------------
          [:\/,]                   any character of: ':', '\/', ','
    ----------------------------------------------------------------------
        ){2}                     end of grouping
    ----------------------------------------------------------------------
        [0-9]{2,4}               any character of: '0' to '9' (between 2
                                 and 4 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        am                       'am'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        pm                       'pm'
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
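
The same pattern can be written with re.VERBOSE so that each piece carries its explanation inline, which makes it easier to maintain. This is a minimal sketch assuming Python 3; the name DATE_TIME_AMPM is hypothetical and only used for illustration. Inside a character class a forward slash needs no escaping, so [:/,] is equivalent to [:\/,].

```python
import re

# Hypothetical name; same pattern as above, written in verbose mode.
DATE_TIME_AMPM = re.compile(r"""
    ^                         # the beginning of the string
    (?:
        (?:[0-9]{2}[:/,]){2}  # two digits then ':', '/' or ',' (twice)
        [0-9]{2,4}            # then 2 to 4 digits
      | am                    # or the literal 'am'
      | pm                    # or the literal 'pm'
    )
    $                         # end of the string (before an optional newline)
""", re.VERBOSE)

tokens = ['08/20/2014', '10:04:27', 'pm', 'complete', 'vendor', 'per',
          'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complete']

# Keep every token that the pattern does not match.
kept = [t for t in tokens if not DATE_TIME_AMPM.match(t)]
print(kept)  # ['complete', 'vendor', 'per', 'mfg/recommend', 'complete']
```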

If you want to match the date and time strings in the list, you could try the following regular expression:

 [0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4} 


Python code:

    import re

    list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor', 'per',
                     'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet']

    new_list = [item for item in list_of_words
                if not re.search(r'[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}', item)]
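Because this pattern has no anchors and is used with re.search, it removes any token that merely contains a date or time somewhere inside it, and it leaves am/pm tokens in place. The sketch below (Python 3) contrasts re.search with re.fullmatch, which requires the whole token to match; the token 'ref08/20/2014x' is a hypothetical example, not from the question.

```python
import re

# Pattern from the answer above (forward slashes need no escaping).
PATTERN = r'[0-9]{2}[/,:][0-9]{2}[/,:][0-9]{2,4}'

# Hypothetical tokens chosen to show the difference.
tokens = ['08/20/2014', '10:04:27', 'pm', 'ref08/20/2014x', 'complet']

# re.search finds the pattern anywhere inside a token...
by_search = [t for t in tokens if not re.search(PATTERN, t)]
# ...while re.fullmatch requires the entire token to match.
by_fullmatch = [t for t in tokens if not re.fullmatch(PATTERN, t)]

print(by_search)     # ['pm', 'complet']
print(by_fullmatch)  # ['pm', 'ref08/20/2014x', 'complet']
```

Which behavior is right depends on the data; for cleanly tokenized text like the question's, the anchored version is the safer default.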

Try the following:

    import re

    list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor', 'per',
                     'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet']

    # filter() returns an iterator in Python 3, so wrap it in list() to get a list back
    list_of_words = list(filter(
        lambda x: not re.match(r'[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}', x),
        list_of_words))
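The loop in the question fails for two independent reasons: the character class [09] matches only the characters '0' and '9' (so the day '20' never matches), and calling .remove() on the list being iterated skips the element that slides into the freed slot. The sketch below (Python 3; DATE_OR_TIME is a hypothetical name) demonstrates both problems and then removes the matches in place with slice assignment. Like the second and third answers, it leaves am/pm tokens in place.

```python
import re

# Pattern used by the second and third answers, applied with fullmatch.
DATE_OR_TIME = re.compile(r'[0-9]{2}[/,:][0-9]{2}[/,:][0-9]{2,4}')

list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor', 'per',
                 'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet']

# Problem 1: [09] only matches '0' or '9', so '20' never matches.
original_pattern_hit = re.search('[0-9]{2}/[09]{2}/[0-9]{4}', '08/20/2014')
print(original_pattern_hit)  # None

# Problem 2: removing from the list being iterated skips the next element
# (here, each time right after a date is removed).
buggy = list(list_of_words)
for word in buggy:
    if DATE_OR_TIME.fullmatch(word):
        buggy.remove(word)
print(buggy)
# ['10:04:27', 'pm', 'complet', 'vendor', 'per', 'mfg/recommend', '10:04:27', 'pm', 'complet']

# Fix: build the filtered list first, then assign it back into the same
# list object with slice assignment, so other references see the change.
list_of_words[:] = [w for w in list_of_words if not DATE_OR_TIME.fullmatch(w)]
print(list_of_words)
# ['pm', 'complet', 'vendor', 'per', 'mfg/recommend', 'pm', 'complet']
```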
