Python: trying to lowercase a string and remove non-alphanumeric characters apart from spaces

I am trying to remove all non-alphanumeric characters except spaces from a string, but I cannot figure out how to exclude the space. I am currently doing it like this:

re.sub('[\W_]+', '', text).lower().strip()

But running my function gives the following results:

print removePunctuation('Hi, you!')
print removePunctuation(' No under_score!')
hiyou
nounderscore

Whereas I want:

hi you
no underscore

So how can I exclude the space from the substitution?

My current best attempt:

re.sub('[^\s\w]+', '', text).lower().strip().replace('_','')
5 answers

You can use this:

re.sub(r'[^\sa-zA-Z0-9]', '', text).lower().strip()

Example:

>>> import re
>>> def removePunctuation(s):
...     return re.sub(r'[^\sa-zA-Z0-9]', '', s).lower().strip()
...
>>> print removePunctuation('Hi, you!')
hi you
>>> print removePunctuation(' No under_score!')
no underscore

OR

re.sub(r'(?!\s)[\W_]', '', text).lower().strip()
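
The two patterns in this answer agree on the question's examples but not on every input. The following is a minimal sketch, assuming Python 3 and `str` input; the function names `remove_punctuation_ascii` and `remove_punctuation_lookahead` are made up for illustration. The character-class version keeps only ASCII letters and digits, while the lookahead version relies on `\W`, which is Unicode-aware for `str` patterns in Python 3 and therefore keeps accented letters.

```python
import re

# Pattern 1: delete anything that is not whitespace or an ASCII letter/digit.
ASCII_ONLY = re.compile(r'[^\sa-zA-Z0-9]')
# Pattern 2: delete non-word characters and underscores, except whitespace
# (the negative lookahead (?!\s) refuses to match at a whitespace character).
LOOKAHEAD = re.compile(r'(?!\s)[\W_]')

def remove_punctuation_ascii(text):
    return ASCII_ONLY.sub('', text).lower().strip()

def remove_punctuation_lookahead(text):
    return LOOKAHEAD.sub('', text).lower().strip()

for sample in ('Hi, you!', ' No under_score!', 'Café, olé!'):
    print(repr(remove_punctuation_ascii(sample)),
          repr(remove_punctuation_lookahead(sample)))
# 'hi you' 'hi you'
# 'no underscore' 'no underscore'
# 'caf ol' 'café olé'
```

Which behaviour is preferable depends on whether non-ASCII letters should count as alphanumeric for your data.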

You may like the list comprehension here:

result = ''.join([c for c in myString if str.isalnum(c) or str.isspace(c)])

Alternatively, without RegExp:

def removePunctuation(s):
    return ''.join(l for l in s if l.isalnum() or l == ' ').lower().strip()

Or as a lambda:

removePunctuation = lambda s: ''.join(l for l in s if l.isalnum() or l == ' ').lower().strip()
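
Note that this version keeps only the literal space character, whereas the list comprehension in the preceding answer uses `str.isspace`, which also keeps tabs and newlines. A small sketch of the difference, assuming Python 3 (the two function names are hypothetical and chosen only for this comparison):

```python
def keep_alnum_and_whitespace(s):
    # Keeps letters, digits, and any whitespace (space, tab, newline, ...).
    return ''.join(c for c in s if c.isalnum() or c.isspace()).lower().strip()

def keep_alnum_and_space(s):
    # Keeps letters, digits, and only the plain space character.
    return ''.join(c for c in s if c.isalnum() or c == ' ').lower().strip()

sample = 'Hi,\tyou!\nBye'
print(repr(keep_alnum_and_whitespace(sample)))  # 'hi\tyou\nbye'
print(repr(keep_alnum_and_space(sample)))       # 'hiyoubye'
```

Both variants drop the underscore, because `'_'.isalnum()` is false.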

Using str.translate (Python 2):

s = 'Hi, you!'

from string import  punctuation

print(s.translate(None,punctuation).lower())
hi you

For Python 3:

s = 'Hi, you!'

from string import  punctuation

print(s.translate({ord(k):"" for k in punctuation}).lower())
hi you

Wrapped in a function:

from string import punctuation

# Python 2
def remove_punctuation(s):
    return s.translate(None, punctuation).lower()

# Python 3
def remove_punctuation(s):
    return s.translate({ord(k): "" for k in punctuation}).lower()

Output:

In [3]: remove_punctuation(' No under_score!')
Out[3]: ' no underscore'

In [4]: remove_punctuation('Hi, you!')
Out[4]: 'hi you'

If you also want to remove leading and trailing whitespace, add a strip():

from string import punctuation
def remove_punctuation(s):
    return s.translate(None,punctuation).lower().strip()

Output:

In [6]: remove_punctuation(' No under_score!')
Out[6]: 'no underscore'

In [7]: remove_punctuation('Hi, you!')
Out[7]: 'hi you'
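
In Python 3 the deletion table can also be built with `str.maketrans`, whose third argument lists the characters to delete. This is a sketch under that assumption rather than part of the original answer; the constant name `DELETE_PUNCTUATION` is made up for illustration.

```python
from string import punctuation

# Build the translation table once; characters in the third argument
# are mapped to None, i.e. deleted by str.translate.
DELETE_PUNCTUATION = str.maketrans('', '', punctuation)

def remove_punctuation(s):
    return s.translate(DELETE_PUNCTUATION).lower().strip()

print(remove_punctuation(' No under_score!'))  # no underscore
print(remove_punctuation('Hi, you!'))          # hi you
```

Keep in mind that `string.punctuation` contains only ASCII punctuation, so characters such as `£` or typographic quotes pass through unchanged, unlike with the regex-based answers.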

You can address the problem from the other side:

re.sub(r'([^\w ]|_)+', '', 'a ,_  b').lower().strip()

This gives 'a   b' (the three spaces around the removed ',_' are kept).

In other words: delete every character that is neither a word character nor a space, and also delete underscores, which \w would otherwise keep.
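
Because this pattern leaves spaces untouched, deleting punctuation that sits between spaces can leave a run of several spaces behind. If single spaces are wanted, one option is to split and rejoin the cleaned string. This is a sketch assuming Python 3; the name `remove_punctuation_collapsed` is hypothetical and the whitespace collapsing is an addition, not something the question asked for.

```python
import re

def remove_punctuation_collapsed(text):
    # Delete characters that are neither word characters nor spaces,
    # and delete underscores as well.
    cleaned = re.sub(r'[^\w ]|_', '', text)
    # split() with no arguments drops empty pieces, so runs of spaces
    # collapse to one and leading/trailing spaces disappear.
    return ' '.join(cleaned.split()).lower()

print(repr(remove_punctuation_collapsed('a ,_  b')))           # 'a b'
print(repr(remove_punctuation_collapsed(' No under_score!')))  # 'no underscore'
```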

