Return all characters before the first underscore

Question

Using re in Python, I would like to return all the characters in a string that precede the first occurrence of an underscore. In addition, I would like the returned string to be in all uppercase and without any non-alphanumeric characters.

For example:

 AG.av08_binloop_v6 = AGAV08
 TL.av1_binloopv2 = TLAV1

I'm pretty sure I know how to return a string in all uppercase letters using string.upper(), but I'm sure there are several ways to remove the '.' efficiently. Any help would be greatly appreciated. I am still slowly but surely learning regular expressions. Each tip gets added to my notes for future reference.

To clarify, my examples above are not the actual strings. The actual string would look like this:

 AG.av08_binloop_v6

With my desired output looking like:

 AGAV08

The next example works the same way. String:

 TL.av1_binloopv2

Desired output:

 TLAV1

Thanks again for your help!

+7

python string regex

durandal 21 sept '10 at 16:31

6 answers

Even without re :

 text.split('_', 1)[0].replace('.', '').upper()
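As a minimal sketch, the one-liner above can be wrapped in a function and applied to the two strings from the question. The function name `prefix_upper` is made up for illustration and is not part of the original answer.

```python
def prefix_upper(text):
    """Return the part of text before the first underscore, with dots removed, uppercased."""
    # split('_', 1) splits at most once, so element [0] is everything before the first '_'
    return text.split('_', 1)[0].replace('.', '').upper()

print(prefix_upper('AG.av08_binloop_v6'))  # AGAV08
print(prefix_upper('TL.av1_binloopv2'))    # TLAV1
```

If the string contains no underscore at all, `split` returns the whole string as the only element, so the function still returns a cleaned, uppercased result.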

+20

eumiro 21 sept '10 at 16:33

You do not need to use re for this. Simple string operations will suffice based on your requirements:

 tests = """
 AG.av08_binloop_v6 = AGAV08
 TL.av1_binloopv2 = TLAV1
 """

 # strip() drops the blank first line left by the triple-quoted string
 for t in tests.strip().splitlines():
     print(t[:t.find('_')].replace('.', '').upper())

 # Prints:
 # AGAV08
 # TLAV1

Or, if you absolutely must use re :

 import re

 pat = r'([a-zA-Z0-9.]+)_.*'
 pat_re = re.compile(pat)

 # strip() drops the blank first line, which would otherwise raise IndexError
 for t in tests.strip().splitlines():
     print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())

 # Prints:
 # AGAV08
 # TLAV1
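One caveat with the slicing approach, sketched below under the assumption that some inputs might contain no underscore (the question does not say whether that can happen): `str.find` returns -1 when the character is missing, so the slice silently drops the last character. The sample string is hypothetical.

```python
s = 'AG.av08'  # hypothetical input with no underscore

# find() returns -1 here, so s[:-1] cuts off the final character
via_find = s[:s.find('_')]
# split() with maxsplit=1 returns the whole string when '_' is absent
via_split = s.split('_', 1)[0]

print(via_find)   # AG.av0
print(via_split)  # AG.av08
```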

+2

jathanism 21 sept '10 at 16:36

Since everyone gives their favorite implementation, here is mine that doesn't use re :

 >>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
 ...     print(''.join(c for c in s.split('_', 1)[0] if c.isalnum()).upper())
 ...
 AGAV08
 TLAV1

I put .upper() on the outside of the generator, so it is called only once.
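The `isalnum()` filter differs from `replace('.', '')` once the prefix contains punctuation other than a dot. A small sketch, using a made-up input with a hyphen that does not appear in the question:

```python
s = 'AG-av.08_binloop_v6'  # hypothetical input containing a hyphen as well as a dot
prefix = s.split('_', 1)[0]

# Removing only dots leaves the hyphen in place
only_dots_removed = prefix.replace('.', '').upper()
# Keeping only alphanumeric characters removes both
alnum_only = ''.join(c for c in prefix if c.isalnum()).upper()

print(only_dots_removed)  # AG-AV08
print(alnum_only)         # AGAV08
```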

+2

Steven Rumbalski 21 sept '10 at 18:02

Just for fun, here is another option for getting the text before the first underscore:

 before_underscore, sep, after_underscore = str.partition('_')

So, all in one line, it could be:

 re.sub(r"[^A-Z\d]", "", str.partition('_')[0].upper())

+2

Etienne 21 sept '10 at 18:50

 import re

 re.sub(r"[^A-Z\d]", "", yourstr.split('_', 1)[0].upper())

+1

Daniel Lenkes 21 sept '10 at 17:15

Gumbo · Accepted Answer · 2010-09-21T16:37:13+0000

Try the following:

 re.sub(r"[^A-Z\d]", "", re.search(r"^[^_]*", str).group(0).upper())
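A sketch of the same two-step idea with precompiled patterns, for readers who call it many times. The names `FIRST_PART`, `NOT_WANTED`, and `clean_prefix` are illustrative and not from the answer, and the character class uses `0-9` instead of `\d`, since in Python 3 `\d` on a `str` pattern also matches non-ASCII digits.

```python
import re

FIRST_PART = re.compile(r'^[^_]*')     # everything before the first underscore
NOT_WANTED = re.compile(r'[^A-Z0-9]')  # anything other than A-Z or an ASCII digit

def clean_prefix(s):
    # '^[^_]*' can match the empty string, so match() never returns None here
    prefix = FIRST_PART.match(s).group(0)
    # Uppercase first: NOT_WANTED would otherwise delete the lowercase letters
    return NOT_WANTED.sub('', prefix.upper())

print(clean_prefix('AG.av08_binloop_v6'))  # AGAV08
print(clean_prefix('TL.av1_binloopv2'))    # TLAV1
```

The order of operations matters: calling `.upper()` after the substitution instead of before it would strip every lowercase letter from the prefix.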
