Return all characters before the first underscore

Using re in Python, I would like to return all the characters in a string that precede the first underscore. In addition, I would like the returned string to be in all uppercase letters and without any non-alphanumeric characters.

For example:

 AG.av08_binloop_v6 = AGAV08
 TL.av1_binloopv2 = TLAV1

I know how to return a string in all uppercase letters using str.upper(), but I'm sure there are several ways to remove the '.' efficiently. Any help would be greatly appreciated. I am still slowly but surely learning regular expressions, and each tip gets added to my notes for future reference.

To clarify, the example above does not show what the actual lines look like. An actual line will look like this:

 AG.av08_binloop_v6 

and my desired result is:

 AGAV08 

The second example works the same way. Line:

 TL.av1_binloopv2 

Desired result:

 TLAV1 

Thanks again for your help!

Try the following:

```python
import re
re.sub(r"[^A-Z\d]", "", re.search(r"^[^_]*", text).group(0).upper())
```
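
The character class [^A-Z\d] lists only uppercase letters, so the order of the calls matters: .upper() has to run before the substitution, otherwise the lowercase letters are deleted too. A small sketch on the first sample line from the question (the variable names are illustrative):

```python
import re

# Everything before the first underscore of the first sample line.
prefix = re.search(r"^[^_]*", "AG.av08_binloop_v6").group(0)

# Uppercase first, then keep only A-Z and digits.
upper_first = re.sub(r"[^A-Z\d]", "", prefix.upper())

# Substituting first deletes the lowercase letters as well.
strip_first = re.sub(r"[^A-Z\d]", "", prefix).upper()

print(prefix, upper_first, strip_first)  # AG.av08 AGAV08 AG08
```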

Even without re:

 text.split('_', 1)[0].replace('.', '').upper() 
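
Note that .replace('.', '') removes only dots, whereas the question asks for all non-alphanumeric characters to be dropped. The two agree on the sample lines but diverge on input with other punctuation. A sketch comparing them on a hypothetical hyphenated input (the function names are illustrative, not from the answer):

```python
def prefix_dots_only(text):
    # Removes only '.' from the part before the first underscore.
    return text.split('_', 1)[0].replace('.', '').upper()

def prefix_alnum_only(text):
    # Keeps only letters and digits from the part before the first underscore.
    return ''.join(c for c in text.split('_', 1)[0] if c.isalnum()).upper()

sample = "AG-x.av08_binloop_v6"  # hypothetical input containing a hyphen
print(prefix_dots_only(sample))   # AG-XAV08
print(prefix_alnum_only(sample))  # AGXAV08
```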

You do not need to use re for this. Simple string operations will suffice based on your requirements:

```python
tests = """AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1"""
for t in tests.splitlines():
    print(t[:t.find('_')].replace('.', '').upper())
# Returns:
# AGAV08
# TLAV1
```
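
One caveat with the find() slice: if a line has no underscore, t.find('_') returns -1, and t[:-1] silently drops the last character instead of failing. A sketch of the problem and a simple guard (both helper names are hypothetical):

```python
def prefix_with_find(t):
    # The expression from the answer above.
    return t[:t.find('_')].replace('.', '').upper()

def prefix_with_find_guarded(t):
    # str.find() returns -1 when '_' is missing; fall back to the whole string.
    i = t.find('_')
    if i == -1:
        i = len(t)
    return t[:i].replace('.', '').upper()

print(prefix_with_find("AG.av08"))          # AGAV0 (last character lost)
print(prefix_with_find_guarded("AG.av08"))  # AGAV08
```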

Or, if you absolutely must use re:

```python
import re
pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)
for t in tests.splitlines():
    print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())
# Returns:
# AGAV08
# TLAV1
```
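
pat_re.findall(t)[0] raises IndexError when a line has no match (for example, no underscore). A sketch that uses match() instead and returns None in that case; note the assumption that the prefix starts at the beginning of the line, since match() is anchored there while findall() scans the whole line (the function name is hypothetical):

```python
import re

pat_re = re.compile(r'([a-zA-Z0-9.]+)_.*')

def prefix_or_none(t):
    # match() returns None instead of raising when nothing matches at the start.
    m = pat_re.match(t)
    if m is None:
        return None
    # Group 1 is the run of letters, digits and dots before the first '_'.
    return m.group(1).replace('.', '').upper()

for t in ("AG.av08_binloop_v6", "TL.av1_binloopv2", "no underscore here"):
    print(prefix_or_none(t))
# AGAV08
# TLAV1
# None
```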

Since everyone is posting their favorite implementation, here is mine, which doesn't use re:

```python
>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print(''.join(c for c in s.split('_', 1)[0] if c.isalnum()).upper())
...
AGAV08
TLAV1
```

I put .upper() on the outside of the generator, so it is called only once.
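
In Python 3, str.isalnum() accepts any Unicode letter or digit, so this version and the [^A-Z\d] regex versions behave differently on non-ASCII letters. A sketch with a hypothetical input containing 'é'; if only ASCII should survive, adding c.isascii() (Python 3.7+) to the filter is one option:

```python
import re

s = "AG.avé08_binloop"  # hypothetical input with a non-ASCII letter
prefix = s.split('_', 1)[0]

# isalnum() keeps 'é', which .upper() turns into 'É'.
keep_unicode = ''.join(c for c in prefix if c.isalnum()).upper()

# The regex class keeps only A-Z and digits, so 'É' is removed.
keep_regex = re.sub(r"[^A-Z\d]", "", prefix.upper())

# ASCII-only variant of the generator approach.
keep_ascii = ''.join(c for c in prefix if c.isascii() and c.isalnum()).upper()

print(keep_unicode, keep_regex, keep_ascii)  # AGAVÉ08 AGAV08 AGAV08
```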

Just for fun, here is another way to get the text before the first underscore:

 before_underscore, sep, after_underscore = text.partition('_')

So the whole thing on one line could be:

 re.sub(r"[^A-Z\d]", "", text.partition('_')[0].upper())
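
For reference, partition() always returns a 3-tuple and splits only at the first occurrence, so later underscores stay in the third element, and a string without an underscore comes back whole in the first element. A quick check on the first sample line:

```python
text = "AG.av08_binloop_v6"  # first sample line from the question
before_underscore, sep, after_underscore = text.partition('_')
print(before_underscore)  # AG.av08
print(sep)                # _
print(after_underscore)   # binloop_v6

# Without an underscore, partition() returns (text, '', ''),
# so [0] is simply the whole string.
print("AG.av08".partition('_'))  # ('AG.av08', '', '')
```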

```python
import re

re.sub(r"[^A-Z\d]", "", yourstr.split('_', 1)[0].upper())
```
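
The same expression wrapped in a function and run on both sample lines from the question (the name normalize_prefix is hypothetical). The 1 passed to split() is maxsplit, so Python stops splitting after the first underscore instead of breaking up the whole line:

```python
import re

def normalize_prefix(yourstr):
    # split('_', 1) stops at the first underscore: 'a_b_c' -> ['a', 'b_c']
    head = yourstr.split('_', 1)[0]
    # Uppercase first so [^A-Z\d] discards only punctuation.
    return re.sub(r"[^A-Z\d]", "", head.upper())

for line in ("AG.av08_binloop_v6", "TL.av1_binloopv2"):
    print(normalize_prefix(line))
# AGAV08
# TLAV1
```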
