Splitting a string into the required format, the Pythonic way? (with or without regex)

Question

I have a string in the format:

t='@abc @def Hello this part is text'

I want to get the following:

    l = ["abc", "def"]
    s = 'Hello this part is text'

I have done this:

    s = t[t.find(' ', t.rfind('@')):].strip()   # text after the last @name
    a = t[:t.find(' ', t.rfind('@'))].strip()   # the leading '@abc @def' part
    b = a.split('@')
    l = [i.strip() for i in b][1:]

It works for the most part, but it fails when the text part has an "@". For example, when:

 t='@abc @def My email is red@hjk.com '

it fails. The @names come at the beginning, and there may be text after the @names, which may itself contain @.

Clearly, I could prepend a space and look for the first word that does not start with '@', but that does not seem like an elegant solution.

What is the pythonic way to resolve this issue?

+6

python string regex format

Lakshman prasad Feb 17 '09 at 18:22

7 answers

    t = '@abc @def Hello this part is text'

    words = t.split(' ')
    names = []
    # pop leading words only while they start with '@',
    # so the first word of the text is not lost
    while words and words[0].startswith('@'):
        names.append(words.pop(0)[1:])
    text = ' '.join(words)

    print(names)
    print(text)

+7

Ricardo reyes Feb 17 '09 at 19:32

How about this:

1. Split on spaces.
2. For each word, check:
   2.1. If the word starts with @, push it onto the first list.
   2.2. Otherwise, just join the remaining words with spaces.

+5

Osama al-maadeed Feb 17 '09 at 19:03

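    # for a fixed number of @names (here two):
    [i.strip('@') for i in t.split(' ', 2)[:2]]

    # for any number of them; note that this also drops any later
    # word that starts with '@', not only the leading ones:
    a = [i.strip('@') for i in t.split(' ') if i.startswith('@')]
    s = ' '.join(i for i in t.split(' ') if not i.startswith('@'))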

+3

Silentghost Feb 17 '09 at 18:37

You can also use regular expressions:

    import re

    rx = re.compile(r"@([\w]+) @([\w]+) (.*)")
    t = '@abc @def Hello this part is text and my email is foo@ba.r '
    a, b, s = rx.match(t).groups()

But it all depends on what your data can look like, so you may have to adjust it. What it does is basically create groups via () and check what is allowed in them.
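To illustrate the "depends on your data" caveat: the pattern above expects exactly two tags, so `match` returns `None` for a line with fewer, and a third tag ends up in the text. A minimal sketch that guards against this, where the helper name `parse_two_tags` is made up for the example:

```python
import re

# Same shape as the pattern above: exactly two @tags, then the remaining text.
rx = re.compile(r"@(\w+) @(\w+) (.*)")

def parse_two_tags(line):
    """Hypothetical helper: return (tag1, tag2, text), or None if the line has another shape."""
    m = rx.match(line)
    return m.groups() if m else None

print(parse_two_tags('@abc @def Hello, mail me at foo@ba.r'))
print(parse_two_tags('@abc Hello'))        # only one tag, so no match
print(parse_two_tags('@a @b @c text'))     # third tag is left in the text
```

Checking for `None` before calling `.groups()` avoids an `AttributeError` on lines that do not fit the fixed two-tag shape.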

+3

MrTopf Feb 17 '09 at 18:40

[edit: this implements what Osama suggested above]

This builds L from the @ words at the beginning of the string and then, as soon as a word that does not start with @ is found, just takes the rest of the string.

    t = '@one @two @three some text afterward with @ symbols@ meow@meow '

    words = t.split(' ')             # split into list of words based on spaces
    L = []
    s = ''
    for i in range(len(words)):      # go through each word
        word = words[i]
        if word.startswith('@'):     # grab @ from beginning of string (safe for empty words)
            L.append(word[1:])
            continue
        s = ' '.join(words[i:])      # put spaces back in
        break                        # you can ignore the rest of the words

You can reorganize this to be less code, but I'm trying to make what happens obvious.
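One possible shorter form, as a sketch rather than a replacement: `itertools.takewhile` collects the leading @ words, and everything after them is joined back together. The variable name `leading` is introduced here only for illustration.

```python
from itertools import takewhile

t = '@one @two @three some text afterward with @ symbols@ meow@meow '

words = t.split(' ')
# Leading words that start with '@' are the tags; stop at the first word that does not.
leading = list(takewhile(lambda w: w.startswith('@'), words))
L = [w[1:] for w in leading]
# Everything after the tags is the text, with single spaces put back.
s = ' '.join(words[len(leading):])

print(L)
print(s)
```

Because the words are split on a single space and joined the same way, the spacing of the text part is kept as it was, including the trailing space.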

+3

Jason coon Feb 17 '09 at 19:21

Here is another option that uses split() and no regexps:

    t = '@abc @def My email is red@hjk.com '
    tags = []
    words = iter(t.split())

    # iterate over words until first non-tag word
    for w in words:
        if not w.startswith("@"):
            # join this word and all the following
            s = w + " " + (" ".join(words))
            break
        tags.append(w[1:])
    else:
        s = ""  # handle string with only tags

    print(tags, s)

Here's a shorter, but perhaps slightly cryptic, version that uses a regular expression to find the first whitespace character that is followed by a non-@ character:

    import re

    t = '@abc @def My email is red@hjk.com @extra bye'
    m = re.search(r"\s([^@].*)$", t)
    tags = [tag[1:] for tag in t[:m.start()].split()]
    s = m.group(1)
    print(tags, s)  # ['abc', 'def'] My email is red@hjk.com @extra bye

This does not work properly if there are no tags or no text. The format is underspecified; you will need to provide more test cases to validate it.
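As a sketch of what such test cases might look like, here is a small function with a handful of edge cases. The function name `split_tags` and the expected results for the tags-only, no-tags and empty inputs are assumptions, since the question does not specify them. It also uses `split()` without arguments, so runs of whitespace in the text collapse to single spaces.

```python
def split_tags(t):
    """Hypothetical helper: split leading @tags from the text that follows."""
    words = t.split()
    n = 0
    # count the leading words that start with '@'
    while n < len(words) and words[n].startswith('@'):
        n += 1
    tags = [w[1:] for w in words[:n]]
    return tags, ' '.join(words[n:])

# Assumed expectations; the question does not pin these cases down.
cases = {
    '@abc @def My email is red@hjk.com ': (['abc', 'def'], 'My email is red@hjk.com'),
    '@abc @def': (['abc', 'def'], ''),                # tags only
    'no tags here @all': ([], 'no tags here @all'),   # no leading tags
    '': ([], ''),                                     # empty string
}
for line, expected in cases.items():
    assert split_tags(line) == expected, (line, split_tags(line))
```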

+1

Martin Vilcans Feb 18 '09 at 23:14

Brent.Longborough · Accepted Answer · 2009-02-17T19:32:42+0000

Building directly on MrTopf's effort:

    import re

    rx = re.compile(r"((?:@\w+ +)+)(.*)")
    t = '@abc @def @xyz Hello this part is text and my email is foo@ba.r '
    a, s = rx.match(t).groups()
    l = re.split('[@ ]+', a)[1:-1]
    print(l)
    print(s)

prints:

['abc', 'def', 'xyz']
Hello this part is text and my email is foo@ba.r

Justly called to account by hasen j, let me explain how this works:

 /@\w+ +/

matches a single tag: @ followed by at least one alphanumeric character or _, followed by at least one space character. + is greedy, so if there is more than one space, it will grab them all.

To match any number of these tags, we need to add a plus (one or more of something) to the pattern for the tag; so we need to group it with parentheses:

 /(@\w+ +)+/

which matches one or more tags and, being greedy, matches all of them. However, those parentheses now interfere with our capture groups, so we undo that by turning them into an anonymous (non-capturing) group:

 /(?:@\w+ +)+/
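A short check of why the capturing form gets in the way, using a sample string made up for this illustration: a capture group under a repeat keeps only its last repetition, while the non-capturing form records no group at all.

```python
import re

t = '@abc @def @xyz Hello'

# Capturing group under a repeat: the group keeps only its last repetition.
print(re.match(r"(@\w+ +)+", t).groups())   # ('@xyz ',)

# Non-capturing group: same overall match, but no group is recorded.
m = re.match(r"(?:@\w+ +)+", t)
print(m.groups())                           # ()
print(repr(m.group(0)))                     # '@abc @def @xyz '
```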

Finally, we make that into a capture group and add another one to sweep up the rest:

 /((?:@\w+ +)+)(.*)/

One last breakdown to sum up:

    ((?:@\w+ +)+)(.*)
     (?:@\w+ +)+
     (  @\w+ +)
        @\w+ +

Note that in reviewing this, I have improved it: \w did not need to be in a set, and it now allows multiple spaces between tags. Thanks, hasen-j!
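The same breakdown can be kept next to the pattern by compiling it with `re.VERBOSE`. This is only a sketch: the names `TAGGED_LINE` and `split_line` are invented here, the literal space is written as `[ ]` because verbose mode ignores bare whitespace, and the tags are pulled out with `re.findall` instead of the `re.split` used above.

```python
import re

TAGGED_LINE = re.compile(r"""
    (               # group 1: the whole run of leading tags
      (?: @\w+      #   one tag: '@' plus one or more word characters
          [ ]+      #   followed by one or more spaces
      )+            #   repeated, so every leading tag is consumed
    )
    (.*)            # group 2: the rest of the line
""", re.VERBOSE)

def split_line(line):
    """Hypothetical wrapper: return (tags, text), or ([], line) when no leading tags match."""
    m = TAGGED_LINE.match(line)
    if m is None:
        return [], line
    return re.findall(r"@(\w+)", m.group(1)), m.group(2)

print(split_line('@abc  @def @xyz Hello, mail foo@ba.r'))
print(split_line('no tags'))
```

As with the original pattern, a tag must be followed by at least one space, so a line consisting of a single tag with nothing after it is treated here as having no tags.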
