Python - regular expression - split a line before a word

I am trying to split a string in Python before a specific word. For example, I would like to split the following line before each occurrence of "path:".

  • split the line before each "path:"
  • input: "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
  • output: ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I tried

 rx = re.compile("(:?[^:]+)")
 rx.findall(line)

This does not split the string where I want. The problem is that the values after "path:" are never known in advance, so I cannot specify the whole word. Does anyone know how to do this?

+2
4 answers

Using a regex to split your string seems a bit overkill: the str.split() method may be all you need.

Anyway, if you really need a regular expression to split your string, use the re.split() function, which splits the string at each match of the regular expression.

Also make sure you use the right regular expression for splitting:

 >>> import re
 >>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
 >>> re.split(' (?=path:)', line)
 ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the beginning of the pattern) only when it is followed by the string 'path:', without consuming what follows the space. Because only the space is consumed, 'path:' stays at the start of each resulting piece.
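
As a rough sketch of the lookahead in action, the snippet below wraps the split in a small function. The helper name `split_before_path` and the use of `\s+` instead of a single space (so that runs of whitespace before `path:` are tolerated) are assumptions made for illustration, not part of the answer above.

```python
import re

# The lookahead (?=path:) checks for the word without consuming it,
# so only the whitespace in front of "path:" is removed by the split.
PATTERN = re.compile(r"\s+(?=path:)")

def split_before_path(line):
    """Hypothetical helper: split line at whitespace followed by 'path:'."""
    return PATTERN.split(line)

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
parts = split_before_path(line)
print(parts)
```

A line that contains no `path:` preceded by whitespace comes back as a one-element list, which is the usual behaviour of `re.split` when nothing matches.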

+4

You can do ["path:"+s for s in line.split("path:")[1:]] instead of using a regular expression. (Note that we skip the first element of the split, which is the text before the first path: and therefore has no path: prefix; here it is an empty string.)
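
The same idea might be sketched as a reusable function. The name `split_before` and the `rstrip()` call (which trims the space left in front of the next occurrence) are assumptions added for illustration. As in the one-liner above, any text before the first occurrence of the word is dropped.

```python
def split_before(line, word):
    """Hypothetical helper: split line before each occurrence of word."""
    # pieces[0] is whatever precedes the first occurrence of word
    # (an empty string when the line starts with word); it is dropped.
    pieces = line.split(word)[1:]
    # str.split removes the separator, so re-attach it to each piece;
    # rstrip() trims the trailing space left before the next occurrence.
    return [word + piece.rstrip() for piece in pieces]

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
result = split_before(line, "path:")
print(result)
```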

+2
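 in_str = "path: bte00250 Alanine, aspartate and glutamate metabolism path: bte00330 Arginine and proline metabolism"
 in_list = in_str.split('path:')
 # join() puts the prefix back between the pieces; [2:] drops the leading ", " left by the empty first piece.
 # Note that this prints a single comma-separated string, not a list.
 print(", path:".join(in_list)[2:])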
0

This can be done without regular expressions. Given the line:

 s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..." 

We can temporarily replace the search word with a placeholder. The placeholder is a single character that we then split on, so it must not already occur in the string:

 word, placeholder = "path:", "|"
 s = s.replace(word, placeholder).split(placeholder)
 s
 # ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']

Now that the string is split, we can prepend the original word to each non-empty substring using a list comprehension:

 ["".join([word, i]) for i in s if i]
 # ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
0
