Here is my implementation, which is much, much faster and more complete than the other answers here. It has 4 separate subfunctions for different cases.
I just copy the docstring of the main str_split function:
str_split(s, *delims, empty=None)
Divide the string s into the rest of the arguments, possibly omitting the empty parts (the empty keyword argument is responsible for this). This is a generator function.
When only one delimiter is provided, the string is simply split by it. empty by default True .
str_split('[]aaa[][]bb[c', '[]') -> '', 'aaa', '', 'bb[c' str_split('[]aaa[][]bb[c', '[]', empty=False) -> 'aaa', 'bb[c'
When multiple delimiters are specified, the string is split into the longest possible sequences of these delimiters by default, or if empty set to True , empty strings between delimiters are also included. Please note that delimiters in this case can only be single characters.
str_split('aaa, bb : c;', ' ', ',', ':', ';') -> 'aaa', 'bb', 'c' str_split('aaa, bb : c;', *' ,:;', empty=True) -> 'aaa', '', 'bb', '', '', 'c', ''
If no separators are specified, string.whitespace used, so the effect is the same as str.split() , except that this function is a generator.
str_split('aaa\\t bb c \\n') -> 'aaa', 'bb', 'c'
import string def _str_split_chars(s, delims): "Split the string `s` by characters contained in `delims`, including the \ empty parts between two consecutive delimiters" start = 0 for i, c in enumerate(s): if c in delims: yield s[start:i] start = i+1 yield s[start:] def _str_split_chars_ne(s, delims): "Split the string `s` by longest possible sequences of characters \ contained in `delims`" start = 0 in_s = False for i, c in enumerate(s): if c in delims: if in_s: yield s[start:i] in_s = False else: if not in_s: in_s = True start = i if in_s: yield s[start:] def _str_split_word(s, delim): "Split the string `s` by the string `delim`" dlen = len(delim) start = 0 try: while True: i = s.index(delim, start) yield s[start:i] start = i+dlen except ValueError: pass yield s[start:] def _str_split_word_ne(s, delim): "Split the string `s` by the string `delim`, not including empty parts \ between two consecutive delimiters" dlen = len(delim) start = 0 try: while True: i = s.index(delim, start) if start!=i: yield s[start:i] start = i+dlen except ValueError: pass if start<len(s): yield s[start:] def str_split(s, *delims, empty=None): """\ Split the string `s` by the rest of the arguments, possibly omitting empty parts (`empty` keyword argument is responsible for that). This is a generator function. When only one delimiter is supplied, the string is simply split by it. `empty` is then `True` by default. str_split('[]aaa[][]bb[c', '[]') -> '', 'aaa', '', 'bb[c' str_split('[]aaa[][]bb[c', '[]', empty=False) -> 'aaa', 'bb[c' When multiple delimiters are supplied, the string is split by longest possible sequences of those delimiters by default, or, if `empty` is set to `True`, empty strings between the delimiters are also included. Note that the delimiters in this case may only be single characters. str_split('aaa, bb : c;', ' ', ',', ':', ';') -> 'aaa', 'bb', 'c' str_split('aaa, bb : c;', *' ,:;', empty=True) -> 'aaa', '', 'bb', '', '', 'c', '' When no delimiters are supplied, `string.whitespace` is used, so the effect is the same as `str.split()`, except this function is a generator. str_split('aaa\\t bb c \\n') -> 'aaa', 'bb', 'c' """ if len(delims)==1: f = _str_split_word if empty is None or empty else _str_split_word_ne return f(s, delims[0]) if len(delims)==0: delims = string.whitespace delims = set(delims) if len(delims)>=4 else ''.join(delims) if any(len(d)>1 for d in delims): raise ValueError("Only 1-character multiple delimiters are supported") f = _str_split_chars if empty else _str_split_chars_ne return f(s, delims)
This function works in Python 3, and a lightweight, albeit rather ugly, fix can be applied to make it work in both versions 2 and 3. The first lines of the function should be changed to:
def str_split(s, *delims, **kwargs): """...docstring...""" empty = kwargs.get('empty')