Python string parsing
Split the string on a space, get a list, show its type, and print it out:

    el@apollo:~/foo$ python
    >>> mystring = "What does the fox say?"
    >>> mylist = mystring.split(" ")
    >>> print type(mylist)
    <type 'list'>
    >>> print mylist
    ['What', 'does', 'the', 'fox', 'say?']
If two delimiters sit next to each other, an empty string is assumed between them:

    el@apollo:~/foo$ python
    >>> mystring = "its  so   fluffy   im gonna    DIE!!!"
    >>> print mystring.split(" ")
    ['its', '', 'so', '', '', 'fluffy', '', '', 'im', 'gonna', '', '', '', 'DIE!!!']
Split a string on underscores and grab the 5th item in the list:

    el@apollo:~/foo$ python
    >>> mystring = "Time_to_fire_up_Kowalski's_Nuclear_reactor."
    >>> mystring.split("_")[4]
    "Kowalski's"
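Indexing straight into the result raises IndexError when the string has fewer fields than expected. Here is a minimal sketch of a guard, written in Python 3 syntax. The helper name `nth_field` and its `default` parameter are made up for illustration and are not part of the standard library:

```python
def nth_field(text, sep, index, default=None):
    """Return the field at `index` after splitting `text` on `sep`.

    Hypothetical helper: falls back to `default` instead of raising
    IndexError when there are not enough fields.
    """
    fields = text.split(sep)
    try:
        return fields[index]
    except IndexError:
        return default


mystring = "Time_to_fire_up_Kowalski's_Nuclear_reactor."
print(nth_field(mystring, "_", 4))       # the 5th field
print(nth_field(mystring, "_", 40, ""))  # out of range, so the default
```

Whether a silent default is better than an exception depends on how much you trust the input; for strictly formatted data, letting the IndexError surface may be the safer choice.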
Collapse multiple spaces into one:

    el@apollo:~/foo$ python
    >>> mystring = 'collapse    these       spaces'
    >>> mycollapsedstring = ' '.join(mystring.split())
    >>> print mycollapsedstring.split(' ')
    ['collapse', 'these', 'spaces']
When you pass no argument to Python's split method, the documentation states: "runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace".
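The difference between passing `" "` and passing nothing shows up most clearly with leading, trailing, and mixed whitespace. Here is a small sketch in Python 3 syntax; the sample string is invented for the demonstration:

```python
messy = "  its  so\tfluffy \n"

# Explicit single-space separator: every space is a delimiter, so
# adjacent, leading, and trailing spaces produce empty strings, and
# the tab and newline stay inside the fields.
by_space = messy.split(" ")

# No separator: any run of whitespace (spaces, tabs, newlines) counts
# as one delimiter, and leading/trailing whitespace is ignored.
by_whitespace = messy.split()

print(by_space)
print(by_whitespace)
```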
Hold onto your hats, boys, and split on a regular expression:

    el@apollo:~/foo$ python
    >>> mystring = 'zzzzzzabczzzzzzdefzzzzzzzzzghizzzzzzzzzzzz'
    >>> import re
    >>> mylist = re.split("[a-m]+", mystring)
    >>> print mylist
    ['zzzzzz', 'zzzzzz', 'zzzzzzzzz', 'zzzzzzzzzzzz']

The regular expression "[a-m]+" means that one or more consecutive lowercase letters a through m are matched as a delimiter. re is a library that must be imported.
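A character class is also handy when several different characters should all count as delimiters. This is a sketch in Python 3 syntax, assuming you want commas, semicolons, and whitespace to be interchangeable; the sample data and the name `DELIMITERS` are invented:

```python
import re

# One or more commas, semicolons, or whitespace characters in a row
# act as a single delimiter.
DELIMITERS = re.compile(r"[,;\s]+")

record = "fox, dog;cat   owl,;  hen"
animals = DELIMITERS.split(record)
print(animals)

# A delimiter at the very start or end still yields an empty string,
# so filter those out if they are not wanted.
padded = ", fox, dog,"
trimmed = [item for item in DELIMITERS.split(padded) if item]
print(trimmed)
```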
Or, if you want to chomp the items off one at a time:

    el@apollo:~/foo$ python
    >>> mystring = "theres coffee in that nebula"
    >>> mytuple = mystring.partition(" ")
    >>> print type(mytuple)
    <type 'tuple'>
    >>> print mytuple
    ('theres', ' ', 'coffee in that nebula')
    >>> print mytuple[0]
    theres
    >>> print mytuple[2]
    coffee in that nebula
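Calling partition again on the remainder lets you keep consuming items until none are left. Here is one possible sketch in Python 3 syntax; the generator name `peel` is hypothetical, not a built-in:

```python
def peel(text, sep=" "):
    """Yield the pieces of `text` one at a time using str.partition.

    Hypothetical helper for illustration; it yields the same pieces
    that text.split(sep) would return.
    """
    rest = text
    while True:
        head, found, rest = rest.partition(sep)
        yield head
        if not found:  # separator not found: `head` was the last piece
            break


for word in peel("theres coffee in that nebula"):
    print(word)
```

Because it is a generator, you can stop early without splitting the rest of the string, which is the main reason to prefer this over a plain split.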