I have a pandas DataFrame with a column, df.Strings, that contains lines of text. I would like to split each row so that every word gets its own row, with the values of the other columns repeated unchanged. For example, if I have 3 rows (and an unrelated "Time" column):
       Strings   Time
    0  The dog   4Pm
    1  lazy dog  2Pm
    2  The fox   1Pm
I want new rows that each contain one word from the string, with the other columns otherwise identical:
    Strings    --- Words  --- Time
    "The dog"  --- "The"  --- 4Pm
    "The dog"  --- "dog"  --- 4Pm
    "lazy dog" --- "lazy" --- 2Pm
    "lazy dog" --- "dog"  --- 2Pm
    "The fox"  --- "The"  --- 1Pm
    "The fox"  --- "fox"  --- 1Pm
I know how to split the strings into a list of words:

    import re

    string_list = '\n'.join(df.Strings.map(str))
    word_list = re.findall('[a-z]+', string_list)
But how can I get them into a DataFrame while keeping the index and the other columns? I am using Python 2.7 and pandas 0.10.1.
EDIT: I now understand how to expand the strings with groupby, as shown in this question:

    import re
    from pandas import DataFrame

    def f(group):
        row = group.irow(0)
        return DataFrame({'words': re.findall('[a-z]+', row['Strings'])})

    df.groupby('class', group_keys=False).apply(f)
I would like to keep the other columns. Is it possible?
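For reference, here is a minimal sketch of one way this might be done. It is not a confirmed solution and has not been checked against pandas 0.10.1. It rebuilds the example frame, loops over the rows, and emits one output row per word while copying the other columns and reusing the original index label. The names `rows`, `index` and `expanded` are made up for illustration, and the pattern is widened to `[A-Za-z]+` so that capitalised words such as "The" are matched whole.

```python
import re
import pandas as pd

# Example data from the question
df = pd.DataFrame({'Strings': ['The dog', 'lazy dog', 'The fox'],
                   'Time': ['4Pm', '2Pm', '1Pm']})

rows = []   # one dict per output row (hypothetical helper name)
index = []  # original index label, repeated once per word

for idx, row in df.iterrows():
    for word in re.findall('[A-Za-z]+', str(row['Strings'])):
        new_row = row.to_dict()   # copy every existing column
        new_row['Words'] = word   # add the single word
        rows.append(new_row)
        index.append(idx)

# Pass the column order explicitly so it does not depend on dict ordering
expanded = pd.DataFrame(rows, index=index,
                        columns=['Strings', 'Words', 'Time'])
print(expanded)
```

On much newer pandas releases (0.25 and later), `df['Strings'].str.findall(...)` followed by `DataFrame.explode` should give an equivalent result more compactly, but that is not available in 0.10.1.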
python pandas
Kyle heuton