I am trying to figure out how to apply cumulative functions to objects. There are several options for numbers, such as cumsum and cumcount. There is also df.expanding, which can be used with apply. But the functions that I pass to apply do not work with objects.
    import pandas as pd

    df = pd.DataFrame({"C1": [1, 2, 3, 4],
                       "C2": [{"A"}, {"B"}, {"C"}, {"D"}],
                       "C3": ["A", "B", "C", "D"],
                       "C4": [["A"], ["B"], ["C"], ["D"]]})
    df
    Out:
       C1   C2 C3   C4
    0   1  {A}  A  [A]
    1   2  {B}  B  [B]
    2   3  {C}  C  [C]
    3   4  {D}  D  [D]
The dataframe contains integers, sets, strings and lists. Now, if I try expanding().apply(sum), I get a cumulative sum:
    df.expanding().apply(sum)
    Out[69]:
         C1   C2 C3   C4
    0   1.0  {A}  A  [A]
    1   3.0  {B}  B  [B]
    2   6.0  {C}  C  [C]
    3  10.0  {D}  D  [D]
Since addition is defined for lists and strings, I expected to get something like the following:
         C1   C2    C3            C4
    0   1.0  {A}     A           [A]
    1   3.0  {B}    AB        [A, B]
    2   6.0  {C}   ABC     [A, B, C]
    3  10.0  {D}  ABCD  [A, B, C, D]
I also tried something like this:
    from functools import reduce

    df.expanding().apply(lambda r: reduce(lambda x, y: x + y**2, r))
    Out:
         C1   C2 C3   C4
    0   1.0  {A}  A  [A]
    1   5.0  {B}  B  [B]
    2  14.0  {C}  C  [C]
    3  30.0  {D}  D  [D]
It works as I expect for the numeric column: the previous result is x, and the current row's value is y. But I cannot get reduce to work with x.union(y), for example.
So my question is: are there any alternatives to expanding that I can use for objects? This example shows that expanding().apply() does not work with object dtypes. I am looking for a general solution that supports applying a function to these two inputs: the previous result and the current element.
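To make the "previous result, current element" pattern concrete, here is a minimal sketch of the behaviour I am after, written outside of expanding() with itertools.accumulate from the standard library. The helper name cumulative is hypothetical (it is not a pandas function), and this is only one possible approach, shown under the assumption that a plain Python loop over each column is acceptable.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "C1": [1, 2, 3, 4],
    "C2": [{"A"}, {"B"}, {"C"}, {"D"}],
    "C3": ["A", "B", "C", "D"],
    "C4": [["A"], ["B"], ["C"], ["D"]],
})


def cumulative(series, func):
    """Hypothetical helper (not part of pandas): fold func(previous, current) down a Series."""
    # accumulate yields every intermediate result; wrapping them in a list
    # keeps sets and lists as single cell values in the new Series.
    return pd.Series(list(accumulate(series, func)), index=series.index)


result = pd.DataFrame({
    "C1": cumulative(df["C1"], lambda x, y: x + y),
    "C2": cumulative(df["C2"], lambda x, y: x.union(y)),
    "C3": cumulative(df["C3"], lambda x, y: x + y),
    "C4": cumulative(df["C4"], lambda x, y: x + y),
})
print(result)
```

Here x is the previous result and y is the current element, so x.union(y) works for the set column in the same way that x + y works for strings and lists.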