Incorrect results when applying the solution to real data

I tried to apply the solution presented in this question to my real data: Selecting rows in a multi-indexed dataframe. For some reason I can't get the results it should give. I have attached both the data frame to select from and the expected result.

What I need:

Rows 3, 11, and 12 should be returned. When the 4 columns are summed in sequence, row 12 should also be selected, but currently it is not.

    import numpy as np
    import pandas as pd

    df_test = pd.read_csv('df_test.csv')

    def find_window(df):
        v = df.values
        # prefix sums with a leading row of zeros
        s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])

        threshold = 0

        # all (start, end) row pairs with start < end
        r, c = np.triu_indices(s.shape[0], 1)
        d = (c - r)[:, None]
        e = s[c] - s[r]
        mask = (e / d < threshold).all(1)
        rng = np.arange(mask.shape[0])

        if mask.any():
            idx = rng[mask][d[mask].argmax()]

            i0, i1 = r[idx], c[idx]
            return pd.DataFrame(
                v[i0:i1],
                df.loc[df.name].index[i0:i1],
                df.columns
            )

    cols = ['2012', '2013', '2014', '2015']

    df_test.groupby(level=0)[cols].apply(find_window)

csv_file is here: https://docs.google.com/spreadsheets/d/19oOoBdAs3xRBWq6HReizlqrkWoQR2159nk8GWoR_4-g/edit?usp=sharing

EDIT: correct data added.

[Images omitted: the data frame and the expected result.]

Note: blue frames mark the rows to be returned; yellow frames mark consecutive column values below 0 (the threshold).

1 answer

According to the logic of your comment, you are looking for rows in which either every value in columns 2012, 2013, 2014, 2015 is less than 0, or the cumulative sum across those columns stays below 0. Whenever the first condition holds, the second holds as well, so it is enough to check the second condition.

    cols = ['2012', '2013', '2014', '2015']
    df.loc[(df[cols].cumsum(axis=1) < 0).all(axis=1), cols]

          2012   2013   2014   2015
    1    -6.74  -1.22   1.58  -0.42
    3    -3.14  -2.48  -0.02  -4.78
    4    -9.40 -11.20   0.68  12.04
    7    -3.12  -5.74   0.84   1.94
    8   -10.14 -12.24 -11.10  15.20
    11  -10.04 -10.60  -5.56  -8.44
    12   -7.30   5.96 -12.58  -6.86
    15  -10.24  -4.16   5.46 -14.00
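To see why row 12 passes even though it contains a positive value, here is a minimal self-contained sketch of the same check. Rows 3, 11, and 12 are copied from the output above; the rows labelled 100 and 101 are hypothetical and are only there to show what gets rejected. The names `running`, `mask`, and `selected` are my own, not from your code.

```python
import pandas as pd

cols = ['2012', '2013', '2014', '2015']

# Rows 3, 11, 12 come from the output above.
# Rows 100 and 101 are hypothetical, added only to show rejected rows.
df = pd.DataFrame(
    [
        [-3.14, -2.48, -0.02, -4.78],    # every value is negative
        [-10.04, -10.60, -5.56, -8.44],  # every value is negative
        [-7.30, 5.96, -12.58, -6.86],    # one positive value, running sum stays negative
        [1.00, -5.00, -5.00, -5.00],     # hypothetical: running sum starts positive
        [-1.00, 3.00, -9.00, -9.00],     # hypothetical: running sum turns positive in 2013
    ],
    index=[3, 11, 12, 100, 101],
    columns=cols,
)

# Running sum across the year columns, left to right, for each row.
running = df[cols].cumsum(axis=1)

# Keep a row only if the running sum is below 0 at every step.
mask = (running < 0).all(axis=1)
selected = df.loc[mask, cols]

print(running)
print(selected)
```

For row 12 the running sum after 2013 is -7.30 + 5.96 = -1.34, which is still below 0, so the row is kept. Row 101 is dropped because its running sum reaches -1.00 + 3.00 = 2.00 in 2013.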

Let me know in the comments if this is not what you want.
