Modular arithmetic in Python to iterate over a pandas DataFrame

Question

Ok, I have a big dataframe, for example:

      hour    value
  0      0      1
  1      6      2
  2     12      3
  3     18      4
  4      0      5
  5      6      6
  6     12      7
  7     18      8
  8      6      9
  9     12     10
 10     18     11
 11     12     12
 12     18     13
 13      0     14

Don't get lost here. The column hour represents the hour of the day, in 6-hour steps. The column value is just what it says; the values here are only an example, not the actual data.

If you look closely at the column hour, you will see that some hours are missing. For example, there is a gap between rows 7 and 8 (hour 0 is missing). There are also larger gaps, for example between rows 10 and 11 (hours 0 and 6 are missing).

What do I need? I would like to detect when an hour (and, of course, its value) is missing, and fill in the DataFrame by inserting a row with the corresponding hour and np.nan as the value.

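What have I thought of? Using modular arithmetic with modulus 24, since 18 + 6 = 24 = 0 mod 24. That is, step through the hours 6 at a time, mod 24, compare each step against the column hour, and where an hour is missing insert a row with np.nan.

However, I do not know how to implement this in Python for a DataFrame.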
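
The modular-arithmetic idea described above can be sketched as follows. This is only an illustration, not code from the thread: the helper names `missing_between` and `fill_missing_hours` are hypothetical, the 6-hour spacing is assumed, and the sample frame is rebuilt from the table in the question.

```python
import numpy as np
import pandas as pd

STEP = 6   # assumed spacing between observations, in hours
DAY = 24   # modulus: hours wrap around after 24

def missing_between(prev_hour, hour):
    """Number of 6-hour slots skipped between two consecutive rows."""
    # Hours elapsed moving forward on the clock; 0 means a full day passed.
    elapsed = (hour - prev_hour) % DAY or DAY
    return elapsed // STEP - 1

def fill_missing_hours(df):
    """Return a new frame with a NaN row for every skipped slot."""
    rows = []
    prev = None
    for hour, value in zip(df['hour'], df['value']):
        if prev is not None:
            # Walk forward from the previous hour, wrapping mod 24.
            for k in range(1, missing_between(prev, hour) + 1):
                rows.append(((prev + k * STEP) % DAY, np.nan))
        rows.append((hour, value))
        prev = hour
    return pd.DataFrame(rows, columns=['hour', 'value'])

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    'hour': [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    'value': range(1, 15),
})
out = fill_missing_hours(df)
print(out)
```

Note that this sketch only fills gaps between existing rows; it does not pad hours before the first row or after the last one.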

python numpy pandas dataframe

David 19 '16 15:58

2 Answers

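Here is an alternative that loops over the rows directly. It is considerably faster than the groupby approach (timings below).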

import numpy as np
import pandas as pd

def hour_checker(hours, values):
    def check_hour(hour):
        if hour not in (0, 6, 12, 18):
            raise ValueError('Invalid hour')
    # Validate every hour up front.
    for hour in hours:
        check_hour(hour)
    # The first row is always kept as-is.
    result = [[hours.iat[0], values.iat[0]]]
    valid_hours = np.arange(0, 24, 6)
    # Rotate until the first observed hour is last, so that
    # valid_hours[0] is the next hour we expect to see.
    while valid_hours[-1] != hours.iat[0]:
        valid_hours = np.roll(valid_hours, -1)
    for hour, value in zip(hours.iloc[1:], values.iloc[1:]):
        # Emit a placeholder row for each expected hour that was skipped.
        while hour != valid_hours[0]:
            result.append([valid_hours[0], None])
            valid_hours = np.roll(valid_hours, -1)
        result.append([hour, value])
        valid_hours = np.roll(valid_hours, -1)
    return pd.DataFrame(result, columns=['hour', 'value'])

hour_checker(df['hour'], df['value'])
Out[33]: 
    hour  value
0      0      1
1      6      2
2     12      3
3     18      4
4      0      5
5      6      6
6     12      7
7     18      8
8      0    NaN
9      6      9
10    12     10
11    18     11
12     0    NaN
13     6    NaN
14    12     12
15    18     13
16     0     14

df_test = pd.concat([df] * 100)

%%timeit
group_hours = (df_test.hour <= df_test.hour.shift()).cumsum()
df_test.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)
1 loops, best of 3: 611 ms per loop

%timeit hour_checker(df_test['hour'], df_test['value'])
100 loops, best of 3: 12.4 ms per loop

Alexander 19 '16 17:21

piRSquared · Accepted Answer · 2016-05-19T16:24:04+0000

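# Start a new group whenever the hour does not increase (a new day begins).
group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
    # Reindex one day's rows against the full 6-hour grid; absent hours get NaN.
    return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)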

Result:

    hour  value
0      0    1.0
1      6    2.0
2     12    3.0
3     18    4.0
4      0    5.0
5      6    6.0
6     12    7.0
7     18    8.0
8      0    NaN
9      6    9.0
10    12   10.0
11    18   11.0
12     0    NaN
13     6    NaN
14    12   12.0
15    18   13.0
16     0   14.0
17     6    NaN
18    12    NaN
19    18    NaN

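To use reindex, the rows first have to be split into groups in which each hour appears at most once, one group per day. group_hours does this: it starts a new group whenever the hour does not increase relative to the previous row.

insert_missing_hours is then simply a reindex of each group against [0, 6, 12, 18].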
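
To make the grouping step concrete, the sketch below shows what `group_hours` evaluates to on the question's sample data, and what reindexing a single group does. The sample frame is rebuilt from the question's table, and the names `day` and `filled` are introduced here for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    'hour': [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    'value': range(1, 15),
})

# True wherever the hour did not increase, i.e. a new day started;
# the cumulative sum turns those markers into a day counter.
group_hours = (df.hour <= df.hour.shift()).cumsum()
print(group_hours.tolist())
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]

# Take one incomplete day (hours 6, 12, 18) and reindex it
# against the full grid; hour 0 appears with a NaN value.
day = df[group_hours == 2]
filled = day.set_index('hour').reindex([0, 6, 12, 18]).reset_index()
print(filled)
```

Because every group is expanded to all four hours, the last group (which holds only hour 0) is also padded with 6, 12 and 18. That is why the result above has trailing NaN rows that the loop-based answer does not produce.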
