Modular arithmetic in Python to iterate over a pandas DataFrame

Ok, I have a big dataframe, for example:

      hour    value
  0      0      1
  1      6      2
  2     12      3
  3     18      4
  4      0      5
  5      6      6
  6     12      7
  7     18      8
  8      6      9
  9     12     10
 10     18     11
 11     12     12
 12     18     13
 13      0     14

Do not get lost here. The column hour holds the hour of the day in 6-hour steps (0, 6, 12, 18). The column value holds the values; the ones shown here are just an example, not the actual data.

If you look closely at the column hour, you will see that some hours are missing. For example, there is a gap between rows 7 and 8 (hour 0 is missing). There are also larger gaps, for example between rows 10 and 11 (hours 0 and 6 are missing).

What do I need? I would like to detect when an hour (and, of course, its value) is missing, and fill in the DataFrame by inserting a row with the corresponding hour and np.nan as the value.

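What was I thinking? This looks solvable with modular arithmetic, here mod 24, since 18 + 6 = 24 = 0 mod 24. I would step by 6 mod 24, compare the result with the hour column, and insert a row with np.nan wherever they differ.

The problem is that I do not know how to do this in Python while iterating over the dataframe.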
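
The stepping idea described above can be sketched in plain Python before bringing pandas into it. The helper names next_hour and missing_between are hypothetical, chosen only for this illustration, and the sketch assumes every hour is a multiple of 6 in the range 0 to 18.

```python
def next_hour(hour, step=6, modulus=24):
    """Return the hour that should follow `hour` on a 6-hourly cycle."""
    return (hour + step) % modulus


def missing_between(prev_hour, hour):
    """List the hours skipped between two consecutive rows.

    Assumes both arguments are multiples of 6 in [0, 24); otherwise the
    loop would never reach `hour`.
    """
    missing = []
    expected = next_hour(prev_hour)
    while expected != hour:
        missing.append(expected)
        expected = next_hour(expected)
    return missing


print(next_hour(18))            # 18 + 6 = 24 = 0 mod 24
print(missing_between(18, 6))   # hour 0 was skipped
print(missing_between(18, 12))  # hours 0 and 6 were skipped
```

Applied to each pair of consecutive rows, missing_between gives exactly the hours for which a np.nan row would have to be inserted.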


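# Start a new group whenever the hour does not increase (a new day begins).
group_hours = (df.hour <= df.hour.shift()).cumsum()

def insert_missing_hours(df):
    # Reindex one day's rows so that all four hours are present.
    return df.set_index('hour').reindex([0, 6, 12, 18]).reset_index()

df.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)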

Output:

    hour  value
0      0    1.0
1      6    2.0
2     12    3.0
3     18    4.0
4      0    5.0
5      6    6.0
6     12    7.0
7     18    8.0
8      0    NaN
9      6    9.0
10    12   10.0
11    18   11.0
12     0    NaN
13     6    NaN
14    12   12.0
15    18   13.0
16     0   14.0
17     6    NaN
18    12    NaN
19    18    NaN

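To use reindex, the rows first have to be split into groups, one per day. A new group starts whenever the hour is less than or equal to the previous hour, and cumsum turns those flags into group labels.

insert_missing_hours then reindexes each group to [0, 6, 12, 18], so any hour missing from a day shows up as a row with NaN.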
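
To make the two steps easier to follow, here is a small sketch that rebuilds the example data from the question and looks at the intermediate results. The names new_day, one_day and filled are introduced only for this illustration.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    'hour':  [0, 6, 12, 18, 0, 6, 12, 18, 6, 12, 18, 12, 18, 0],
    'value': list(range(1, 15)),
})

# True wherever the hour did not increase, i.e. a new day begins.
new_day = df.hour <= df.hour.shift()
group_hours = new_day.cumsum()
print(group_hours.tolist())

# Reindexing one incomplete day (rows 8 to 10) adds the missing hour 0.
one_day = df.iloc[8:11]
filled = one_day.set_index('hour').reindex([0, 6, 12, 18]).reset_index()
print(filled)
```

Because each group is reindexed to the full set of four hours, the last day is also padded with NaN rows after its final observation, which is why the output above ends with three NaN rows.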


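Here is an alternative that loops over the rows and keeps track of the expected hour. It is faster (timings below).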

import numpy as np
import pandas as pd

def hour_checker(hours, values):
    if not hours.isin([0, 6, 12, 18]).all():
        raise ValueError('Invalid hour')
    result = []
    # Rotate the cycle so that it starts at the first observed hour.
    valid_hours = np.roll(np.arange(0, 24, 6), -(int(hours.iat[0]) // 6))
    for hour, value in zip(hours, values):
        # Emit a NaN row for every expected hour that was skipped.
        while hour != valid_hours[0]:
            result.append([valid_hours[0], np.nan])
            valid_hours = np.roll(valid_hours, -1)
        result.append([hour, value])
        valid_hours = np.roll(valid_hours, -1)
    return pd.DataFrame(result, columns=['hour', 'value'])

hour_checker(df['hour'], df['value'])
Out[33]: 
    hour  value
0      0      1
1      6      2
2     12      3
3     18      4
4      0      5
5      6      6
6     12      7
7     18      8
8      0    NaN
9      6      9
10    12     10
11    18     11
12     0    NaN
13     6    NaN
14    12     12
15    18     13
16     0     14
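
The rotation that hour_checker relies on can be looked at in isolation. This is only a small sketch of how np.roll cycles through the valid hours; the list seen is a name made up for the illustration.

```python
import numpy as np

valid_hours = np.arange(0, 24, 6)  # the cycle 0, 6, 12, 18

seen = []
for _ in range(5):
    # Position 0 always holds the hour expected next.
    seen.append(int(valid_hours[0]))
    # Shift every element one place to the left; the first wraps to the end.
    valid_hours = np.roll(valid_hours, -1)

print(seen)
```

Rolling by -1 has the same effect as computing (hour + 6) % 24 on the expected hour, so this answer is the modular-arithmetic idea from the question in array form.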

df_test = pd.concat([df] * 100)

%%timeit
group_hours = (df_test.hour <= df_test.hour.shift()).cumsum()
df_test.groupby(group_hours).apply(insert_missing_hours).reset_index(drop=True)
1 loops, best of 3: 611 ms per loop

%timeit hour_checker(df_test['hour'], df_test['value'])
100 loops, best of 3: 12.4 ms per loop