Python pandas: how to calculate derivative / gradient

Given that I have the following two vectors:

    In [99]: time_index
    Out[99]:
    [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
     1484943912, 1484944212, 1484944511, 1484944811, 1484945110]

    In [100]: bytes_in
    Out[100]:
    [1293981210388, 1293981379944, 1293981549960, 1293981720866,
     1293981890968, 1293982062261, 1293982227492, 1293982391244,
     1293982556526, 1293982722320]

Here bytes_in is a monotonically increasing counter, and time_index is a list of Unix timestamps (seconds since the epoch).

Goal: What I would like to calculate is the bitrate.

To do this, I create a time-indexed series, for example:

    In [101]: timeline = pandas.to_datetime(time_index, unit="s")

    In [102]: recv = pandas.Series(bytes_in, timeline).resample("300S").mean().ffill().apply(lambda i: i*8)

    In [103]: recv
    Out[103]:
    2017-01-20 20:00:00    10351849683104
    2017-01-20 20:05:00    10351851039552
    2017-01-20 20:10:00    10351852399680
    2017-01-20 20:15:00    10351853766928
    2017-01-20 20:20:00    10351855127744
    2017-01-20 20:25:00    10351856498088
    2017-01-20 20:30:00    10351857819936
    2017-01-20 20:35:00    10351859129952
    2017-01-20 20:40:00    10351860452208
    2017-01-20 20:45:00    10351861778560
    Freq: 300S, dtype: int64

Question: Now, oddly, calculating the gradient manually gives me:

    In [104]: (bytes_in[1]-bytes_in[0])*8/300
    Out[104]: 4521.493333333333

which is the correct value.

However, calculating the gradient with pandas gives me:

    In [124]: recv.diff()
    Out[124]:
    2017-01-20 20:00:00          NaN
    2017-01-20 20:05:00    1356448.0
    2017-01-20 20:10:00    1360128.0
    2017-01-20 20:15:00    1367248.0
    2017-01-20 20:20:00    1360816.0
    2017-01-20 20:25:00    1370344.0
    2017-01-20 20:30:00    1321848.0
    2017-01-20 20:35:00    1310016.0
    2017-01-20 20:40:00    1322256.0
    2017-01-20 20:45:00    1326352.0
    Freq: 300S, dtype: float64

which is not the same as above: 1356448.0 is different from 4521.493333333333.

Could you tell me what I am doing wrong?

+11
python pandas data-analysis
3 answers

pd.Series.diff() only takes the difference between consecutive values. It does not divide by the corresponding difference in the index.

Dividing by the elapsed seconds gives you the answer:

    recv.diff() / recv.index.to_series().diff().dt.total_seconds()

    2017-01-20 20:00:00            NaN
    2017-01-20 20:05:00    4521.493333
    2017-01-20 20:10:00    4533.760000
    2017-01-20 20:15:00    4557.493333
    2017-01-20 20:20:00    4536.053333
    2017-01-20 20:25:00    4567.813333
    2017-01-20 20:30:00    4406.160000
    2017-01-20 20:35:00    4366.720000
    2017-01-20 20:40:00    4407.520000
    2017-01-20 20:45:00    4421.173333
    Freq: 300S, dtype: float64
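
For readers who want to run this outside the original session, here is a minimal self-contained sketch of the same idea. Note that it is a variation, not the exact code above: it assumes you skip the resampling step and divide by the real gaps between the raw timestamps, and the names `bits`, `elapsed`, and `bitrate` are introduced here for illustration only.

```python
import pandas as pd

# Data copied from the question.
time_index = [1484942413, 1484942712, 1484943012, 1484943312, 1484943612,
              1484943912, 1484944212, 1484944511, 1484944811, 1484945110]
bytes_in = [1293981210388, 1293981379944, 1293981549960, 1293981720866,
            1293981890968, 1293982062261, 1293982227492, 1293982391244,
            1293982556526, 1293982722320]

timeline = pd.to_datetime(time_index, unit="s")

# Counter converted from bytes to bits (assumption: no resampling is applied).
bits = pd.Series(bytes_in, index=timeline) * 8

# Seconds elapsed between consecutive samples, taken from the index itself.
elapsed = bits.index.to_series().diff().dt.total_seconds()

# Bits per second; the first entry is NaN because it has no predecessor.
bitrate = bits.diff() / elapsed
print(bitrate)
```

Because the first two raw samples are 299 seconds apart rather than 300, the first rate from this sketch is slightly higher than the 4521.49 obtained with the nominal 300-second interval.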

You can also use numpy.gradient, passing bytes_in and the spacing (delta) that you expect between samples. This does not reduce the length by one, but it makes assumptions about the edges.

    np.gradient(bytes_in, 300) * 8

    array([ 4521.49333333,  4527.62666667,  4545.62666667,  4546.77333333,
            4551.93333333,  4486.98666667,  4386.44      ,  4387.12      ,
            4414.34666667,  4421.17333333])
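
If the samples are not exactly 300 seconds apart, numpy.gradient can also take the sample coordinates instead of a scalar spacing (supported in NumPy 1.13 and later). The sketch below assumes you want to use the raw timestamps from the question directly; the name `bitrate` is illustrative.

```python
import numpy as np

# Data copied from the question, as floats so the division is exact enough.
time_index = np.array([1484942413, 1484942712, 1484943012, 1484943312,
                       1484943612, 1484943912, 1484944212, 1484944511,
                       1484944811, 1484945110], dtype=float)
bytes_in = np.array([1293981210388, 1293981379944, 1293981549960,
                     1293981720866, 1293981890968, 1293982062261,
                     1293982227492, 1293982391244, 1293982556526,
                     1293982722320], dtype=float)

# Passing the coordinate array lets np.gradient account for uneven spacing.
# The result has the same length as the input; multiply by 8 for bits/s.
bitrate = np.gradient(bytes_in, time_index) * 8
print(bitrate)
```

With the default edge handling, the first and last entries are one-sided differences, so they only use the nearest neighbouring sample.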
+12

A simple explanation is that diff just subtracts consecutive entries (a backward difference), while np.gradient uses a central difference scheme for the interior points.
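
The difference between the two schemes can be checked numerically. This is a small sketch assuming the nominal 300-second spacing from the question and NumPy's default edge handling (`edge_order=1`); the variable names are illustrative.

```python
import numpy as np

bytes_in = np.array([1293981210388, 1293981379944, 1293981549960,
                     1293981720866, 1293981890968, 1293982062261,
                     1293982227492, 1293982391244, 1293982556526,
                     1293982722320], dtype=float)
h = 300.0  # nominal sampling interval in seconds (assumed uniform)

# Backward difference: what diff() divided by h gives, without the leading NaN.
backward = np.diff(bytes_in) / h * 8

# Central difference for interior points: (f[i+1] - f[i-1]) / (2h).
central = (bytes_in[2:] - bytes_in[:-2]) / (2 * h) * 8

grad = np.gradient(bytes_in, h) * 8

# Interior points of np.gradient follow the central scheme ...
interior_matches = np.allclose(grad[1:-1], central)
# ... while the two edges fall back to one-sided differences.
edges_match = np.isclose(grad[0], backward[0]) and np.isclose(grad[-1], backward[-1])
print(interior_matches, edges_match)
```

This is why the first and last values of the np.gradient output agree with the diff-based result, while the values in between are averages of neighbouring diff values.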

0

Since there is no built-in derivative method in Pandas Series / DataFrame, you can use https://github.com/scls19fr/pandas-helper-calc .

It provides a new accessor called calc on pandas Series and DataFrames for calculating numerical derivatives and integrals.

So you can just do:

 recv.calc.derivative() 

This is using diff() under the hood.
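
To illustrate how an accessor of this kind can be built on top of diff(), here is a minimal sketch using pandas' `register_series_accessor`. It is not the implementation of pandas-helper-calc: the accessor name `rate` and the class `RateAccessor` are hypothetical, and the sketch assumes the series has a DatetimeIndex.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("rate")  # hypothetical accessor name
class RateAccessor:
    """Toy accessor exposing a time derivative on a Series."""

    def __init__(self, series):
        self._series = series

    def derivative(self):
        # Change in value divided by elapsed seconds between index entries.
        s = self._series
        elapsed = s.index.to_series().diff().dt.total_seconds()
        return s.diff() / elapsed


# Small made-up series: samples every 300 seconds.
idx = pd.to_datetime([0, 300, 600], unit="s")
s = pd.Series([0.0, 600.0, 1800.0], index=idx)
result = s.rate.derivative()
print(result)
```

As with the plain diff() approach, the first element is NaN because there is no earlier sample to difference against.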

0