Pandas: create a new column by linearly interpolating between existing columns

Let's say I have a DataFrame containing temperature data from probes at different altitudes on a mountain, each sampled once a day. The altitude of each probe is fixed (it is the same every day) and known. Each row represents a different timestamp, and there is a separate column for the temperature observed by each probe. There is also a column (targ_alt) that contains the "altitude of interest" for each row.

My goal is to add a new column named interped_temp that contains, for each row, the temperature obtained at that row's targ_alt by linearly interpolating between the probe temperatures at their known altitudes. What is the best way to do this?

Here is some setup code, so we are looking at the same data:

import pandas as pd
import numpy as np

np.random.seed(1)

n = 10
probe_alts = {'base': 1000, 'mid': 2000, 'peak': 3500}
# let's make the temperatures decrease at higher altitudes...just for style
temp_readings = {k: np.random.randn(n) + 15 - v/300 for k, v in probe_alts.items()}
df = pd.DataFrame(temp_readings)

targ_alt = 2000 + (500 * np.random.randn(n))
df['targ_alt'] = targ_alt

So df looks like this:

        base        mid      peak     targ_alt
0  13.624345  10.462108  2.899381  1654.169624
1  11.388244   6.939859  5.144724  1801.623237
2  11.471828   8.677583  4.901591  1656.413650
3  10.927031   8.615946  4.502494  1577.397179
4  12.865408  10.133769  4.900856  1664.376935
5   9.698461   7.900109  3.316272  1993.667701
6  13.744812   8.827572  3.877110  1441.344826
7  11.238793   8.122142  3.064231  2117.207849
8  12.319039   9.042214  3.732112  2829.901089
9  11.750630   9.582815  4.530355  2371.022080
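
To make the goal concrete, here is a minimal sketch of the calculation for a single row, using the rounded values printed for row 0 above. It only illustrates what interped_temp should contain for that row; it is not a solution for the whole DataFrame. The variable names are made up for this illustration.

```python
import numpy as np

# Probe altitudes from the setup code, in increasing order.
alts = [1000, 2000, 3500]

# Row 0 of the table above (values as printed, i.e. rounded to 6 decimals).
temps_row0 = [13.624345, 10.462108, 2.899381]
targ_alt_row0 = 1654.169624

# The target lies between 'base' (1000) and 'mid' (2000), so np.interp
# blends those two readings in proportion to the distance from each probe.
interped_row0 = np.interp(targ_alt_row0, alts, temps_row0)

# The same value computed by hand from the two bracketing probes.
w = (targ_alt_row0 - 1000) / (2000 - 1000)
by_hand = temps_row0[0] + w * (temps_row0[1] - temps_row0[0])

print(interped_row0, by_hand)  # both approximately 11.5557
```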
2 answers

In the example above, I wanted to interpolate at a different x coordinate in each row. Fine. But if you want to interpolate at the same x coordinate in every row, using SciPy is a huge time saver. See the example below:

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

np.random.seed(1)
n = int(10e4)  # randn needs an integer size; 10e4 on its own is a float

df = pd.DataFrame({'a': np.random.randn(n), 
                   'b': 10 + np.random.randn(n), 
                   'c': 30 + np.random.randn(n)})

xs = [-10, 0, 10]
cvs = df.columns.values

Three ways to interpolate each row at x = 5:

%timeit df['n1'] = df.apply(lambda row: np.interp(5, xs, row[cvs]), axis=1)
%timeit df['n2'] = df.apply(lambda row: np.interp(5, xs, tuple([row[j] for j in cvs])), axis=1)
%timeit df['n3'] = interp1d(xs, df[cvs])(5)

n = 1e2:

100 loops, best of 3: 13.2 ms per loop
1000 loops, best of 3: 1.24 ms per loop
1000 loops, best of 3: 488 µs per loop

n = 1e4:

1 loops, best of 3: 1.33 s per loop
10 loops, best of 3: 109 ms per loop
1000 loops, best of 3: 798 µs per loop

n = 1e6:

# first one is too slow to wait for
1 loops, best of 3: 10.9 s per loop
10 loops, best of 3: 58.3 ms per loop
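
As a side note, when x is the same for every row the work can also be done with plain NumPy. The following is a rough sketch, not part of the timings above. It relies on the fact that linear interpolation is linear in the y values, so interpolating each unit vector once gives a fixed weight per column. The names cols, x_shared, weights, fast and slow, and the size n = 1000, are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000  # illustrative size, not one of the timed sizes
df = pd.DataFrame({'a': rng.standard_normal(n),
                   'b': 10 + rng.standard_normal(n),
                   'c': 30 + rng.standard_normal(n)})

xs = [-10, 0, 10]
cols = ['a', 'b', 'c']
x_shared = 5  # the same x coordinate for every row

# Interpolate each unit vector once: this gives the weight each column
# contributes at x_shared (here 0 for 'a', 0.5 for 'b', 0.5 for 'c').
weights = np.array([np.interp(x_shared, xs, unit) for unit in np.eye(len(xs))])

# One matrix-vector product replaces the per-row calls.
fast = df[cols].to_numpy() @ weights

# Row-by-row reference, for checking only.
slow = df[cols].apply(lambda row: np.interp(x_shared, xs, row.to_numpy()),
                      axis=1).to_numpy()

print(np.allclose(fast, slow))
```

This only works because every row shares the same x; with a per-row x the weights differ from row to row.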



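OK. Is there a way to skip the zip and use np.interp with map directly? (Or, for example, with DataFrame.apply...) I.e. in pandas, can you map over the rows of a DataFrame the way you map over a Series (without groupby)?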

Anyway, here is the code:

# list() is needed in Python 3, where zip and dict.values() are lazy views
df['rolled'] = list(zip(df['targ_alt'], zip(df['base'], df['mid'], df['peak'])))
%timeit df['interped_temp'] = df['rolled'].map(lambda x: np.interp(x[0], list(probe_alts.values()), x[1]))
del df['rolled']

Result:

        base        mid      peak     targ_alt  interped_temp
0  13.624345  10.462108  2.899381  1654.169624      11.555706
1  11.388244   6.939859  5.144724  1801.623237       7.822315
2  11.471828   8.677583  4.901591  1656.413650       9.637647
3  10.927031   8.615946  4.502494  1577.397179       9.592617
4  12.865408  10.133769  4.900856  1664.376935      11.050570
5   9.698461   7.900109  3.316272  1993.667701       7.911496
6  13.744812   8.827572  3.877110  1441.344826      11.574613
7  11.238793   8.122142  3.064231  2117.207849       7.726924
8  12.319039   9.042214  3.732112  2829.901089       6.104308
9  11.750630   9.582815  4.530355  2371.022080       8.333099

For n=10, %timeit reports 182 µs per loop. For n=1e6, %timeit reports 4.51 s per loop.


As @DSM pointed out, the order of probe_alts.values() cannot be relied on. Something like this is safer:

probes = ['base', 'mid', 'peak']
df['rolled'] = list(zip(df['targ_alt'], zip(*[df[p] for p in probes])))  # list() for Python 3
df['interped_temp'] = df['rolled'].map(lambda x: np.interp(x[0], tuple(probe_alts[p] for p in probes), x[1]))
del df['rolled']

Or, alternatively, using DataFrame.apply...

probes = ['base', 'mid', 'peak']
def cust_interp(row):
    return np.interp(row['targ_alt'], tuple(probe_alts[p] for p in probes), row[probes])
df['interped_temp'] = df.apply(cust_interp, axis=1)
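
For large frames, the per-row Python calls can probably be avoided entirely. Below is a rough sketch of a vectorized version, assuming the probe altitudes are strictly increasing. The helper name interp_rows and its arguments are hypothetical, not a pandas or NumPy API. It uses np.searchsorted to find the two probes that bracket each target altitude and then blends their readings, clamping out-of-range targets to the end values as np.interp does.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
n = 10
probe_alts = {'base': 1000, 'mid': 2000, 'peak': 3500}
probes = ['base', 'mid', 'peak']  # explicit order, altitudes increasing
df = pd.DataFrame({p: np.random.randn(n) + 15 - probe_alts[p] / 300 for p in probes})
df['targ_alt'] = 2000 + 500 * np.random.randn(n)

def interp_rows(df, probes, probe_alts, target_col):
    """Hypothetical helper: row-wise linear interpolation without a Python loop."""
    xs = np.array([probe_alts[p] for p in probes], dtype=float)
    ys = df[probes].to_numpy(dtype=float)        # shape (n_rows, n_probes)
    x = df[target_col].to_numpy(dtype=float)

    # Clamp targets to the probe range, matching np.interp's default behaviour.
    xc = np.clip(x, xs[0], xs[-1])

    # Index of the upper bracketing probe for each row, kept within [1, len-1].
    hi = np.clip(np.searchsorted(xs, xc, side='right'), 1, len(xs) - 1)
    lo = hi - 1

    rows = np.arange(len(df))
    y_lo, y_hi = ys[rows, lo], ys[rows, hi]
    w = (xc - xs[lo]) / (xs[hi] - xs[lo])        # fractional position in the bracket
    return y_lo + w * (y_hi - y_lo)

df['interped_temp'] = interp_rows(df, probes, probe_alts, 'targ_alt')

# Cross-check against np.interp applied one row at a time.
alts = [probe_alts[p] for p in probes]
expected = np.array([np.interp(t, alts, r)
                     for t, r in zip(df['targ_alt'], df[probes].to_numpy())])
print(np.allclose(df['interped_temp'], expected))
```

I have not timed this against the versions above, so treat any speed-up as something to measure rather than assume.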