This should give some speedup. Define a Pearson function that converts a correlation coefficient into a two-sided p-value, adapted from the source of scipy.stats.pearsonr:
from scipy.special import betainc  # betai was removed from SciPy; betainc is the current name

def Pearson(r, n=len(dat)):
    # n defaults to the number of rows of dat, captured when the function is defined,
    # so dat must already exist at this point
    r = max(min(r, 1.0), -1.0)  # clamp against floating-point drift in dat.corr()
    df = n - 2
    if abs(r) == 1.0:
        prob = 0.0
    else:
        # Two-sided p-value via the regularized incomplete beta function;
        # mathematically the same p-value scipy.stats.pearsonr reports
        t_squared = r**2 * (df / ((1.0 - r) * (1.0 + r)))
        prob = betainc(0.5 * df, 0.5, df / (df + t_squared))
    return (r, prob)
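As a quick sanity check (my own sketch; the data and tolerance here are arbitrary), the p-value that Pearson computes from r alone should match what scipy.stats.pearsonr reports for the same pair of samples:

import numpy as np
from scipy.stats import pearsonr

x, y = np.random.randn(50), np.random.randn(50)
r, p = pearsonr(x, y)
# Pearson reproduces pearsonr's two-sided p-value given r and the sample size
assert abs(Pearson(r, n=50)[1] - p) < 1e-8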
Then use applymap, which applies a function element-wise to the matrix returned by dat.corr(), passing each correlation coefficient r to Pearson:
import numpy as np
import pandas as pd

np.random.seed(10)
dat = pd.DataFrame(np.random.randn(5, 5))
dat[0] = np.arange(5)
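With that in place, the call itself is one line; each cell of the result holds a (r, p-value) tuple, which you can split into two matrices if needed (the names pvals, r_mat, and p_mat are just illustrative):

pvals = dat.corr().applymap(Pearson)   # DataFrame of (r, p-value) tuples

# Optionally split into separate coefficient and p-value matrices
r_mat = pvals.applymap(lambda t: t[0])
p_mat = pvals.applymap(lambda t: t[1])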
You do see a speedup with this method when dat is large, but it is still fairly slow, because applymap makes one Python-level call per cell.
from itertools import combinations
from scipy.stats import pearsonr

np.random.seed(10)
dat = pd.DataFrame(np.random.randn(100, 100))
# Re-run the Pearson definition here so its default n picks up the new dat

%%timeit
dat.corr().applymap(Pearson)

10 loops, best of 3: 118 ms per loop

%%timeit
# Baseline: one pearsonr call per pair (here row pairs; equivalent cost for this square frame)
stats = dict()
for l in combinations(dat.index.tolist(), 2):
    stats[l] = pearsonr(dat.loc[l[0], :], dat.loc[l[1], :])

1 loops, best of 3: 1.56 s per loop
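If even 118 ms matters, the per-cell Python call can be removed entirely. The sketch below is my own addition, not part of the original answer (corr_pvalues is a hypothetical helper name); it applies the same betainc formula to the whole correlation matrix at once:

import numpy as np
import pandas as pd
from scipy.special import betainc

def corr_pvalues(dat):
    # Vectorized two-sided p-values for every entry of dat.corr()
    r = np.clip(dat.corr().values, -1.0, 1.0)
    df = len(dat) - 2
    with np.errstate(divide='ignore'):
        t_squared = r**2 * (df / ((1.0 - r) * (1.0 + r)))
    p = betainc(0.5 * df, 0.5, df / (df + t_squared))
    p[np.abs(r) == 1.0] = 0.0  # diagonal and perfect correlations
    return pd.DataFrame(p, index=dat.columns, columns=dat.columns)

This trades the per-cell function call for a few whole-matrix operations, which is where the remaining time goes.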