Please note that this is not a question about multiple regression; it is a question about running a simple (single-variable) regression many times in Python/NumPy (2.7).
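For reference, each of these single-variable fits is an ordinary least-squares line through one pair of rows. Using notation introduced here (not in the original), with $x_{ij}, y_{ij}$ the $j$-th point of dataset $i$ ($j = 1, \dots, n$) and $\bar x_i, \bar y_i$ the row means, the standard slope and intercept estimates are:

```latex
\hat\beta_{1,i} = \frac{\sum_{j=1}^{n} (x_{ij} - \bar x_i)(y_{ij} - \bar y_i)}{\sum_{j=1}^{n} (x_{ij} - \bar x_i)^2},
\qquad
\hat\beta_{0,i} = \bar y_i - \hat\beta_{1,i}\,\bar x_i
```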
I have two m x n arrays, x and y. Their rows correspond to each other, and each pair of rows is the set of (x, y) points for one measurement. That is, plt.plot(x.T, y.T, '.') would plot each of the m datasets/measurements.
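To make the layout concrete, here is a small sketch with made-up data (the sizes m, n and the linear model are hypothetical). It relies on matplotlib's documented behavior that plot() draws a separate series for every column of a 2-D array, which is why the rows are transposed:

```python
import numpy as np

# Hypothetical sizes: m datasets (rows), each with n (x, y) points.
m, n = 3, 5
np.random.seed(42)

# Row i of x pairs with row i of y; together they form dataset i.
x = np.random.rand(m, n)
y = 2.0 * x + 1.0 + 0.1 * np.random.randn(m, n)

# plot() treats each *column* of a 2-D array as one series, so transposing
# to shape (n, m) turns each original row into its own series.
print(x.T.shape)  # (5, 3)
# plt.plot(x.T, y.T, '.') would therefore draw m = 3 point clouds.
```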
I am wondering what the best way is to perform m linear regressions. Currently I loop over the rows and call scipy.stats.linregress() on each pair. (Suppose I do not want solutions based on matrix linear algebra, but instead want to work with this function, or an equivalent black-box function.) I could try np.vectorize, but the docs point out that it is essentially a loop as well.
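The row-by-row loop can be sketched roughly as follows. This is not the original code; it assumes x and y are m x n arrays with matching rows, and the synthetic data and names (true_slopes, results) are made up for illustration:

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
m, n = 4, 50  # hypothetical: 4 datasets of 50 points each
x = np.random.rand(m, n)
# Give each row its own true slope (0, 1, 2, 3), a shared intercept, and a little noise.
true_slopes = np.arange(m, dtype=float)
y = true_slopes[:, None] * x + 0.5 + 0.01 * np.random.randn(m, n)

# One simple (single-variable) regression per pair of rows.
results = []
for xi, yi in zip(x, y):
    # linregress returns (slope, intercept, rvalue, pvalue, stderr)
    results.append(stats.linregress(xi, yi))

slopes = np.array([r[0] for r in results])
intercepts = np.array([r[1] for r in results])
```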
While experimenting, I also found a way to do this with map() that gives the correct results. I put both solutions below. Using the small dataset (commented out in the code), IPython's `%%timeit` reports:
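A map()-based variant might look like this; again it is a sketch on made-up data rather than the original code. Iterating over a 2-D NumPy array yields its rows, so map() can feed matching row pairs straight into linregress:

```python
import numpy as np
import scipy.stats as stats

np.random.seed(0)
m, n = 4, 50  # hypothetical sizes
x = np.random.rand(m, n)
y = 3.0 * x - 1.0 + 0.01 * np.random.randn(m, n)

# map() calls linregress(x[i], y[i]) for each row pair. On Python 3 map()
# is lazy, so list() forces evaluation; on Python 2.7 it already returns a list.
results = list(map(stats.linregress, x, y))

# Collect (slope, intercept) per row, then split into two arrays.
slopes, intercepts = np.array([(r[0], r[1]) for r in results]).T
```

Both versions call linregress once per row, so the difference between them is limited to Python-level loop overhead.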
(loop) 1000 loops, best of 3: 642 µs per loop
(map) 1000 loops, best of 3: 634 µs per loop
To scale this up, I made a much larger random dataset (trials x trials in size):
(loop, trials = 1000) 1 loops, best of 3: 299 ms per loop
(loop, trials = 10000) 1 loops, best of 3: 5.64 s per loop
(map, trials = 1000) 1 loops, best of 3: 256 ms per loop
(map, trials = 10000) 1 loops, best of 3: 2.37 s per loop
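Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. This assumes the larger dataset is uniform random data of shape (trials, trials); the generator and the helper names with_loop/with_map are hypothetical, and trials is kept small here so it runs quickly:

```python
import timeit

import numpy as np
import scipy.stats as stats

np.random.seed(42)
trials = 200  # hypothetical; the timings above used 1000 and 10000
x = np.random.rand(trials, trials)
y = np.random.rand(trials, trials)

def with_loop():
    # Explicit Python loop over row pairs.
    out = []
    for xi, yi in zip(x, y):
        out.append(stats.linregress(xi, yi))
    return out

def with_map():
    # Same work expressed with map(); list() is needed on Python 3.
    return list(map(stats.linregress, x, y))

# Best of 3 repeats with 1 call each, mirroring "1 loops, best of 3".
t_loop = min(timeit.repeat(with_loop, number=1, repeat=3))
t_map = min(timeit.repeat(with_map, number=1, repeat=3))
print("loop: %.4f s, map: %.4f s" % (t_loop, t_map))
```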
This is a decent speedup on a really large set, but I expected a bit more. Is there a better way?
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

np.random.seed(42)