Calculate average and median efficiently

Question


What is the most efficient way to compute the mean and median of successively longer prefixes of a Python list?

For example, my list:

input_list = [1,2,4,6,7,8]

I want to create a list of results containing:

```python
output_list_mean = [1, 1.5, 2.3, 3.25, 4, 4.7]
output_list_median = [1, 1.5, 2.0, 3.0, 4.0, 5.0]
```

The averages are calculated as follows:

1 = average (1)
1.5 = average (1,2) (i.e. average of the first 2 values in input_list)
2.3 = average (1,2,4) (i.e. average of the first 3 values in input_list)
3.25 = average (1,2,4,6) (i.e. the average of the first 4 values in input_list), etc.
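Written as a formula (a restatement of the definition above, with $x_1, \dots, x_n$ denoting the input values and $S_k$ the $k$-th prefix sum), the $k$-th running mean is the prefix sum divided by $k$. It can also be updated incrementally from the previous mean:

```latex
\bar{x}_k = \frac{1}{k}\sum_{i=1}^{k} x_i = \frac{S_k}{k},
\qquad
\bar{x}_k = \bar{x}_{k-1} + \frac{x_k - \bar{x}_{k-1}}{k}
```

For example, $\bar{x}_3 = 1.5 + (4 - 1.5)/3 = 7/3 \approx 2.33$, so every running mean costs O(1) once the previous one is known.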

And the median is calculated as follows:

1 = median (1)
1.5 = median (1,2) (i.e. median of the first 2 values in input_list)
2.0 = median (1,2,4) (i.e. median of the first 3 values in input_list)
3.0 = median (1,2,4,6) (i.e. median of the first 4 values in input_list), etc.
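Likewise, if $x_{(1)} \le \dots \le x_{(k)}$ denote the first $k$ values in sorted order, the $k$-th running median is:

```latex
\operatorname{med}_k =
\begin{cases}
x_{((k+1)/2)} & k \text{ odd} \\[4pt]
\tfrac{1}{2}\left(x_{(k/2)} + x_{(k/2+1)}\right) & k \text{ even}
\end{cases}
```

For $k = 4$ the sorted prefix is 1, 2, 4, 6, so the median is $(2 + 4)/2 = 3$. Unlike the mean, this depends on the order of the whole prefix, which is why an efficient running median needs a structure that keeps the prefix (or its two halves) ordered.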

I tried to implement it with the following loop, but it seems very inefficient:

```python
import numpy

input_list = [1, 2, 4, 6, 7, 8]
for item in range(1, len(input_list) + 1):
    print(numpy.mean(input_list[:item]))
    print(numpy.median(input_list[:item]))
```
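For reference, the loop above recomputes the mean and median of every prefix from scratch. A minimal single-pass sketch in plain Python keeps a running total for the mean and a sorted copy of the prefix (maintained with `bisect.insort`) for the median. The function name `running_mean_median` is hypothetical, and this is not one of the approaches from the answers below:

```python
from bisect import insort


def running_mean_median(values):
    """Return the running means and medians of `values` in one pass.

    Keeps a running total for the mean and a sorted copy of the prefix
    for the median. insort locates the insertion point in O(log n), but
    shifting list elements is O(n), so the worst case is still O(n^2),
    with a small constant.
    """
    means, medians = [], []
    total = 0
    prefix = []  # sorted copy of the values seen so far
    for k, x in enumerate(values, start=1):
        total += x
        insort(prefix, x)
        means.append(total / k)
        mid = k // 2
        if k % 2:  # odd count: the single middle element
            medians.append(float(prefix[mid]))
        else:      # even count: mean of the two middle elements
            medians.append((prefix[mid - 1] + prefix[mid]) / 2)
    return means, medians


means, medians = running_mean_median([1, 2, 4, 6, 7, 8])
print([round(m, 2) for m in means])  # [1.0, 1.5, 2.33, 3.25, 4.0, 4.67]
print(medians)                        # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```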


performance python numpy median mean

hoof_hearted Jul 12 '15 at 16:50


3 answers

Jaime · Answer 1 · 2015-07-12T18:10:52+0000

Anything you do yourself, especially for the median, will either require a lot of work or be very inefficient, but pandas comes with built-in efficient implementations of the functions you are after: the expanding mean is O(n), and the expanding median is O(n log n) using a skip list:

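```python
import pandas as pd

input_list = [1, 2, 4, 6, 7, 8]
s = pd.Series(input_list)

# In 2015 these were pd.expanding_mean(...) and pd.expanding_median(...);
# later pandas versions removed them in favor of the .expanding() method.
expanding_mean = s.expanding().mean()      # ≈ 1, 1.5, 2.33333, 3.25, 4, 4.66667
expanding_median = s.expanding().median()  # 1, 1.5, 2, 3, 4, 5
print(expanding_mean.to_numpy())
print(expanding_median.to_numpy())
```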
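For comparison, a hand-written running median need not be very inefficient. The sketch below uses the standard two-heap technique (not the skip list pandas uses internally) and also costs O(log n) per element, O(n log n) overall; the function name `running_median` is hypothetical:

```python
import heapq


def running_median(values):
    """Running median using two heaps, O(log n) per element.

    `low` is a max-heap (stored as negated values) holding the smaller
    half of the data; `high` is a min-heap holding the larger half.
    Invariant: len(low) == len(high) or len(low) == len(high) + 1.
    """
    low, high = [], []
    medians = []
    for x in values:
        # Push onto low, then move low's largest element to high so that
        # every element of low stays <= every element of high.
        heapq.heappush(low, -x)
        heapq.heappush(high, -heapq.heappop(low))
        # Rebalance so that low holds at least as many elements as high.
        if len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        if len(low) > len(high):
            medians.append(float(-low[0]))           # odd count
        else:
            medians.append((-low[0] + high[0]) / 2)  # even count
    return medians


print(running_median([1, 2, 4, 6, 7, 8]))  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

The two-heap layout only ever exposes the largest element of the lower half and the smallest of the upper half, which is all a median needs.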

Kasramvd · Answer 2 · 2015-07-12T16:58:37+0000

You can use itertools.islice to take each prefix of the array, np.fromiter to turn it into an array, and then take its mean:

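```python
import numpy as np
from itertools import islice

arr = np.array([1, 2, 4, 6, 7, 8])
l = arr.size
# Each iteration copies a prefix of arr into a new float array and averages it,
# so the whole comprehension is O(n^2). (Python 3: range instead of xrange.)
means = [np.fromiter(islice(arr, 0, i + 1), float).mean() for i in range(l)]
print([float(m) for m in means])  # ≈ [1.0, 1.5, 2.3333, 3.25, 4.0, 4.6667]
```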

Alternatively, if you only need the running mean, you can compute the cumulative sum of your elements with np.cumsum and divide it element-wise by the number of elements seen so far (1, 2, ..., n) with np.true_divide. This is O(n):

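```python
import numpy as np

arr = np.array([1, 2, 4, 6, 7, 8])
# Running mean: cumulative sum divided by the count of elements so far (1..n).
# Dividing by arr itself only gives the mean when arr happens to be 1, 2, ..., n.
running_mean = np.true_divide(np.cumsum(arr), np.arange(1, arr.size + 1))
print(running_mean)  # ≈ [1, 1.5, 2.333, 3.25, 4, 4.667]
```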
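The same cumulative-sum idea works without NumPy. A minimal sketch using only the standard library's `itertools.accumulate`, assuming the input is a plain list of numbers:

```python
from itertools import accumulate

values = [1, 2, 4, 6, 7, 8]
# accumulate yields the prefix sums 1, 3, 7, 13, 20, 28; dividing the k-th
# prefix sum by k (k = 1, 2, ...) gives the mean of the first k values.
running_mean = [s / k for k, s in enumerate(accumulate(values), start=1)]
print([round(m, 2) for m in running_mean])  # [1.0, 1.5, 2.33, 3.25, 4.0, 4.67]
```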

wwii · Answer 3 · 2015-07-12T18:37:23+0000

```python
import numpy as np

a = np.array([1, 2, 4, 6, 7, 8])
```

Use numpy.meshgrid (other formulations also work) and numpy.triu to build an array containing the values you are interested in: column j holds the first j+1 elements of a, padded with zeros below the diagonal.

```python
>>> x, y = np.meshgrid(a, a)  # or: y = a.repeat(len(a)).reshape(len(a), len(a))
>>> c = np.triu(y)
>>> y
array([[1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2],
       [4, 4, 4, 4, 4, 4],
       [6, 6, 6, 6, 6, 6],
       [7, 7, 7, 7, 7, 7],
       [8, 8, 8, 8, 8, 8]])
>>> c
array([[1, 1, 1, 1, 1, 1],
       [0, 2, 2, 2, 2, 2],
       [0, 0, 4, 4, 4, 4],
       [0, 0, 0, 6, 6, 6],
       [0, 0, 0, 0, 7, 7],
       [0, 0, 0, 0, 0, 8]])
```

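Define a function that returns the median of the non-zero values of a 1-D array, and apply it to each column of c with numpy.apply_along_axis. Because the padding is filtered out by value, this assumes the input itself contains no zeros.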

```python
>>> def foo(a):
...     '''Return the median of the non-zero elements of a 1-D array.'''
...     return np.median(a[a.nonzero()])
...
>>> d = np.apply_along_axis(foo, 0, c)
>>> d
array([1. , 1.5, 2. , 3. , 4. , 5. ])
```
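A variant of the same triangular-matrix idea, sketched here assuming O(n²) memory is acceptable: padding with NaN instead of 0 and reducing with `np.nanmedian` / `np.nanmean` lets the input contain zeros, and yields the running means from the same matrix. The helper name `running_stats_nan` is hypothetical:

```python
import numpy as np


def running_stats_nan(a):
    """Running means and medians via an upper-triangular matrix padded with NaN.

    c[i, j] = a[i] for i <= j and NaN otherwise, so column j holds exactly
    a[0], ..., a[j]. Memory and time are O(n^2).
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    mask = np.arange(n)[:, None] <= np.arange(n)[None, :]  # True on/above diagonal
    c = np.where(mask, a[:, None], np.nan)                 # broadcast a down columns
    # nanmean / nanmedian skip the NaN padding, so real zeros are kept.
    return np.nanmean(c, axis=0), np.nanmedian(c, axis=0)


means, medians = running_stats_nan([1, 2, 4, 6, 7, 8])
print(means)
print(medians)
```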
