Calculate average and median efficiently

What is the most efficient way to compute the running (cumulative) mean and median of a Python list, one element at a time?

For example, my list:

input_list = [1,2,4,6,7,8] 

I want to create a list of results containing:

    output_list_mean = [1, 1.5, 2.3, 3.25, 4, 4.7]
    output_list_median = [1, 1.5, 2.0, 3.0, 4.0, 5.0]

The mean is calculated as follows:

  • 1 = average (1)
  • 1.5 = average (1,2) (i.e. average of the first 2 values in input_list)
  • 2.3 = average (1,2,4) (i.e. average of the first 3 values in input_list)
  • 3.25 = average (1,2,4,6) (i.e. average of the first 4 values in input_list), etc.

And the median is calculated as follows:

  • 1 = median (1)
  • 1.5 = median (1,2) (i.e. median of the first 2 values in input_list)
  • 2.0 = median (1,2,4) (i.e. median of the first 3 values in input_list)
  • 3.0 = median (1,2,4,6) (i.e. median of the first 4 values in input_list), etc.

I tried to implement it with the following loop, but it seems very inefficient.

    import numpy

    input_list = [1, 2, 4, 6, 7, 8]

    for item in range(1, len(input_list) + 1):
        print(numpy.mean(input_list[:item]))
        print(numpy.median(input_list[:item]))
3 answers

Anything you write yourself, especially for the median, will either require a lot of work or be very inefficient. Pandas comes with efficient built-in implementations of the functions you are after: the expanding mean is O(n), and the expanding median is O(n * log(n)) using a skip list:

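    >>> import pandas as pd
    >>> import numpy as np
    >>> input_list = [1, 2, 4, 6, 7, 8]
    >>> # pd.expanding_mean / pd.expanding_median are top-level functions in older
    >>> # pandas releases; later versions use the .expanding() method instead.
    >>> pd.expanding_mean(np.array(input_list))
    array([ 1.     ,  1.5    ,  2.33333,  3.25   ,  4.     ,  4.66667])
    >>> pd.expanding_median(np.array(input_list))
    array([ 1. ,  1.5,  2. ,  3. ,  4. ,  5. ])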
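For readers on a recent pandas release, where the top-level `pd.expanding_mean` and `pd.expanding_median` functions are no longer available, the same expanding-window idea can be sketched with the `.expanding()` method on a `Series`. This is a minimal sketch, assuming a pandas version that provides `Series.expanding()`; the variable names `running_mean` and `running_median` are chosen here for illustration.

```python
import pandas as pd

input_list = [1, 2, 4, 6, 7, 8]
s = pd.Series(input_list)

# expanding() grows the window from the first element up to the current one,
# so each result is computed over the prefix s[:i + 1]
running_mean = s.expanding().mean().tolist()
running_median = s.expanding().median().tolist()

print(running_mean)
print(running_median)
```

Calling `.tolist()` is only there to get plain Python lists back; dropping it leaves the results as pandas `Series` objects.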

You can use itertools.islice to take each prefix of the array, build it with np.fromiter, and take its mean:

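    >>> import numpy as np
    >>> from itertools import islice
    >>> arr = np.array([1, 2, 4, 6, 7, 8])
    >>> l = arr.size
    >>> # range works on both Python 2 and 3 (the original used xrange)
    >>> [np.fromiter(islice(arr, 0, i + 1), float).mean(dtype=np.float32) for i in range(l)]
    [1.0, 1.5, 2.3333333, 3.25, 4.0, 4.6666665]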

Alternatively, if you only need the mean, you can use np.cumsum to get the cumulative sum of the elements and divide it by the number of elements seen so far (1, 2, ..., n) using np.true_divide. Note that the divisor must be the element counts, not the array itself:

    >>> np.true_divide(np.cumsum(arr), np.arange(1, arr.size + 1))
    array([1.        , 1.5       , 2.33333333, 3.25      , 4.        , 4.66666667])
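The cumulative-sum idea does not depend on NumPy. As a minimal sketch using only the standard library, `itertools.accumulate` yields the running totals and each one is divided by the number of elements seen so far. The function name `running_mean` is made up for this illustration.

```python
from itertools import accumulate


def running_mean(values):
    """Return the mean of each prefix values[:1], values[:2], ..."""
    # accumulate() yields running totals; enumerate(..., start=1) supplies
    # the matching element count for each total
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]


input_list = [1, 2, 4, 6, 7, 8]
print(running_mean(input_list))
```

This runs in a single O(n) pass, in contrast to the question's loop, which recomputes every prefix from scratch.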

Use numpy.meshgrid (other formulations also work) and numpy.triu to build an array whose columns hold the prefixes you are interested in.

    >>> import numpy as np
    >>> a = np.array([1, 2, 4, 6, 7, 8])
    >>> x, y = np.meshgrid(a, a)  # or: y = a.repeat(len(a)).reshape(len(a), len(a))
    >>> c = np.triu(y)  # keep the upper triangle: column j holds a[0..j]
    >>> y
    array([[1, 1, 1, 1, 1, 1],
           [2, 2, 2, 2, 2, 2],
           [4, 4, 4, 4, 4, 4],
           [6, 6, 6, 6, 6, 6],
           [7, 7, 7, 7, 7, 7],
           [8, 8, 8, 8, 8, 8]])
    >>> c
    array([[1, 1, 1, 1, 1, 1],
           [0, 2, 2, 2, 2, 2],
           [0, 0, 4, 4, 4, 4],
           [0, 0, 0, 6, 6, 6],
           [0, 0, 0, 0, 7, 7],
           [0, 0, 0, 0, 0, 8]])

Define a function that returns the median of the non-zero values and apply it to each column of that array. This relies on the input containing no zeros, since zeros mark the masked-out entries.

    >>> def foo(a):
    ...     '''Return the median of the non-zero elements of a 1-d array.'''
    ...     return np.median(a[a.nonzero()])
    ...
    >>> d = np.apply_along_axis(foo, 0, c)
    >>> d
    array([ 1. ,  1.5,  2. ,  3. ,  4. ,  5. ])
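The triangular-array approach needs an n-by-n array and treats zeros as missing. As a hedged alternative, here is a minimal standard-library sketch of a running median that keeps two heaps, one for the smaller half of the values seen so far and one for the larger half. This is a different technique from the skip list mentioned in the pandas answer, although it has the same O(n * log(n)) total cost; the function name `running_median` is hypothetical.

```python
import heapq


def running_median(values):
    """Return the median of each prefix of values using two heaps."""
    lower = []  # max-heap (values stored negated) holding the smaller half
    upper = []  # min-heap holding the larger half
    medians = []
    for x in values:
        # Push onto lower, then move lower's largest element to upper,
        # so every element of lower stays <= every element of upper
        heapq.heappush(lower, -x)
        heapq.heappush(upper, -heapq.heappop(lower))
        # Rebalance so lower has the same size as upper, or one more
        if len(upper) > len(lower):
            heapq.heappush(lower, -heapq.heappop(upper))
        if len(lower) > len(upper):
            # Odd count: the median is the largest of the smaller half
            medians.append(float(-lower[0]))
        else:
            # Even count: average the two middle values
            medians.append((-lower[0] + upper[0]) / 2)
    return medians


input_list = [1, 2, 4, 6, 7, 8]
print(running_median(input_list))
```

Each insertion costs O(log n), memory stays O(n), and the input does not need to be sorted or free of zeros.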