The most effective way to find neighbors in a list

I have a list of length 2016, but only 242 contain data, the rest is None. My goal is to interpolate between the values ​​in order to fill all the spaces with a simple IDW form (weight backward distance). So the task of my script is this:

  • Iterate over all myList elements
  • If myList contains a value ( not No), just copy it
  • If you find "None" in myList, get the position / value of the left and right neighbors by calculating the distance to all elements in myList
  • Calculate the interpolated value for the gap between both neighbors (the further they are, the less they will get)

Suppose we have a smaller list of 14 elements (5 valid):

myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79] resultList = [None] * len(myList) for i in range(len(myList): if not myList[i] is None: resultList[i] = myList[i] else: distance = [i - j for j in range(len(myList)) if not myList[j] is None] neighbors = min([n for n in dist if n>0]), max([n for n in dist if n<0]) # rest of the interpolation (not important for my question): neighbors_c = [(1/float(n))**2 for n in neighbors] c_sum = sum(neighbors_c) neighbors_c = [n/c_sum for n in neighbors_c] resultList = myList[i-neighbors[0]]*neighbors_c[0] + myList[i-neighbors[1]]*neighbors_c[1] 

I do this for many many datasets. I found out that this method takes about 0.59 seconds per data set. My concern is the fact that my list is sorted, but I only need 2 values. Thus, 99% of the distances are calculated for nothing. This led me to two attempts: stop the iteration after ij becomes negative, because then, obviously, it ran into the closest values:

So, instead of understanding the list:

 distance = [i - j for j in range(len(myList)) if not myList[j] is None] 

I am doing the right loop, which I leave after the distances have passed zero and thus become larger again:

 dist = [] for j in range(len(myList)): if not myList[j] is None: dist.append(ij) if ij < 0: break 

Using this method, I was able to go down to 0.38 seconds for each data set. When all the elements in myList are repeated, this second method is executed from the very beginning (the element falls after a cycle 2, 3, 4, ... and stops immediately), but there is no improvement for the last elements, since iteration always starts at j = 0.

I wonder if you can come up with any faster way to find two neighbors of a certain number in the data set, without having to check ALL distances and accept only the largest negative and small positive ones.

Also, I'm pretty new to python, so please let me know if you find other non-pythonic expressions in my script. Thank you guys very much!

+7
python list
source share
1 answer

UPDATE: Here's how to do it with numpy interp :

 import numpy as np myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79] values = [(i, val) for i, val in enumerate(myList) if val is not None] xp, fp = zip(*values) print(xp) # (0, 4, 7, 9, 13) print(fp) # (26, 31, 58, 42, 79) result = np.interp(np.arange(len(myList)), xp, fp) print(result) # [ 26. 27.25 28.5 29.75 31. 40. 49. 58. 50. 42. 51.25 60.5 69.75 79. ] 

Original post:

As others have said, it is best to use some interpolation already implemented in numpy or pandas.

However, for completeness, here was a quick solution that I came up with:

 myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79] resultList = [] # first lets split the list into sublists that group the numbers # and the Nones into groups for i, item in enumerate(myList): if i == 0: resultList.append([item]) else: if type(resultList[-1][-1]) == type(item): resultList[-1].append(item) else: resultList.append([item]) print(resultList) # [[26], [None, None, None], [31], [None, None], [58], [None], [42], [None, None, None], [79]] # now lets interpolate the sublists that contain Nones for i, item in enumerate(resultList): if item[0] is not None: continue # this is a bit problematic, what do we do if we have a None at the beginning or at the end? if i == 0 or i + 1 == len(resultList): continue prev_item = resultList[i - 1][-1] next_item = resultList[i + 1][0] difference = next_item - prev_item item_length = len(item) + 1 for j, none_item in enumerate(item): item[j] = prev_item + float(j + 1) / item_length * difference # flatten the list back resultList = [item for sublist in resultList for item in sublist] print(resultList) # [26, 27.25, 28.5, 29.75, 31, 40.0, 49.0, 58, 50.0, 42, 51.25, 60.5, 69.75, 79] 

I suggest you use this only for training or for simple cases, since it does not handle cases when you have lists starting or ending with None

+2
source share

All Articles