Python: operating on multiple data arrays in a rolling window

Question

Consider the following code:

    import numpy as np


    class MyClass(object):
        def __init__(self):
            self.data_a = np.array(range(100))
            self.data_b = np.array(range(100, 200))
            self.data_c = np.array(range(200, 300))

        def _method_i_do_not_have_access_to(self, data, window, func):
            output = np.empty(np.size(data))
            for i in range(0, len(data) - window + 1):
                output[i] = func(data[i:i + window])
            output[-window + 1:] = np.nan
            return output

        def apply_a(self):
            a = self.data_a

            def _my_func(val):
                return sum(val)

            return self._method_i_do_not_have_access_to(a, 5, _my_func)


    my_class = MyClass()
    print(my_class.apply_a())

The _method_i_do_not_have_access_to method takes a numpy array, a window size, and a user-defined function handle. It returns an array holding the value the function produces for each run of window consecutive data points in the input array: a generic rolling method. I cannot modify this method.

As you can see, _method_i_do_not_have_access_to passes a single input to the function handle: a window-length slice of the data array that was passed to _method_i_do_not_have_access_to. The function can therefore only compute its output from window data points of that one array.

I need _my_func (the function handle passed to _method_i_do_not_have_access_to) to operate on data_b and data_c, at the same window indexes, in addition to the array it receives via _method_i_do_not_have_access_to. data_b and data_c are defined as attributes of MyClass.

The only approach I have thought of is to reference data_b and data_c inside _my_func, like this:

    def _my_func(val):
        b = self.data_b
        c = self.data_c
        # do some calculations
        return sum(val)

However, I need to slice b and c at the same indices as val (remember that val is a window-length slice of the array passed through _method_i_do_not_have_access_to).

For example, if the loop in _method_i_do_not_have_access_to currently works with indices 45 -> 50 on the input array, _my_func should work on the same indices on b and c .

The end result will be something like this:

    def _my_func(val):
        b = self.data_b  # somehow identify which slide we are at
        c = self.data_c  # somehow identify which slide we are at
        # if _method_i_do_not_have_access_to is currently
        # operating on indexes 45->50, then the sum of
        # val, b, and c should be the sum of the values at
        # index 45->50 at each
        return sum(val) * sum(b) + sum(c)

Any thoughts on how I can do this?

+4

python arrays numpy

Jason strimpel Aug 7 '11 at 16:25

4 answers

You can pass a two-dimensional array to _method_i_do_not_have_access_to(). Both len() and the slice operation work on it:

    In [29]: a = np.arange(100)
    In [30]: b = np.arange(100,200)
    In [31]: c = np.arange(200,300)
    In [32]: data = np.c_[a,b,c]  # combine your three 1-D arrays into one 2-D array.
    In [35]: data[0:10]  # slice operation works.
    Out[35]:
    array([[  0, 100, 200],
           [  1, 101, 201],
           [  2, 102, 202],
           [  3, 103, 203],
           [  4, 104, 204],
           [  5, 105, 205],
           [  6, 106, 206],
           [  7, 107, 207],
           [  8, 108, 208],
           [  9, 109, 209]])
    In [36]: len(data)  # len() works.
    Out[36]: 100
    In [37]: data.shape
    Out[37]: (100, 3)

So you can define your _my_func as follows:

    def _my_func(val):
        s = np.sum(val, axis=0)
        return s[0]*s[1] + s[2]
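For illustration, here is a minimal end-to-end sketch of this 2-D approach. The `rolling_apply` function is a hypothetical stand-in that copies the body of _method_i_do_not_have_access_to from the question (with `range` instead of `xrange`). One thing to watch: that method sizes its output with `np.size(data)`, which counts every element of a 2-D array, so the returned array has 300 entries here and only the first `len(data) - window + 1` are meaningful.

```python
import numpy as np


def rolling_apply(data, window, func):
    """Hypothetical stand-in mirroring _method_i_do_not_have_access_to."""
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


def _my_func(val):
    # val has shape (window, 3); the columns are a, b and c
    s = np.sum(val, axis=0)
    return s[0] * s[1] + s[2]


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)
data = np.c_[a, b, c]  # shape (100, 3)
window = 5

raw = rolling_apply(data, window, _my_func)

# np.size(data) is 300 for the 2-D input, so keep only the filled entries
n_valid = len(data) - window + 1
result = raw[:n_valid]

expected_45 = a[45:50].sum() * b[45:50].sum() + c[45:50].sum()
print(raw.shape, result.shape, result[45] == expected_45)
```

Slicing off the valid prefix afterwards is one way to cope with the oversized output; the entries between the valid prefix and the trailing NaNs are uninitialized and should not be read.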

+1

Hyry Aug 7 '11 at 21:19

Here is the hack:

Create a new DataProxy class that has a __getitem__ method and proxies the three data arrays (which you can pass to it, for example, at initialization). Make func act on DataProxy instances instead of standard numpy arrays, and pass the modified func and the proxy to the inaccessible method.

Does that make sense? The idea is that nothing requires data to be an array; it only has to be subscriptable. So you can craft your own class to use in place of the array.

Example:

    class DataProxy:
        def __init__(self, *data):
            self.data = list(zip(*data))

        def __getitem__(self, item):
            return self.data[item]

Then create a new DataProxy, passing it as many arrays as you like, and make func accept the result of indexing that instance. Give it a try!
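A rough sketch of how the proxy could be wired up, under some assumptions. The method in the question also calls `len(data)` and `np.size(data)`, so this version adds a `__len__` method and a `size` property that are not part of the answer above (as far as I can tell, `np.size` uses an object's `size` attribute when it has one). `rolling_apply` is a hypothetical stand-in copying the method from the question.

```python
import numpy as np


def rolling_apply(data, window, func):
    """Hypothetical stand-in mirroring _method_i_do_not_have_access_to."""
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


class DataProxy:
    def __init__(self, *data):
        # one (a_i, b_i, c_i, ...) tuple per position
        self.data = list(zip(*data))

    def __getitem__(self, item):
        return self.data[item]

    def __len__(self):
        # assumption: added because the rolling method calls len(data)
        return len(self.data)

    @property
    def size(self):
        # assumption: added so np.size(proxy) reports the number of rows
        return len(self.data)


def _my_func(rows):
    # rows is a list of (a_i, b_i, c_i) tuples for the current window
    a_win, b_win, c_win = zip(*rows)
    return sum(a_win) * sum(b_win) + sum(c_win)


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

proxy = DataProxy(a, b, c)
out = rolling_apply(proxy, 5, _my_func)
print(len(out), out[45])
```

Because the proxy reports its own size, the output keeps the same length as the input arrays, with the last `window - 1` entries set to NaN as in the single-array case.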

0

katrielalex Aug 7 '11 at 16:37

Since it seems that _method_i_do_not.. just applies your function to your data, could you make the data an array of indices? Then func would use those indices to access the matching windows of data_a, data_b and data_c. There are faster ways, but I think this will work with minimal complexity.

In other words, something like this, with extra processing on the window added if necessary:

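    def apply_a(self):
        a = self.data_a
        b = self.data_b
        c = self.data_c
        # assumed definition: the positions 0..len(a)-1, passed in place of the data
        window_indices = np.arange(len(a))

        def _my_func(window):
            return sum(a[window]) * sum(b[window]) + sum(c[window])

        return self._method_i_do_not_have_access_to(window_indices, 5, _my_func)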
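As a self-contained check of the index-array idea, the sketch below runs it outside the class. `rolling_apply` is a hypothetical stand-in copying the method from the question, and `window_indices` is assumed to be simply `np.arange(len(a))`. Each call then receives a short integer array that can index all three data arrays.

```python
import numpy as np


def rolling_apply(data, window, func):
    """Hypothetical stand-in mirroring _method_i_do_not_have_access_to."""
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

# assumption: the "data" handed to the rolling method is just the positions
window_indices = np.arange(len(a))


def _my_func(idx):
    # idx is an integer array such as [45, 46, 47, 48, 49]
    return a[idx].sum() * b[idx].sum() + c[idx].sum()


result = rolling_apply(window_indices, 5, _my_func)
print(result.shape, result[45])
```

Since the index array is one-dimensional, the output has the same length as the data, so no trimming is needed beyond the trailing NaNs.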

0

senderle Aug 7 '11 at 17:27

Voo · Accepted Answer · 2011-08-07T16:58:57+0000

The question is, how will _my_func know which index to work on? If you know the indices beforehand when calling your function, the simplest approach would be to use a lambda: lambda val: self._my_func(self.a, self.b, index, val), with _my_func changed to accept the additional parameters.

Since you do not know the indexes, you will have to write a wrapper around self.c that remembers which index was last accessed (or, even better, which slice was last requested through the slice operator) and stores it in a variable for your function to use.

Edit: I hacked together a small example. It is not particularly great coding style, but it should give you the idea:

    class Foo():
        def __init__(self, data1, data2):
            self.data1 = data1
            self.data2 = data2
            self.key = 0

        def getData(self):
            return Foo.Wrapper(self, self.data2)

        def getKey(self):
            return self.key

        class Wrapper():
            def __init__(self, outer, data):
                self.outer = outer
                self.data = data

            def __getitem__(self, key):
                self.outer.key = key
                return self.data[key]


    if __name__ == '__main__':
        data1 = [10, 20, 30, 40]
        data2 = [100, 200, 300, 400]
        foo = Foo(data1, data2)
        wrapped_data2 = foo.getData()
        print(wrapped_data2[2:4])
        print(data1[foo.getKey()])
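Applied to the arrays from the question, the same wrapper idea might look like the sketch below. The `SliceRecorder` class and `rolling_apply` are hypothetical names: the first wraps the array that gets passed in and remembers the last key it was indexed with, the second is a stand-in copying the method from the question. It relies on that method slicing the data immediately before each call to func, which is what the code in the question does.

```python
import numpy as np


def rolling_apply(data, window, func):
    """Hypothetical stand-in mirroring _method_i_do_not_have_access_to."""
    output = np.empty(np.size(data))
    for i in range(0, len(data) - window + 1):
        output[i] = func(data[i:i + window])
    output[-window + 1:] = np.nan
    return output


class SliceRecorder:
    """Wraps an array and remembers the most recent index or slice."""

    def __init__(self, data):
        self.data = data
        self.last_key = None

    def __len__(self):
        return len(self.data)

    @property
    def size(self):
        # lets np.size(wrapper) report the wrapped array's size
        return self.data.size

    def __getitem__(self, key):
        self.last_key = key
        return self.data[key]


a = np.arange(100)
b = np.arange(100, 200)
c = np.arange(200, 300)

wrapped_a = SliceRecorder(a)


def _my_func(val):
    # the rolling method sliced wrapped_a just before calling us,
    # so last_key is the slice that produced val
    key = wrapped_a.last_key
    return val.sum() * b[key].sum() + c[key].sum()


result = rolling_apply(wrapped_a, 5, _my_func)
print(result[45], wrapped_a.last_key)
```

The recorded key is shared state, so this only holds up while nothing else indexes the wrapper between the slice and the call to func.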
