Time-based vector operations in NumPy where the state of preceding elements matters: are for loops appropriate?

What do NumPy arrays provide when doing time-based calculations in which state matters? In other words, calculations where what happened earlier or later in the sequence is important.

Consider the following time-based vectors,

    TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
    FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0])
    TEMP = np.array([300., 310., 305., 300., 310., 305., 310., 305., 300., 295.0])

Let's say that an exponential decay should be applied to TEMP once FLOW falls below 30 without subsequently rising above 50 again. In the data above, the function would be applied at TIME = 60, and the last two values of TEMP would be updated by this secondary function, which would start from the corresponding TEMP value.

There is a need to "look ahead" to determine whether FLOW rises above 50 in the elements after the <30 condition is met. It does not seem that the NumPy functions are aimed at time-based vectors where state is important, and the traditional nested for-loop approach may remain the way to go. But given that I am new to NumPy, and that I have to perform many of these state-based manipulations, I would appreciate direction or confirmation.

2 answers

Although Joe Kington's answer is certainly correct (and quite flexible), it is more roundabout than necessary. For someone trying to learn NumPy, I think a more direct route might be easier to understand.

As I noted under your question (and as Joe also noticed), there seems to be an inconsistency between your description of the code's behavior and your example. Like Joe, I am going to assume that your description is the correct behavior.

A few notes:

  • NumPy works well with filter arrays (boolean masks) that specify which elements an operation should be applied to. I use them several times.
  • The np.flatnonzero function returns an array of indices defining locations at which the given array is non-zero (or True).
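As a small illustration of the two notes above, here is a minimal sketch using the FLOW values from the question. The names low_mask, low_indices and low_values are only illustrative.

```python
import numpy as np

FLOW = np.array([100., 75., 60., 20., 60., 50., 20., 30., 20., 10.])

# A comparison produces a boolean mask, one entry per element.
low_mask = FLOW < 30

# np.flatnonzero turns the mask into the positions where it is True.
low_indices = np.flatnonzero(low_mask)

# The mask can also be used to select the matching elements directly.
low_values = FLOW[low_mask]

print(low_indices)  # [3 6 8 9]
print(low_values)   # [20. 20. 20. 10.]
```

Note that the reading of exactly 30 at index 7 is not selected, because the comparison is strict.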

The code uses the example arrays that you provided.

    import numpy as np

    TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
    FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0])
    TEMP = np.array([300., 310., 305., 300., 310., 305., 310., 305., 300., 295.0])

    last_high_flow_index = np.flatnonzero(FLOW > 50)[-1]
    low_flow_indices = np.flatnonzero(FLOW < 30)
    acceptable_low_flow_indices = low_flow_indices[low_flow_indices > last_high_flow_index]
    apply_after_index = acceptable_low_flow_indices[0]

We now have the index from which the function should be applied to TEMP. If I read your question correctly, you would like the temperature to begin to decay once your condition is met. This can be done as follows:

    time_delta = TIME[apply_after_index:] - TIME[apply_after_index]
    TEMP[apply_after_index:] = TEMP[apply_after_index:] * np.exp(-0.05 * time_delta)

TEMP has been updated in place, so print(TEMP) outputs

 [ 300. 310. 305. 300. 310. 305. 310. 184.99185121 110.36383235 65.82339724] 

Alternatively, to apply an arbitrary Python function to the corresponding elements, first vectorize the function:

    def myfunc(x):
        ''' a normal python function that acts on individual numbers'''
        return x + 3

    myfunc_v = np.vectorize(myfunc)

and then update the TEMP array:

    TEMP[apply_after_index:] = myfunc_v(TEMP[apply_after_index:])
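One caveat worth keeping in mind: np.vectorize is a convenience wrapper that calls the Python function once per element, so it is not a performance tool. When the function can be written with array arithmetic, that form gives the same result. A minimal sketch, where the sample values are just the last four TEMP readings from the question:

```python
import numpy as np

def myfunc(x):
    """Ordinary Python function that acts on a single number."""
    return x + 3

myfunc_v = np.vectorize(myfunc)

values = np.array([310., 305., 300., 295.])

via_vectorize = myfunc_v(values)  # calls myfunc once per element
via_arithmetic = values + 3       # same result using array arithmetic

print(np.array_equal(via_vectorize, via_arithmetic))  # True
```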

You can certainly do this without nested loops in NumPy. If you want to get really fancy, you could probably vectorize the entire thing, but it is probably most readable to vectorize it only to the point where you need a single loop.

Generally speaking, try to vectorize things unless doing so becomes excessively clunky or unreadable, or you run into memory problems. In that case, do it another way.

In some cases, loops are more readable, and they usually use less memory than vectorized expressions, but they are usually slower than vectorized expressions.
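For comparison, here is what a plain single-pass loop over the question's data could look like. This is only a sketch: the LOW, HIGH and RATE names are made up for illustration, the decay rate of 0.05 is an assumed value, and the decay is applied to each element's own temperature, which is one possible reading of the question.

```python
import numpy as np

TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.])
FLOW = np.array([100., 75., 60., 20., 60., 50., 20., 30., 20., 10.])
TEMP = np.array([300., 310., 305., 300., 310., 305., 310., 305., 300., 295.])

LOW, HIGH, RATE = 30.0, 50.0, 0.05  # thresholds from the question; RATE is assumed

# One pass that carries state: remember the first low reading seen
# since the flow was last above HIGH.
start = None
for i, flow in enumerate(FLOW):
    if flow > HIGH:
        start = None   # flow recovered, discard any pending start
    elif flow < LOW and start is None:
        start = i      # first low reading since the last recovery

# Apply the decay from the start index onward, leaving TEMP untouched.
decayed = TEMP.copy()
if start is not None:
    for i in range(start, len(TEMP)):
        decayed[i] = TEMP[i] * np.exp(-RATE * (TIME[i] - TIME[start]))

print(start)  # 6
```

The loop needs no explicit look-ahead because a later high reading simply resets the remembered start.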

You will probably be surprised at how flexible the various indexing tricks are. It is rare that you have to use loops to do a calculation, but they are often more readable in complex cases.

However, I am a little confused by what you state as the correct case. You say that you want to apply the function to the portions of temp where the flow falls below 30 without then rising above 50. By that logic, the function should be applied to the last 4 elements of the temp array. However, you state that it should only be applied to the last two. I'm confused! I am going to go with my reading of things and apply it to the last 4 elements of the array.

Here is how I would do it. This uses random data rather than your data, so that there are several regions.

Note that there are no nested loops, and we only iterate over the contiguous regions of the array where your "asymmetric" threshold conditions are met (with your data, that would be a single iteration).
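As one example of such a trick, the "look ahead" from the question can be expressed with a running maximum taken from the right-hand end of the array. This is a sketch on the question's FLOW values, not the approach used in the code further down, and the names future_max and candidates are illustrative.

```python
import numpy as np

FLOW = np.array([100., 75., 60., 20., 60., 50., 20., 30., 20., 10.])

# Running maximum taken from the right: future_max[i] is the largest
# value in FLOW[i:], i.e. the current reading and every later one.
future_max = np.maximum.accumulate(FLOW[::-1])[::-1]

# Readings that are low now and never exceed 50 from here on.
candidates = (FLOW < 30) & (future_max <= 50)

# First qualifying index, or None if the condition is never met.
start = int(np.flatnonzero(candidates)[0]) if candidates.any() else None

print(future_max)  # [100.  75.  60.  60.  60.  50.  30.  30.  20.  10.]
print(start)       # 6
```

Index 3 is also below 30, but it is rejected because a reading of 60 follows it.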

    import numpy as np
    import matplotlib.pyplot as plt

    def main():
        num = 500
        flow = np.cumsum(np.random.random(num) - 0.5)
        temp = np.cumsum(np.random.random(num) - 0.5)
        temp -= temp.min() - 10
        time = np.linspace(0, 10, num)

        low, high = -1, 1

        # For regions of "flow" where flow drops below low and thereafter
        # stays below high...
        for start, stop in asymmetric_threshold_regions(flow, low, high):
            # Apply an exponential decay function to temp...
            t = time[start:stop+1] - time[start]
            temp[start:stop+1] = temp[start] * np.exp(-0.5 * t)

        plot(flow, temp, time, low, high)

    def contiguous_regions(condition):
        """Finds contiguous True regions of the boolean array "condition".
        Returns a 2D array where the first column is the start index of the
        region and the second column is the end index."""
        # Find the indices of changes in "condition"
        d = np.diff(condition)
        idx, = d.nonzero()

        if condition[0]:
            # If the start of condition is True prepend a 0
            idx = np.r_[0, idx]
        if condition[-1]:
            # If the end of condition is True, append the length of the array
            idx = np.r_[idx, len(condition)-1]

        # Reshape the result into two columns
        idx.shape = (-1,2)
        return idx

    def asymmetric_threshold_regions(x, low, high):
        """Returns an iterator over regions where "x" drops below "low" and
        doesn't rise above "high"."""
        # Start with contiguous regions over the high threshold...
        for region in contiguous_regions(x < high):
            start, stop = region

            # Find where "x" drops below low within these
            below_start, = np.nonzero(x[start:stop] < low)

            # If it does, start at where "x" drops below "low" instead of where
            # it drops below "high"
            if below_start.size > 0:
                start += below_start[0]
                yield start, stop

    def plot(flow, temp, time, low, high):
        fig = plt.figure()
        ax1 = fig.add_subplot(2,1,1)
        ax1.plot(time, flow, 'g-')
        ax1.set_ylabel('Flow')

        ax1.axhline(y=low, color='b')
        ax1.axhline(y=high, color='g')
        ax1.text(time.min()+1, low, 'Low Threshold', va='top')
        ax1.text(time.min()+1, high, 'High Threshold', va='bottom')

        ax2 = fig.add_subplot(2,1,2, sharex=ax1)
        ax2.plot(time, temp, 'b-')
        ax2.set_ylabel('Temp')
        ax2.set_xlabel('Time (s)')

        for start, stop in asymmetric_threshold_regions(flow, low, high):
            ax1.axvspan(time[start], time[stop], color='r', alpha=0.5)
            ax2.axvspan(time[start], time[stop], color='r', alpha=0.5)

        plt.setp(ax1.get_xticklabels(), visible=False)
        plt.show()

    if __name__ == '__main__':
        main()
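To see the idea of contiguous regions on a small input, here is a separate sketch that finds runs of True values in a mask built from the question's FLOW data. It is a variant written for illustration, not the contiguous_regions function from the code above: the helper name true_runs is hypothetical, and it returns half-open [start, stop) pairs.

```python
import numpy as np

def true_runs(mask):
    """Return an (n, 2) array of half-open [start, stop) runs of True."""
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate(([False], mask, [False]))
    # An edge is any position where the value differs from its neighbour.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2)

FLOW = np.array([100., 75., 60., 20., 60., 50., 20., 30., 20., 10.])

# Stretches where the flow is not above the high threshold of 50.
runs = true_runs(FLOW <= 50)
print(runs.tolist())  # [[3, 4], [5, 10]]
```

Because of the padding, the number of edges is always even, so the reshape into two columns cannot fail.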
