Bucketing data in ranges

Question


I have two-dimensional data stored in a sorted list of tuples, as follows:

data = [(0.1,100), (0.13,300), (0.2,10)...

The first value in each tuple, the X value, occurs only once in the list. In other words, there can be only one tuple with X = 0.1, and so on.

Then I have a sorted list of buckets. A bucket is defined as a tuple containing a range and identifier, as follows:

 buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)...

The range refers to the X axis. In the example above, ID 2 has two buckets, while IDs 1 and 3 have one each. The first bucket for ID 2 has a range from 0 to 0.14. Please note that buckets may overlap.

So, I need an algorithm that assigns the data to buckets and then sums the scores (the second value of each tuple) per ID. For the data above, the result will be:

 1:0 2:410 3:10

Note that every data point falls within a bucket belonging to ID 2, so ID 2 gets a score of 100+300+10=410.

How can I write an algorithm for this?

+4

python

Baz Dec 02 '12 at 0:22


3 answers

Yunzhi ma · Answer 1 · 2012-12-02T00:52:15+0000

Try this code:

    data = [(0.1,100), (0.13,300), (0.2,10)]
    buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]

    def foo(tpl):
        # return the list of IDs of the buckets that enclose a data tuple
        x, s = tpl
        lst = []
        for bucket in buckets:
            rnge, iid = bucket
            # strict comparison: a point exactly on a boundary is not counted
            if x > rnge[0] and x < rnge[1]:
                lst.append(iid)
        return lst

    # pair each data tuple with the IDs of the buckets it falls into
    data = [[dt, foo(dt)] for dt in data]

    scores_dict = {}
    for tpl in data:
        score = tpl[0][1]
        for iid in tpl[1]:
            if iid in scores_dict:
                scores_dict[iid] += score
            else:
                scores_dict[iid] = score

    for key in scores_dict:
        print(key, ":", scores_dict[key])

This snippet results in:

    2 : 410
    3 : 10

If a bucket identifier is not printed, no X value fell within any of its ranges, so its total is zero.
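To make empty buckets show up explicitly, as in the `1:0 2:410 3:10` output the question asks for, one option is to seed the totals with zero for every known ID before summing. The following is only a minimal sketch of that idea; the names `scores` and `summary` are illustrative and not part of the answer above, and it keeps the same strict comparison.

```python
data = [(0.1, 100), (0.13, 300), (0.2, 10)]
buckets = [((0, 0.14), 2), ((0.135, 0.19), 1), ((0.19, 0.21), 2), ((0.19, 0.24), 3)]

# Seed every known ID with zero so empty buckets still appear in the output.
scores = {iid: 0 for _, iid in buckets}

for x, score in data:
    for (low, high), iid in buckets:
        # Same strict comparison as the answer above: boundary points are excluded.
        if low < x < high:
            scores[iid] += score

# Format the totals in ascending ID order, as in the question.
summary = " ".join("{}:{}".format(iid, scores[iid]) for iid in sorted(scores))
print(summary)
```

Note that, as in the code above, a point lying in two overlapping buckets with the same ID would be counted once per bucket.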

Rob Cowie · Answer 2 · 2012-12-02T01:13:39+0000

Turn each bucket definition (range, label) into a callable which, given a data point, adds that point's value to the bucket's total if the point lies in the range. Bucket totals are stored in a simple dict. You can easily wrap this concept in a class if you want to provide a simpler API.

    def partition(buckets, bucket_definition):
        """Build a callable that increments the appropriate buckets with a value"""
        lower, upper = bucket_definition[0]
        key = bucket_definition[1]

        def _partition(data):
            x, y = data
            # Set a default value for this key
            buckets.setdefault(key, 0)
            if lower <= x <= upper:
                buckets[key] += y

        return _partition

    bucket_definitions = [
        ((0, 0.14), 2),
        ((0.135, 0.19), 1),
        ((0.19, 0.21), 2),
        ((0.19, 0.24), 3),
    ]
    data = [(0.1, 100), (0.13, 300), (0.2, 10)]

    # Holder for bucket labels and values
    buckets = {}

    # For each bucket definition (range, label) build a callable
    partitioners = [partition(buckets, definition) for definition in bucket_definitions]

    # Apply each callable to each data tuple provided.
    # A plain loop is used because map() is lazy in Python 3 and would never call the partitioner.
    for partitioner in partitioners:
        for point in data:
            partitioner(point)

    print(buckets)
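The class wrapper mentioned above could look roughly like the sketch below. The class name `BucketTotals` and its methods `add` and `add_all` are hypothetical, chosen only for illustration; the inclusive bounds match the `partition` function.

```python
class BucketTotals:
    """Hypothetical wrapper that accumulates a total per bucket label."""

    def __init__(self, bucket_definitions):
        self.definitions = list(bucket_definitions)
        # Every label starts at zero so empty buckets are still reported.
        self.totals = {label: 0 for _, label in self.definitions}

    def add(self, point):
        """Add one (x, y) point to every bucket whose range contains x."""
        x, y = point
        for (lower, upper), label in self.definitions:
            if lower <= x <= upper:
                self.totals[label] += y

    def add_all(self, data):
        """Add every point in data and return the totals dict."""
        for point in data:
            self.add(point)
        return self.totals


bucket_definitions = [
    ((0, 0.14), 2),
    ((0.135, 0.19), 1),
    ((0.19, 0.21), 2),
    ((0.19, 0.24), 3),
]
data = [(0.1, 100), (0.13, 300), (0.2, 10)]

totals = BucketTotals(bucket_definitions).add_all(data)
print(totals)
```

Keeping the totals inside the object avoids passing a shared dict into each callable, at the cost of a little more code.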

khagler · Answer 3 · 2012-12-02T01:27:29+0000

This gives the desired result from your test data:

    data = [(0.1,100), (0.13,300), (0.2,10)]
    buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]

    totals = dict()
    for bucket in buckets:
        bucket_id = bucket[1]
        if bucket_id not in totals:
            totals[bucket_id] = 0
        for data_point in data:
            if data_point[0] >= bucket[0][0] and data_point[0] <= bucket[0][1]:
                totals[bucket_id] += data_point[1]

    for key in sorted(totals):
        print("{}: {}".format(key, totals[key]))
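Because the question states that the data is sorted by X and each X is unique, the inner loop over all data points could be replaced by a binary search plus prefix sums. This is a different technique from the loop above, shown only as a sketch; it assumes inclusive bounds and that each range satisfies low <= high, and the names `xs` and `prefix` are illustrative.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

data = [(0.1, 100), (0.13, 300), (0.2, 10)]
buckets = [((0, 0.14), 2), ((0.135, 0.19), 1), ((0.19, 0.21), 2), ((0.19, 0.24), 3)]

# X values on their own, in the same (sorted) order as the data.
xs = [x for x, _ in data]
# prefix[i] is the sum of the first i scores, so prefix[0] == 0.
prefix = [0] + list(accumulate(y for _, y in data))

totals = {}
for (low, high), bucket_id in buckets:
    start = bisect_left(xs, low)    # first index with x >= low
    stop = bisect_right(xs, high)   # first index with x > high
    # Sum of the scores for indices start..stop-1, via the prefix sums.
    totals[bucket_id] = totals.get(bucket_id, 0) + prefix[stop] - prefix[start]

for key in sorted(totals):
    print("{}: {}".format(key, totals[key]))
```

Each bucket then costs two binary searches instead of a full pass over the data, which may matter only when the data list is large.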
