Data Range Filters in C++

I want the user to be able to specify the ranges used to filter the data. The ranges can be contiguous, overlapping, or disjoint (for example, the user enters the following ranges: 1-10, 5-10, 10-12, 7-13, and 15-20).

Then I want to filter the data so that the user sees only the items that fall inside these ranges.

I will probably write code at a different level that merges the ranges appropriately (so the example above would become 1-13 and 15-20), but I don’t want my data service to depend on this, so it should be able to handle the example above as entered.
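For illustration, the merging step described above could look roughly like the sketch below. It assumes inclusive integer ranges stored as std::pair<int, int>; the names Range and merge_ranges are made up for this example and do not come from the question.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Inclusive integer range [first, second]; the alias name is illustrative.
using Range = std::pair<int, int>;

// Sort by lower bound, then fold every range that overlaps the previously
// merged range into it. Ranges that only sit next to each other
// (e.g. 1-13 and 14-20) are deliberately left separate in this sketch.
std::vector<Range> merge_ranges(std::vector<Range> ranges) {
    std::sort(ranges.begin(), ranges.end());
    std::vector<Range> merged;
    for (const Range& r : ranges) {
        if (!merged.empty() && r.first <= merged.back().second) {
            // Overlap: extend the last merged range if this one reaches further.
            merged.back().second = std::max(merged.back().second, r.second);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Called with the five ranges from the question, this yields the two ranges 1-13 and 15-20.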

I have a lot of data, and speed is a priority, so I don’t want iterations through the range list for each data item to check whether it should be displayed to the user or not.

Is there a data structure (or some algorithm) that can be used to achieve this?

+4
8 answers

You can use Boost's filter_iterator to achieve this.

+3

If you sort the list of ranges, you can use binary search to minimize iteration. In practice, though, unless you have a large number of ranges, plain iteration will be the fastest.

0

You can use iterators into your containers. For example, std::vector provides iterators through begin() and end(), as well as bounds-checked element access through the "at" method. Iterator ranges can be contiguous, overlapping, or disjoint.

0

Make your list disjoint (as you said) by merging the ranges that overlap. Then sort the array of endpoints and, for each data item, run a binary search to determine whether the item is inside or outside a range. Elements at even indices always start a range; elements at odd indices always end one.
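A minimal sketch of that lookup, assuming the ranges are inclusive, integer-valued, already merged, and sorted. The helper names flatten and contains are hypothetical, and std::upper_bound is one possible choice of binary search.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;  // inclusive [first, second], illustrative alias

// Flatten disjoint, sorted ranges into [start0, end0, start1, end1, ...].
std::vector<int> flatten(const std::vector<Range>& disjoint_sorted) {
    std::vector<int> endpoints;
    endpoints.reserve(disjoint_sorted.size() * 2);
    for (const Range& r : disjoint_sorted) {
        endpoints.push_back(r.first);
        endpoints.push_back(r.second);
    }
    return endpoints;
}

// O(log n) membership test against the flattened endpoint array.
bool contains(const std::vector<int>& endpoints, int x) {
    // idx = number of endpoints that are <= x.
    auto it = std::upper_bound(endpoints.begin(), endpoints.end(), x);
    std::size_t idx = static_cast<std::size_t>(it - endpoints.begin());
    // Odd count: the last endpoint at or below x is a start (even index),
    // and its matching end lies above x, so x is inside that range.
    if (idx % 2 == 1) return true;
    // Even count: x is inside only if it sits exactly on an inclusive end.
    return idx > 0 && endpoints[idx - 1] == x;
}
```

The endpoint array is built once, so each data item costs a single binary search instead of a pass over the whole range list.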

HTH.

0

The solution generally depends on the range boundaries.

  • If max - min is not too large (for example, the boundaries lie in [1..1024]), you can simply use an array that maps each X to the list of ranges containing it. For your example, the arrays would be:
  ranges = [0: (1,10), 1: (5,10), 2: (10,12), 3: (7,13), 4: (15,20)]
  points = [1: [0], 2: [0], 3: [0], 4: [0], 5: [0,1], ..., 7: [0,1,3], ..., 10: [0,1,2,3], ..., 15: [4], ..., 20: [4], 21: [], ...]

  So in this case you can quickly find the ranges for a particular X.

  • You can use an interval tree: less efficient, but friendlier on memory (and, of course, more efficient than a brute-force solution).
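The array-based variant might be sketched as follows. It assumes the data values are integers within known bounds [lo, hi]; build_points and its parameters are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;  // inclusive [first, second], illustrative alias

// For every value X in [lo, hi], record the indices of the ranges that
// contain X. Slot X - lo of the result corresponds to value X.
std::vector<std::vector<std::size_t>> build_points(
        const std::vector<Range>& ranges, int lo, int hi) {
    std::vector<std::vector<std::size_t>> points(
        static_cast<std::size_t>(hi - lo + 1));
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        // Clip each range to the table bounds before filling.
        const int from = std::max(lo, ranges[i].first);
        const int to = std::min(hi, ranges[i].second);
        for (int x = from; x <= to; ++x) {
            points[static_cast<std::size_t>(x - lo)].push_back(i);
        }
    }
    return points;
}
```

A lookup is then a single index operation: a value X is displayed when points[X - lo] is non-empty. Memory grows with hi - lo, which is why this only suits narrow bounds.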
0

One approach is to merge the ranges you receive into an underlying bitmap, where each bit indicates whether that value is inside a range or not.

A class-based design would let you overload operator+= for syntactic sugar, but a bare bitmap will work just as well. For instance:

 # original bitmap
 bits = [ 0,0,0,0,0,0,0,0,0,0 ]
 # add 1-5
 bits = [ 0,1,1,1,1,1,0,0,0,0 ]
 # add 4-6
 bits = [ 0,1,1,1,1,1,1,0,0,0 ]
 # look for 3
 bits[3] == 1 ?
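In C++ the same idea could be sketched with std::vector<bool>. The class name RangeBitmap and its members are hypothetical, a plain add() stands in for the operator+= mentioned above, and values are assumed to be non-negative integers below a known size.

```cpp
#include <cstddef>
#include <vector>

// Bitmap over the values 0 .. size-1; a set bit means "inside some range".
class RangeBitmap {
public:
    explicit RangeBitmap(std::size_t size) : bits_(size, false) {}

    // Mark every value in the inclusive range [lo, hi], clipped to the bitmap.
    // Overlapping ranges need no special handling: bits are simply set again.
    void add(std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo; i <= hi && i < bits_.size(); ++i) {
            bits_[i] = true;
        }
    }

    // Constant-time membership test; values outside the bitmap are not in range.
    bool contains(std::size_t x) const {
        return x < bits_.size() && bits_[x];
    }

private:
    std::vector<bool> bits_;
};
```

After add(1, 5) and add(4, 6) on a ten-entry bitmap, contains(3) is true and contains(7) is false, matching the pseudocode above.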
0

I think what you want to do is called a Range Minimum Query.

0

It is not too difficult if your data is already sorted. Use combination

For each of your [min, max] subranges, you can find the i_min and i_max iterators and use them as

 std::make_pair(i_min, i_max) 

to make it "range" compatible. Then use boost::join to combine all the subranges into one range (lazily, of course), and use that range for downstream processing.

Obviously, you must pre-process all ranges to make sure that they do not overlap.
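As a rough standard-library-only illustration of this approach: the sketch below assumes that i_min and i_max are found with std::lower_bound and std::upper_bound (the answer does not say which functions it has in mind), and it copies the matching items into a new vector instead of joining the subranges lazily as boost::join would. The name filter_sorted is made up.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;  // inclusive [first, second], illustrative alias

// data must be sorted ascending; ranges must be sorted and non-overlapping.
std::vector<int> filter_sorted(const std::vector<int>& data,
                               const std::vector<Range>& ranges) {
    std::vector<int> out;
    for (const Range& r : ranges) {
        // First element >= range minimum.
        auto i_min = std::lower_bound(data.begin(), data.end(), r.first);
        // One past the last element <= range maximum.
        auto i_max = std::upper_bound(i_min, data.end(), r.second);
        // [i_min, i_max) is the slice of data inside this range.
        out.insert(out.end(), i_min, i_max);
    }
    return out;
}
```

Each range costs two binary searches over the data, so the work depends on the number of ranges and the number of matching items rather than on a per-item check against every range.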

0