Data structure for a set of (disjoint) sets

I am looking for a data structure that roughly matches (in Java terms) Map<Set<int>, double> . Essentially a set of sets of labeled marbles, where each set of marbles is associated with a scalar. I want it to be able to efficiently handle the following operations:

  • Add the given integer to each set.
  • Remove each set containing (or not containing) the given integer, or at least set the associated double to 0.
  • Union two of these maps, adding the doubles together for the sets that appear in both.
  • Multiply all doubles by the given double.
  • Rarely, iterate over the entire map.

under the following conditions:

  • Integers fall into a limited range (from 1 to 10,000 or so); the exact range will be known at compile time.
  • Most integers in the range (80-90%) will never be used, but which ones cannot easily be determined until the calculation is complete.
    • The number of integers used will almost always be greater than 100.
  • Many of the sets will be very similar, differing only in a few elements.
  • It is possible to identify certain groups of integers that usually appear only in runs: for example, if a set contains the integers 27 and 29, then it (almost?) certainly contains 28 as well.
    • It may be possible to identify these groups before starting the calculation.
    • These groups usually have about 100 integers.
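For concreteness, the operations above can be sketched against a naive baseline: a JS `Map` keyed by a canonical sorted string, standing in for `Map<Set<int>, double>`. All function names here are mine and purely illustrative; this is the slow straw man that a better structure should beat.

```javascript
// Naive baseline: a Map keyed by a canonical comma-joined sorted string.
// Names and entry encoding are illustrative, not from the question.
function makeKey(ints) {
  return [...ints].sort((a, b) => a - b).join(",");
}

// Add the given integer to every set (re-keying each entry).
function naiveAddToAll(map, n) {
  const next = new Map();
  for (const [key, weight] of map) {
    const ints = key === "" ? [] : key.split(",").map(Number);
    if (!ints.includes(n)) ints.push(n);
    const k = makeKey(ints);
    next.set(k, (next.get(k) || 0) + weight);
  }
  return next;
}

// Remove every set that contains the given integer.
function naiveRemoveContaining(map, n) {
  const next = new Map();
  for (const [key, weight] of map) {
    const ints = key === "" ? [] : key.split(",").map(Number);
    if (!ints.includes(n)) next.set(key, weight);
  }
  return next;
}

// Multiply every weight by the given factor.
function naiveScale(map, factor) {
  for (const [key, weight] of map) map.set(key, weight * factor);
  return map;
}
```

Every operation here costs time proportional to the total number of stored integers, and the string re-keying dominates; that is the cost a smarter structure would need to beat.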

I have looked at tries, but I don't see a good way to handle the operation "remove each set containing a given integer."

The purpose of this data structure is to represent discrete random variables and to support addition, multiplication, and scalar-multiplication operations on them. Each of these discrete random variables would ultimately be built by applying these operations to a fixed (at compile time) set of independent Bernoulli random variables (i.e., each takes the value 1 with some probability and 0 otherwise).

The models being simulated are close to representable as time-inhomogeneous Markov chains (which would, of course, simplify this immensely), but unfortunately it is necessary to track the time elapsed since various transitions.


Here is a data structure that can efficiently perform all your operations:

I am going to call it a BitmapArray for this explanation.

Thinking about this, for the operations described, a sorted array with bitmaps as keys and the weights (your doubles) as values should be quite effective.

Bitmaps are what encode set membership. Since you said that the integers in a set fall in the range 1-10,000, we can describe any set with a bitmap of length 10,000.

Keeping an array sorted when the keys can reach 2^10000 sounds difficult, but you can be clever in implementing the comparison function, as follows:

  • Iterate over the two bitmaps from left to right.
  • XOR the bits at each index.
  • Say you get a 1 at the i-th position:
  • whichever bitmap has a 1 at the i-th position is the greater one.
  • If you never get a 1, the bitmaps are equal.

I know this is still a slow comparison, but not too slow. Here is a benchmark fiddle I made for bitmaps of length 10,000. It is in JavaScript; if you write it in Java it will perform even better.

  function runTest() {
      var num = document.getElementById("txtValue").value;
      num = isNaN(num * 1) ? 0 : num * 1;
      /* For integers in the range 1-10,000 the worst case for comparison is
         a pair of equal integers, which forces the comparison to iterate
         over the whole bit array. */
      var bitmap1 = convertToBitmap(10000, num);
      var bitmap2 = convertToBitmap(10000, num);
      var before = Date.now();
      var result = firstIsGreater(bitmap1, bitmap2, 10000);
      var after = Date.now();
      alert(result + " in time: " + (after - before) + " ms");
  }

  // Build a bitmap of the given size whose tail holds the binary digits of number.
  function convertToBitmap(size, number) {
      var bits = [];
      var q = number;
      do {
          bits.push(q % 2);
          q = Math.floor(q / 2);
      } while (q > 0);
      var xbitArray = [];
      for (var i = 0; i < size; i++) {
          xbitArray.push(0);
      }
      var j = xbitArray.length - 1;
      for (var i = bits.length - 1; i >= 0; i--) {
          xbitArray[j] = bits[i];
          j--;
      }
      return xbitArray;
  }

  // Compare bit by bit; the first position where the bitmaps differ decides.
  function firstIsGreater(bitArray1, bitArray2, lengthOfArrays) {
      for (var i = 0; i < lengthOfArrays; i++) {
          if (bitArray1[i] ^ bitArray2[i]) {
              return bitArray1[i] ? true : false;
          }
      }
      return false;
  }

  document.getElementById("btnTest").onclick = function (e) {
      runTest();
  };

Also, remember that you only need this comparison when building the BitmapArray (or when merging two of them); after that it becomes quite efficient for the operations you perform most often:

Note: N is the length of the BitmapArray.

Add an integer to each set: worst/best case O(N). Flip the corresponding bit from 0 to 1 in each bitmap.
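A sketch of this operation, assuming each entry is an object `{ bits, weight }` where `bits` is a 0/1 array indexed by integer (my names, not from the answer):

```javascript
// Set the bit for integer n to 1 in every entry's bitmap.
// The { bits, weight } entry shape is illustrative.
function addIntegerToAll(bitmapArray, n) {
  for (const entry of bitmapArray) {
    entry.bits[n] = 1;
  }
}

const arr = [
  { bits: new Uint8Array(10), weight: 0.5 },
  { bits: new Uint8Array(10), weight: 0.25 },
];
arr[0].bits[2] = 1;
addIntegerToAll(arr, 7);
// every entry now has bit 7 set; existing bits are untouched
```

(One caveat, not covered in the answer: if some sets already contained n, two previously distinct keys can collide after this, so duplicate entries and re-sorting may need handling on top of the plain bit flip.)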

Remove each set containing the given integer: worst case O(N).

  • For each bitmap, check the bit that represents the given integer; if it is 1, mark that index.
  • Compact the array by deleting all marked indices.

If you settle for setting the associated double to 0 instead, this is even more efficient. It is also very handy if you want to remove all sets that contain any element of a given set.
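Both variants can be sketched as follows, again assuming illustrative `{ bits, weight }` entries:

```javascript
// Variant 1: drop every entry whose bitmap has the bit for n set.
function removeContaining(bitmapArray, n) {
  return bitmapArray.filter(entry => entry.bits[n] === 0);
}

// Variant 2: keep all entries but zero the matching weights in place.
function zeroContaining(bitmapArray, n) {
  for (const entry of bitmapArray) {
    if (entry.bits[n] === 1) entry.weight = 0;
  }
}

const a = [
  { bits: Uint8Array.from([0, 1, 0]), weight: 0.5 },
  { bits: Uint8Array.from([0, 0, 1]), weight: 0.25 },
];
const kept = removeContaining(a, 1); // only the second entry survives
zeroContaining(a, 2);                // second entry's weight becomes 0
```

Variant 2 avoids moving array elements (no compaction), which is why it is the cheaper of the two.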

Merging two maps: worst case O(N1 + N2). Just like merging two sorted arrays, except that you again have to be clever about the comparison.
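A sketch of the merge, using a sign-returning version of the bit comparison and adding the doubles when the same key appears in both inputs (entry shape and names are illustrative):

```javascript
// Compare two equal-length bitmaps, leftmost differing bit decides.
function compareBits(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] ? 1 : -1;
  }
  return 0;
}

// Merge two BitmapArrays sorted by compareBits; equal keys add weights.
function mergeBitmapArrays(x, y) {
  const out = [];
  let i = 0, j = 0;
  while (i < x.length && j < y.length) {
    const c = compareBits(x[i].bits, y[j].bits);
    if (c < 0) out.push(x[i++]);
    else if (c > 0) out.push(y[j++]);
    else {
      out.push({ bits: x[i].bits, weight: x[i].weight + y[j].weight });
      i++; j++;
    }
  }
  while (i < x.length) out.push(x[i++]);
  while (j < y.length) out.push(y[j++]);
  return out;
}

const x = [
  { bits: [0, 1], weight: 0.5 },
  { bits: [1, 0], weight: 0.25 },
];
const y = [
  { bits: [0, 1], weight: 0.25 },
  { bits: [1, 1], weight: 0.5 },
];
const merged = mergeBitmapArrays(x, y);
// three entries; the shared key [0, 1] ends up with weight 0.75
```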

Multiply all doubles by a given double: worst/best case O(N). Iterate and multiply each value by the input double.
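Sketch (same illustrative entry shape as above):

```javascript
// Multiply every entry's weight by the given factor, in place.
function scaleAll(bitmapArray, factor) {
  for (const entry of bitmapArray) {
    entry.weight *= factor;
  }
}

const arr = [
  { bits: [0, 1], weight: 0.5 },
  { bits: [1, 0], weight: 0.25 },
];
scaleAll(arr, 2);
// weights are now 1 and 0.5
```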

Iterate over the BitmapArray: worst/best case O(1) to get the next element.
