After my last, failed, try to ask a question here. This time I ask a more precise question:
What I have:
- Huge dataset (finite, but I spent days of multicore processing time to calculate it to ...)
ISet<Point> . - List of input values ββfrom 0 to 2 n n & le; 17
What I need:
3) The table from [1], [2], where I compare each value [2] with the value [1]
Treatment:
For this calculation, I have a formula that takes a bit value (from [2]) and a set of positions (from [1]) and creates a new ISet<Point> . I need to find out which of the source set is equal to the result set (ie, the βCellβ in the table in βA7β can point to βBβ).
Naive way:
Compute the new ISet<Point> and use .Contains(mySet) or something similar in the list of values ββfrom [1]. I did this in previous versions of this proof of concept / pet project, and it was slow when I started to feed huge numbers. Yes, I used a profiler. No, this was not the only slow part of the system, but I spent a lot of time on this naive search / matching.
The question finally:
Since I basically just need to reassign the input, I thought about creating a list of hashed values ββfor the ISet<Point> list, doing the same for my processing result and therefore avoiding comparing whole sets.
Is that a good idea? Would you call this a premature optimization (I know that the naive path above is too slow, but first I have to do something less smart? Performance is really important here, think that work time again)? Any other suggestions for relieving bredons here or ideas that I should read?
Update: Sorry for not receiving a more detailed explanation or sample right away.
Sample for [1] (Note: these are real possible data points, but obviously they are limited):
new List<ISet<Point>>() { new HashSet() {new Point(0,0) }, new HashSet() {new Point(0,0), new Point(2,1) }, new HashSet() {new Point(0,1), new Point(3,1) } }
[2] is just a Boolean vector of length n. For n = 2 this
I can do this with int or long, basically.
Now I have a function that takes a vector and an ISet<Point> and returns a new ISet<Point> . This is not a 1: 1 conversion: a set of 5 can lead to a set of 11 or something else. However, the resulting ISet<Point> is part of the input.
Using letters to set dots and numbers for bit vectors, I start with this
ABCDEF
one
2
3
4
5
6
7
In the end i need
ABCDEF
1 - CAE - -
2 BCEFA -
3 ................
4 ................
5 ................
6 FCBA - -
7 E - CA - D
There are several expensive operations in this project, one is the preparation of point sets ([1]). But now this question is about correspondence: I can easily (more or less, not so important now) calculate the target ISet for a given bit vector and source ISet. Now I need to combine / find this in the source set.
The whole beast will be a finite state machine, where many points are a real state. Later, I am not interested in individual states, I can actually refer to them for anything (letter, index, whatever). I just need to keep the associations:
1, B => C
Update: Eric asked if a HashSet is possible. The answer is yes, but only if the dataset remains small enough. My question is (hashing): can it be / is it advisable to use a hashing algorithm for this hash? My idea is this:
Go ISet<Point> (lazily generated) list / sequence of ISet<Point> (I could change this type, I just want to emphasize that this is a mathematical set of points, without duplicates).
- Create a simpler representation of the input (hash?) And save it (in hashset?)
- Compute all target sets for this input, but only keep a simple representation (see above)
- Cancel set
Fix display (equal hash = equal state)
A good idea? Problems with that? One of them that I could come up with is a collision (how much is this possible?) - and I would not know a good hash function to start with.