Best data structure for these operations

I'm trying to find a better approach than my current one for managing the current state vector of a continuous-time Markov chain. The state vector stores pairs (state, probability), where the probability is a float.

The part of the algorithm that needs to be optimized performs the following operations:

  • each iteration begins with the current state vector
  • for each state in the vector, it computes the reachable states and stores them in a temporary list, along with the probability of moving there
  • for each element of this new list, it computes a new state vector by iterating over the possible transitions (keeping in mind that many transitions can lead to the same state, but from different source states).

This is currently done with hashtables, keyed by state and holding probability values.

So, basically, to build the new vector, a normalized value is computed for each transition; the state's current entry in the vector is read with a get, the probability of the current transition is added to it, and the result is stored back.
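The get/add/set accumulation step described above can be sketched as follows. This is a minimal illustration using the standard Hashtbl module; the function name `add_prob` and the `int` state type are placeholders, not the author's actual code.

```ocaml
(* Accumulate probability mass into a state vector held as a hashtable.
   add_prob is an illustrative name; the real state type comes from the model. *)
let add_prob (pi : (int, float) Hashtbl.t) (s : int) (v : float) =
  (* read the current probability (0. if the state is absent),
     add this transition's contribution, and store the result back *)
  let cur = try Hashtbl.find pi s with Not_found -> 0. in
  Hashtbl.replace pi s (cur +. v)
```

Each transition therefore costs one lookup plus one replace, which is exactly the per-transition constant overhead the question is worried about.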

This approach has worked so far, but I'm now dealing with systems that produce very large state vectors (500k-1M states), so although the hashtable gives constant-time storage and retrieval, the iterations start to slow down considerably.

This is because, for example, we start with a vector of 100k states, and for each of them we compute the reachable states (usually > 1), so we end up with a list of 100k * (average reachable states) transitions. Each transition is then processed to compute the new probability vector.

Unfortunately, I need a full pass over the reachable list just to find the normalizing value, before actually computing the next vector; in any case, profiling shows that this is not the bottleneck of the algorithm. The bottleneck is in processing each transition.
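The two-pass normalization described here (sum the weights first, then scale each transition) can be sketched like this; the function name and list shape are illustrative, not taken from the author's code:

```ocaml
(* Two-pass normalization of a (state, weight) transition list:
   first fold over the list to get the normalizing value,
   then rescale every weight by it. *)
let normalize (ts : ('a * float) list) : ('a * float) list =
  let lm = List.fold_left (fun acc (_, w) -> acc +. w) 0. ts in
  if lm = 0. then ts
  else List.map (fun (s, w) -> (s, w /. lm)) ts
```

The first fold is the extra pass the question mentions; it is linear in the number of transitions but touches no hashtable, which is consistent with it not showing up as the bottleneck.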

This is the function used to compute the jump list from the current vector (pi):

 HTable.fold
   (fun s p l ->
     (* s: state, p: its current probability, l: accumulator *)
     if check s f2 then (0., s, p, [s, 1.0]) :: l
     else if not (check s f1) then (0., s, p, [s, 1.0]) :: l
     else
       let ts = P.rnext s in
       if List.length ts = 0 then (0., s, p, [s, 1.0]) :: l
       else
         let lm = List.fold_left (fun a (_, f) -> f +. a) 0. ts in
         (lm, s, p, ts) :: l)
   pi []

And this is the function that computes the new pi by walking the transition list and normalizing everything:

 (* lm is the global normalizing value computed earlier; u is the jump list *)
 let update_pi s v =
   try
     let t = HTable.find pi s in
     HTable.replace pi s (v +. t)
   with Not_found -> HTable.add pi s v
 in
 HTable.clear pi;
 List.iter
   (fun (llm, s, p, ts) ->
     if llm = 0. then update_pi s p
     else begin
       List.iter (fun (ss, pp) -> update_pi ss (p *. (pp /. lm))) ts;
       if llm < lm then update_pi s (p *. (1. -. (llm /. lm)))
     end)
   u;

I need to find data structures better suited to these operations. The problem is that with the current approach I have to do a get and a set for every single transition, and that many hashtable operations kill performance, since the constant factors are quite high.

1 answer

It cannot hurt to replace the linear-time test List.length ts = 0 with the constant-time ts = [] , although I doubt it will solve your performance problem.
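To make the suggestion concrete, here are the two emptiness tests side by side (the function names are illustrative); List.length traverses the whole list, while matching against [] inspects only the head constructor:

```ocaml
(* O(n): List.length walks every cell of the list before comparing to 0 *)
let is_empty_slow ts = List.length ts = 0

(* O(1): structural comparison against the empty list stops at the first cell *)
let is_empty_fast ts = ts = []

(* O(1), and the most idiomatic form: pattern matching *)
let is_empty ts = match ts with [] -> false | _ -> true |> not
```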

Your algorithm sounds a bit like multiplying a matrix by a vector to get a new vector. That is usually sped up by blocking, but here the hash tables may be costing you locality. Would it be possible to enumerate all the states once and for all, and then use arrays instead of hash tables? Note that with arbitrary state transitions the destination states will still not be local, but it may be an improvement, if only because array access is more direct than hash-table access.
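A rough sketch of that idea, assuming the full state space really can be enumerated up front (all names here are illustrative): map each state to a dense integer index once, then keep the probability vector in a float array and accumulate into it directly.

```ocaml
type state = int  (* placeholder; the real state type comes from the model *)

(* Build a one-time mapping from states to dense array indices. *)
let index_states (states : state list) : (state, int) Hashtbl.t * int =
  let idx = Hashtbl.create (List.length states) in
  List.iteri (fun i s -> Hashtbl.add idx s i) states;
  (idx, List.length states)

(* One iteration: accumulate probability mass flowing along each
   (source, destination, probability) transition into a fresh array. *)
let step (idx : (state, int) Hashtbl.t) (n : int)
         (transitions : (state * state * float) list)
         (pi : float array) : float array =
  let pi' = Array.make n 0. in
  List.iter
    (fun (src, dst, p) ->
      let i = Hashtbl.find idx src and j = Hashtbl.find idx dst in
      pi'.(j) <- pi'.(j) +. pi.(i) *. p)
    transitions;
  pi'
```

The per-transition work becomes two array reads and one array write; the hash lookups on idx could themselves be precomputed away by storing transitions as index triples.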

