Why is my Trie search slower than the standard F # Map?

So, I just ported Trie from OCaml. Unfortunately, it is slower than a standard card in terms of tryFind. I do not understand this - it seems that it should be faster. Are F # code libraries in a special way to make them faster than the code that the user typically deploys?

Here the code is

[<RequireQualifiedAccess>] module Trie type Node<'k, 'v when 'k : comparison> = { TrieMap : Map<'k, Node<'k, 'v>> TrieKvp : ('k list * 'v) option } member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty let inline make map kvp = { TrieMap = map TrieKvp = kvp } let inline makeEmpty () : Node<'k, 'v> = make Map.empty None let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty let rec tryFind (key : 'k list) node = if key.IsEmpty then match node.TrieKvp with | Some (_, value) -> Some value | None -> None else let keyHead = key.Head let keyTail = key.Tail let optSubNode = Map.tryFind keyHead node.TrieMap match optSubNode with | Some subNode -> tryFind keyTail subNode | None -> None let inline containsKey key node = (tryFind key node).IsSome let rec addInternal (key : 'k list) value node = if key.IsEmpty then make node.TrieMap (Some (key, value)) else let keyHead = key.Head let keyTail = key.Tail let newTrie = match Map.tryFind keyHead node.TrieMap with | Some subTrie -> subTrie | None -> makeEmpty () let newTrie2 = addInternal keyTail value newTrie make (Map.add keyHead newTrie2 node.TrieMap) node.TrieKvp let inline add key value node = addInternal key value node let rec addMany kvps node = if Seq.isEmpty kvps then node else let kvpHead = Seq.head kvps let kvpTail = Seq.skip 1 kvps let newTrie = add (fst kvpHead) (snd kvpHead) node addMany kvpTail newTrie let inline ofList kvps = addMany kvps (makeEmpty ()) let inline ofListBy by kvps = let pairs = List.map by kvps ofList pairs let rec foldInternal folder rev node state = match node.TrieKvp with | Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value | None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap let inline fold folder state node = foldInternal folder [] node state let rec map (mapper : 'k list -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> = match node.TrieKvp with | Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value)) | None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None let inline toValueList node = fold (fun state _ value -> value :: state) [] node let inline singleton (key, value) = add key value (makeEmpty ()) 

Here is the performance test that John Harrop provided me to find improvements that are adequate to measure -

 let xs = Array.init 1000000 (fun i -> [i]) let timer = System.Diagnostics.Stopwatch.StartNew() let mutable t = Trie.makeEmpty() for i=0 to xs.Length-1 do t <- Trie.add xs.[i] xs.[i] t printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds timer.Restart() for _ in 1..100 do for i=0 to xs.Length-1 do ignore(Trie.tryFind xs.[i]) printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds let timer = System.Diagnostics.Stopwatch.StartNew() let mutable t = Map.empty for i=0 to xs.Length-1 do t <- Map.add xs.[i] xs.[i] t printfn "Map took %fs to build" timer.Elapsed.TotalSeconds timer.Restart() for _ in 1..100 do for i=0 to xs.Length-1 do ignore(Map.tryFind xs.[i]) printfn "Map took %fs to search" timer.Elapsed.TotalSeconds 

NOTE. If you have a faster search data structure, note that I need a persistent data structure.

+4
source share
3 answers

Unfortunately, it is slower than a standard card in terms of tryFind. I do not understand this - it seems that it should be faster.

A quick test here says that your trick is already faster than Map for at least a simple case:

 do let n = 0 let xs = Array.init 1000000 (fun i -> [i]) let timer = System.Diagnostics.Stopwatch.StartNew() let mutable t = Trie.makeEmpty() for i=0 to xs.Length-1 do t <- Trie.add xs.[i] xs.[i] t printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds timer.Restart() for _ in 1..100 do for i=0 to xs.Length-1 do ignore(Trie.tryFind xs.[i]) printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds let timer = System.Diagnostics.Stopwatch.StartNew() let mutable t = Map.empty for i=0 to xs.Length-1 do t <- Map.add xs.[i] xs.[i] t printfn "Map took %fs to build" timer.Elapsed.TotalSeconds timer.Restart() for _ in 1..100 do for i=0 to xs.Length-1 do ignore(Map.tryFind xs.[i]) printfn "Map took %fs to search" timer.Elapsed.TotalSeconds 

I get 4s to create your Trie, 8.7s to create a Map and about 0.7 to search in both cases.

However, there are many opportunities to improve your implementation. I recently wrote an article about the optimized universal implementation of hash hashes in F #, which was published here .

Your subsequent comments imply that you want to use this only for string matching. If so, it would be much more efficient to specialize your trie for string keys.

EDIT

KVB suggested that I talk in detail about the "room for improvement", so some reviews:

  • Use inline sparingly as optimization, and only based on compelling performance measurements.
  • Make an empty value, not a function.
  • Avoid List.head and List.tail whenever possible. Use pattern matching instead.
  • Avoid opportunities whenever possible (e.g. hard code for string keys in this case).
+4
source

Well, therefore, after a little thought, I suggested that the real difference in performance is in using lists for keys, not strings. Lines (and array) have much better cache consistency. So, I changed the key from the list "k" to a string and voila! Performance is now really better than the map in my application!

Here the code is

 [<RequireQualifiedAccess>] module StringTrie type Node<'v> = { TrieMap : Map<char, Node<'v>> TrieKvp : (string * 'v) option } member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty let inline make map kvp = { TrieMap = map TrieKvp = kvp } let inline makeEmpty () : Node<'v> = make Map.empty None let inline isEmpty (node : Node<'v>) = node.IsEmpty let rec tryFindInternal (key : string) index node = if key.Length = index then match node.TrieKvp with | Some (_, value) -> Some value | None -> None else let optSubNode = Map.tryFind key.[index] node.TrieMap match optSubNode with | Some subNode -> tryFindInternal key (index + 1) subNode | None -> None let inline tryFind (key : string) node = tryFindInternal key 0 node let inline containsKey key node = (tryFind key node).IsSome let rec addInternal (key : string) index value node = if key.Length = index then make node.TrieMap (Some (key, value)) else let char = key.[index] let newTrie = match Map.tryFind char node.TrieMap with | Some subTrie -> subTrie | None -> makeEmpty () let newTrie2 = addInternal key (index + 1) value newTrie make (Map.add char newTrie2 node.TrieMap) node.TrieKvp let inline add key value node = addInternal key 0 value node let rec addMany kvps node = if Seq.isEmpty kvps then node else let kvpHead = Seq.head kvps let kvpTail = Seq.skip 1 kvps let newTrie = add (fst kvpHead) (snd kvpHead) node addMany kvpTail newTrie let inline ofList kvps = addMany kvps (makeEmpty ()) let inline ofListBy by kvps = let pairs = List.map by kvps ofList pairs let rec foldInternal folder rev node state = match node.TrieKvp with | Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value | None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap let inline fold folder state node = foldInternal folder [] node state let rec map (mapper : string -> 'v -> 'a) (node : Node<'v>) : Node<'a> = match node.TrieKvp with | Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value)) | None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None let inline toValueList node = fold (fun state _ value -> value :: state) [] node let inline singleton (key, value) = add key value (makeEmpty ()) 

I also built a version that works for arrays in general, and also fast -

 [<RequireQualifiedAccess>] module ArrayTrie type Node<'k, 'v when 'k : comparison> = { TrieMap : Map<'k, Node<'k, 'v>> TrieKvp : ('k array * 'v) option } member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty let inline make map kvp = { TrieMap = map TrieKvp = kvp } let inline makeEmpty () : Node<'k, 'v> = make Map.empty None let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty let rec tryFindInternal (key : 'k array) index node = if key.Length = index then match node.TrieKvp with | Some (_, value) -> Some value | None -> None else let optSubNode = Map.tryFind key.[index] node.TrieMap match optSubNode with | Some subNode -> tryFindInternal key (index + 1) subNode | None -> None let inline tryFind (key : 'k array) node = tryFindInternal key 0 node let inline containsKey key node = (tryFind key node).IsSome let rec addInternal (key : 'k array) index value node = if key.Length = index then make node.TrieMap (Some (key, value)) else let char = key.[index] let newTrie = match Map.tryFind char node.TrieMap with | Some subTrie -> subTrie | None -> makeEmpty () let newTrie2 = addInternal key (index + 1) value newTrie make (Map.add char newTrie2 node.TrieMap) node.TrieKvp let inline add key value node = addInternal key 0 value node let rec addMany kvps node = if Seq.isEmpty kvps then node else let kvpHead = Seq.head kvps let kvpTail = Seq.skip 1 kvps let newTrie = add (fst kvpHead) (snd kvpHead) node addMany kvpTail newTrie let inline ofList kvps = addMany kvps (makeEmpty ()) let inline ofListBy by kvps = let pairs = List.map by kvps ofList pairs let rec foldInternal folder rev node state = match node.TrieKvp with | Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value | None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap let inline fold folder state node = foldInternal folder [] node state let rec map (mapper : 'k array -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> = match node.TrieKvp with | Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value)) | None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None let inline toValueList node = fold (fun state _ value -> value :: state) [] node let inline singleton (key, value) = add key value (makeEmpty ()) 

The only thing that seems to improve performance is to get an internal pointer to a string and, rather, instead of doing indexes over and over again. This does not seem easy in F #, but at least it is possible for arrays in C #.

+4
source

Why not? How about OCaml, is it faster there? Since your Trie implemented in terms of Map , I would expect it to be slower than Map , at least for some inputs. In some cases, it may exceed Map , for example, when the size is very large.

Also, if your primary interest is search performance, why not freeze your Trie to use Dictionary nodes?

+2
source

All Articles