How to print a tree structure in a row quickly in Ocaml?

Suppose I have a mathematical expression in the form of a tree in OCaml. It is represented as an algebraic type like this:

type expr = Number of int |Plus of expr*expr 

Well, this is a very simplified definition, but enough to describe the problem.

I want to convert it to a string in reverse polarity format so that Plus (Number i, Number j) becomes (+ ij) . Direct implementation will be

 let rec convert = function Number i -> string_of_int i |Plus (a,b) -> (let s = convert a in let p = convert b in "(+"^s^" "^p^")") 

But the fact is that it is incredibly slow at some entrance (which has a large depth of tree). For example, this input works on my machine for 5 seconds:

 let rec make_pain so_far = function 0 -> so_far |i -> make_pain (Plus (Number 1,so_far)) (i-1) let pain = make_pain (Number 1) 20000 let converted = convert pain 

It seems that string concatenation x^y , where y is the long string, is a performance issue. Indeed, if you replace the expression "(+"^s^" "^p^")" with a simple s^p , it will become much faster.

Using printf instead of concatenation does not make it faster. Converting to C may help, but is there no OCaml solution?

+6
performance string ocaml
source share
1 answer

Use the string Buffer .

^ defined as

 let (^) s1 s2 = let l1 = string_length s1 and l2 = string_length s2 in let s = string_create (l1 + l2) in string_blit s1 0 s 0 l1; string_blit s2 0 s l1 l2; s 

What you do is create a new line every time and copy the old values. Quite a lot of standard in any language, where strings are represented as arrays of characters. The break occurs because you do it four times for each node (for several calls ^ no optimization)! As for the buffer, it will create a giant string and will constantly fill it as managed by the data structure,

  type t = {mutable buffer : string; mutable position : int; mutable length : int; initial_buffer : string} 

Even if you decide to create the initial buffer size to 1 , the resize function will adjust the size so as to limit the number of redistributions. For example, the add_string function add_string increase the size of the array by len*2^(n+p-len) , where n is the length of the new line, p is the position, and len is the length of the original buffer - Of course, if the buffer does not support the string, of course. Thus, the size of the buffer grows exponentially, and as you move through them there will be a little redistribution. Of course, it is best to set the start buffer to something reasonable, but this is not necessary.

The new convert function would not have looked much more complicated:

 let rec convert buf ex = let addb = Buffer.add_string buf in match ex with Number i -> addb (string_of_int i) |Plus (a,b) -> (addb "(+ "; convert buf a; addb " "; convert buf b; addb ")") 
+9
source share

All Articles