There is no need to make a full copy of Set to insert an element into it. Inside the element is stored in a tree, which means that you only need to create new nodes along the insertion path. Untouched nodes can be shared between the version of Set for pre-installation and after insertion. And as Deitrich Epp pointed out that in a balanced tree O(log(n)) there is an insertion path length. (Sorry for the exclusion of this important fact.)
Let's say your Tree type looks like this:
data Tree a = Node a (Tree a) (Tree a) | Leaf
... and let's say you have a Tree that looks like
let t = Node 10 tl (Node 15 Leaf tr')
... where tl and tr' are some named subtrees. Now say that you want to insert 12 into this tree. Well, it will look something like this:
let t' = Node 10 tl (Node 15 (Node 12 Leaf Leaf) tr')
The subtrees tl and tr' are split between t and t' , and you only needed to build 3 new Nodes , although the size of t can be much larger than 3.
EDIT: Rebalancing
Regarding rebalancing, think of it this way and note that I do not pretend to be strict here. Say you have an empty tree. Already balanced! Now say that you are inserting an element. Already balanced! Now say that you are inserting another element. Well, there is a strange number, so you cannot do much.
Here is the hard part. Let's say you insert another element. This can go in two ways: left or right; balanced or unbalanced. In case of imbalance, you can clearly rotate the tree to balance it. In case it is balanced, already balanced!
It is important to note that you are constantly balancing. This is not like you have a tree mess, decided to insert the element, but before you do this, you will rebalance and then leave the mess after you have completed the insert.
Now say that you continue to insert elements. The tree will be unbalanced, but not much. And when this happens, first you fix it immediately, and secondly, the correction occurs along the insert path, which is O(log(n)) in a balanced tree. The rotations in the paper you linked touch no more than three nodes in the tree to perform the rotation. so you do O(3 * log(n)) work when rebalancing. This is still O(log(n)) .