How to calculate spatial complexity for finding binary subtria

This issue is from Cracking the Coding Interview. I find it difficult to understand the spatial complexity of the solution.

Problem:
You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.

Solution (in Java):

public static boolean containsTree(TreeNode t1, TreeNode t2) { if (t2 == null) return true; // The empty tree is a subtree of every tree. else return subTree(t1, t2); } /* Checks if the binary tree rooted at r1 contains the binary tree * rooted at r2 as a subtree somewhere within it. */ public static boolean subTree(TreeNode r1, TreeNode r2) { if (r1 == null) return false; // big tree empty & subtree still not found. if (r1.data == r2.data) { if (matchTree(r1,r2)) return true; } return (subTree(r1.left, r2) || subTree(r1.right, r2)); } /* Checks if the binary tree rooted at r1 contains the * binary tree rooted at r2 as a subtree starting at r1. */ public static boolean matchTree(TreeNode r1, TreeNode r2) { if (r2 == null && r1 == null) return true; // nothing left in the subtree if (r1 == null || r2 == null) return false; // big tree empty & subtree still not found if (r1.data != r2.data) return false; // data doesn't match return (matchTree(r1.left, r2.left) && matchTree(r1.right, r2.right)); } 

The book says that the complexity of the space of this solution is O (log (n) + log (m)) , where m is the number of nodes in T1 (the tree is larger) and n is the number of nodes in T2.

I think that the solution has O (log (m) * log (n)) space complexity , since the subtree function has log (n) recursive calls and each recursive call executes the matchTree function, which starts the recursive calls to log (m).

Why is this solution O (log (n) + log (m)) complexity?

+5
source share
1 answer

Since we are not creating any objects on the heap, the complexity of space is the size of the stack. So the question is not how many total calls are taking place, but how big the stack can grow.

containsTree() can only call subTree() , subTree() can call itself or matchTree() , and matchTree() can only call itself. Therefore, at any point where matchTree() was called, the stack looks like this:

 [containsTree] [subTree] ... [subTree] [matchTree] ... [matchTree] 

That's why you don't multiply spatial complexity here: while each subTree() call can call matchTree() , these matchTree() calls leave the stack before subTree() continues recursing.

Parsing the "correct answer" lines

If the question does not indicate whether the trees are balanced, then the real worst analysis suggests that they may not be so. However, you and the book think they are. We can postpone this question later, saying that depth T1 is equal to c, and depth T2 is equal to d. c - O (log (m)) if T1 is balanced, and O (m) otherwise. The same for T2 d.

The worst case for matchTree() is O (d), because it can be the most distant height T2.

The worst case for subTree() is O (c) for its recursion, because the most recent of them may be the height T1, plus the cost of calling matchTree() , for the whole O (c + d).

And containsTree() just adds a constant on top of the subTree() call, so that doesn't change the complexity of the space.

So, if both T1 and T2 are balanced, replacing c and d, you will see that O (log (m) + log (n)) seems reasonable.

Problems with the "correct answer"

As I said, it’s wrong to assume that binary trees are balanced until you know what they are. Therefore, the best answer may be O (m + n).

But wait! The question indicates that size T2 is smaller than size T1. This means that n is O (m) and log (n) is O (log (m)). So why did we spend time worrying about n?

If the trees are balanced, the complexity of the space is simply O (log (m)). In the general case, when you do not know what is balanced or not, the real answer should be O (m), the size of a larger tree.

+5
source

All Articles