LeetCode Problem Workspace

Find Duplicate Subtrees

Identify all duplicate subtrees in a binary tree by efficiently tracking structure and values with depth-first traversal.

category

4

Topics

code_blocks

6

Code langs

hub

3

Related

Practice Focus

Medium · Binary-tree traversal and state tracking

bolt

Answer-first summary

Identify all duplicate subtrees in a binary tree by efficiently tracking structure and values with depth-first traversal.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for Binary-tree traversal and state tracking

Try AiBox Copilotarrow_forward

Start by serializing each subtree into a unique string representation while traversing the tree with DFS. Use a hash table to count occurrences of each serialization, capturing duplicates when the same structure appears multiple times. Return the root nodes of all detected duplicate subtrees, ensuring each duplicate is only reported once.

Problem Statement

Given the root of a binary tree, return all subtrees that appear more than once. Each duplicate subtree should be represented by its root node, and two subtrees are considered identical if they share the same structure and node values.

Implement an efficient approach to detect duplicate subtrees by traversing the binary tree and tracking subtree states. Focus on minimizing redundant computations while handling up to 5000 nodes and node values in the range [-200, 200].

Examples

Example 1

Input: root = [1,2,3,4,null,2,4,null,null,4]

Output: [[2,4],[4]]

Example details omitted.

Example 2

Input: root = [2,1,1]

Output: [[1]]

Example details omitted.

Example 3

Input: root = [2,2,2,3,null,3,null]

Output: [[2,3],[3]]

Example details omitted.

Constraints

  • The number of the nodes in the tree will be in the range [1, 5000]
  • -200 <= Node.val <= 200

Solution Approach

DFS with Subtree Serialization

Traverse the binary tree in post-order and serialize each subtree into a string format including node values and child structures. Store the serialization in a hash map with its frequency count. If a serialization appears more than once, record its root node as a duplicate.

Hash Table Frequency Tracking

Use a hash table to map serialized subtrees to counts. This allows constant-time checking for duplicates and ensures that each duplicate subtree is only added once to the result list, preventing repeated entries of the same subtree.

Optimized Memory Management

Instead of storing full string serializations, consider assigning unique IDs to repeated subtrees and storing only IDs in the hash table. This reduces memory usage, especially for large trees with multiple repeated structures.

Complexity Analysis

Metric Value
Time Depends on the final approach
Space Depends on the final approach

Time complexity is O(N), where N is the number of nodes, since each node is visited once and subtree serialization is proportional to its size. Space complexity is O(N) for the hash map storing subtree serializations and the recursion stack in DFS.

What Interviewers Usually Probe

  • Ask candidates how to represent subtrees uniquely to detect duplicates efficiently.
  • Probe understanding of DFS traversal order and why post-order is suitable for subtree serialization.
  • Check if the candidate considers memory optimization when multiple identical subtrees exist.

Common Pitfalls or Variants

Common pitfalls

  • Serializing subtrees incorrectly by ignoring null nodes, which can cause false positives.
  • Adding duplicate roots multiple times instead of tracking counts properly.
  • Using inefficient traversal orders or repeated recomputation of subtree serializations.

Follow-up variants

  • Find duplicate subtrees but return all occurrences instead of only one root per duplicate type.
  • Detect duplicate subtrees in an n-ary tree using a similar DFS and hash map approach.
  • Return the count of duplicate subtree types rather than the root nodes.

FAQ

What is the recommended traversal for detecting duplicate subtrees?

Post-order DFS is recommended because it ensures child subtrees are serialized before their parent, enabling accurate duplicate detection.

How does the hash table prevent reporting the same duplicate multiple times?

By storing a count of each serialized subtree, the solution adds a root node to the result list only when the count reaches exactly two.

Can this method handle large trees efficiently?

Yes, using unique subtree IDs or serialization with a hash table keeps both time and space complexity manageable even for thousands of nodes.

Are null children important in serialization?

Yes, including null markers is essential to distinguish structurally different subtrees that may have the same node values.

Does GhostInterview focus on Binary-tree traversal and state tracking?

Absolutely, GhostInterview emphasizes serialization, hash-based frequency tracking, and DFS traversal to detect duplicate subtrees accurately.

terminal

Solution

Solution 1

#### Python3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def findDuplicateSubtrees(
        self, root: Optional[TreeNode]
    ) -> List[Optional[TreeNode]]:
        def dfs(root):
            if root is None:
                return '#'
            v = f'{root.val},{dfs(root.left)},{dfs(root.right)}'
            counter[v] += 1
            if counter[v] == 2:
                ans.append(root)
            return v

        ans = []
        counter = Counter()
        dfs(root)
        return ans
Find Duplicate Subtrees Solution: Binary-tree traversal and state track… | LeetCode #652 Medium