LeetCode Problem Workspace

Find Median from Data Stream

Implement a MedianFinder class that supports adding numbers and finding the median from a data stream.

category

5

Topics

code_blocks

9

Code langs

hub

3

Related

Practice Focus

Hard · Two-pointer scanning with invariant tracking

bolt

Answer-first summary

Implement a MedianFinder class that supports adding numbers and finding the median from a data stream.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for Two-pointer scanning with invariant tracking

Try AiBox Copilotarrow_forward

The Find Median from Data Stream problem requires implementing a MedianFinder class that dynamically calculates the median from a stream of integers. The goal is to design an efficient solution with operations like addNum and findMedian, which should work well within the given constraints.

Problem Statement

In this problem, you are tasked with implementing a class called MedianFinder, which should be able to dynamically calculate the median of numbers as they are added to a data stream. The median is the middle value of a sorted list, or the average of the two middle values if the list has an even number of elements. Your solution should efficiently support operations that add a number to the stream and compute the median in real-time.

You are given two operations: addNum(int num), which adds an integer to the data stream, and findMedian(), which returns the current median of all numbers added so far. The challenge is to design an algorithm that performs these operations efficiently within the constraints of at least one element in the data structure before calling findMedian, and a maximum of 50,000 calls to addNum and findMedian.

Examples

Example 1

Input: See original problem statement.

Output: See original problem statement.

Input ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] [[], [1], [2], [], [3], []] Output [null, null, null, 1.5, null, 2.0]

Explanation MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); // arr = [1] medianFinder.addNum(2); // arr = [1, 2] medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2) medianFinder.addNum(3); // arr[1, 2, 3] medianFinder.findMedian(); // return 2.0

Constraints

  • -105 <= num <= 105
  • There will be at least one element in the data structure before calling findMedian.
  • At most 5 * 104 calls will be made to addNum and findMedian.

Solution Approach

Use of Heaps (Priority Queue)

The median can be tracked using two heaps: a max-heap for the lower half of the numbers and a min-heap for the upper half. By maintaining these heaps, we can efficiently find the median in constant time, and add numbers in logarithmic time.

Two-pointer Scanning with Invariant Tracking

A two-pointer technique can be applied to track the dynamic changes in the stream of numbers. As elements are added, we can maintain an invariant where the heaps are balanced, and their root values reflect the median efficiently.

Dynamic Median Calculation

By balancing the heaps dynamically after each addNum operation, the median can be calculated easily by checking the root values of the heaps. If the number of elements is odd, the median is the root of the max-heap. If even, it's the average of the roots of both heaps.

Complexity Analysis

Metric Value
Time Depends on the final approach
Space Depends on the final approach

The time complexity for adding a number is O(log n), where n is the current number of elements in the data stream. Finding the median takes O(1) time as it only involves accessing the root elements of the heaps. The space complexity is O(n), where n is the number of elements in the data structure, due to the storage of numbers in two heaps.

What Interviewers Usually Probe

  • Candidate demonstrates understanding of heap-based solutions for median tracking.
  • Candidate should be able to explain the trade-off of using two heaps versus other approaches.
  • Watch for the candidate's ability to efficiently manage heap balancing and handle edge cases.

Common Pitfalls or Variants

Common pitfalls

  • Failing to properly balance the heaps after adding a number can result in incorrect median calculations.
  • Not handling edge cases where the number of elements is small (e.g., only one element).
  • Overcomplicating the solution by using other data structures that don’t offer efficient median tracking.

Follow-up variants

  • Implementing median calculation with a different data structure like a balanced binary search tree.
  • Optimizing for space complexity by reducing the number of stored elements in memory.
  • Handling a continuous stream of data in real-time with additional constraints.

FAQ

What is the primary data structure used in Find Median from Data Stream?

The primary data structure used is a combination of two heaps: a max-heap for the lower half of the data and a min-heap for the upper half.

How does the two-heap approach ensure efficient median calculation?

The two-heap approach ensures efficient median calculation by maintaining balanced heaps, where the root of each heap provides the required median values.

What is the time complexity for finding the median in this problem?

Finding the median takes O(1) time because it involves accessing the root elements of the heaps.

What is the space complexity of the solution?

The space complexity is O(n) as the solution requires storing the elements in two heaps.

How does GhostInterview help with solving the Find Median from Data Stream problem?

GhostInterview assists by providing step-by-step guidance on efficient median tracking, helping solve the problem within given constraints.

terminal

Solution

Solution 1: Min Heap and Max Heap (Priority Queue)

We can use two heaps to maintain all the elements, a min heap $\textit{minQ}$ and a max heap $\textit{maxQ}$, where the min heap $\textit{minQ}$ stores the larger half, and the max heap $\textit{maxQ}$ stores the smaller half.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
class MedianFinder:

    def __init__(self):
        self.minq = []
        self.maxq = []

    def addNum(self, num: int) -> None:
        heappush(self.minq, -heappushpop(self.maxq, -num))
        if len(self.minq) - len(self.maxq) > 1:
            heappush(self.maxq, -heappop(self.minq))

    def findMedian(self) -> float:
        if len(self.minq) == len(self.maxq):
            return (self.minq[0] - self.maxq[0]) / 2
        return self.minq[0]


# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()
Find Median from Data Stream Solution: Two-pointer scanning with invariant t… | LeetCode #295 Hard