LeetCode Problem Workspace
Find Median from Data Stream
Implement a MedianFinder class that supports adding numbers and finding the median from a data stream.
5
Topics
9
Code langs
3
Related
Practice Focus
Hard · Two-pointer scanning with invariant tracking
Answer-first summary
Implement a MedianFinder class that supports adding numbers and finding the median from a data stream.
Ace coding interviews with Interview AiBoxInterview AiBox guidance for Two-pointer scanning with invariant tracking
The Find Median from Data Stream problem requires implementing a MedianFinder class that dynamically calculates the median from a stream of integers. The goal is to design an efficient solution with operations like addNum and findMedian, which should work well within the given constraints.
Problem Statement
In this problem, you are tasked with implementing a class called MedianFinder, which should be able to dynamically calculate the median of numbers as they are added to a data stream. The median is the middle value of a sorted list, or the average of the two middle values if the list has an even number of elements. Your solution should efficiently support operations that add a number to the stream and compute the median in real-time.
You are given two operations: addNum(int num), which adds an integer to the data stream, and findMedian(), which returns the current median of all numbers added so far. The challenge is to design an algorithm that performs these operations efficiently within the constraints of at least one element in the data structure before calling findMedian, and a maximum of 50,000 calls to addNum and findMedian.
Examples
Example 1
Input: See original problem statement.
Output: See original problem statement.
Input ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] [[], [1], [2], [], [3], []] Output [null, null, null, 1.5, null, 2.0]
Explanation MedianFinder medianFinder = new MedianFinder(); medianFinder.addNum(1); // arr = [1] medianFinder.addNum(2); // arr = [1, 2] medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2) medianFinder.addNum(3); // arr[1, 2, 3] medianFinder.findMedian(); // return 2.0
Constraints
- -105 <= num <= 105
- There will be at least one element in the data structure before calling findMedian.
- At most 5 * 104 calls will be made to addNum and findMedian.
Solution Approach
Use of Heaps (Priority Queue)
The median can be tracked using two heaps: a max-heap for the lower half of the numbers and a min-heap for the upper half. By maintaining these heaps, we can efficiently find the median in constant time, and add numbers in logarithmic time.
Two-pointer Scanning with Invariant Tracking
A two-pointer technique can be applied to track the dynamic changes in the stream of numbers. As elements are added, we can maintain an invariant where the heaps are balanced, and their root values reflect the median efficiently.
Dynamic Median Calculation
By balancing the heaps dynamically after each addNum operation, the median can be calculated easily by checking the root values of the heaps. If the number of elements is odd, the median is the root of the max-heap. If even, it's the average of the roots of both heaps.
Complexity Analysis
| Metric | Value |
|---|---|
| Time | Depends on the final approach |
| Space | Depends on the final approach |
The time complexity for adding a number is O(log n), where n is the current number of elements in the data stream. Finding the median takes O(1) time as it only involves accessing the root elements of the heaps. The space complexity is O(n), where n is the number of elements in the data structure, due to the storage of numbers in two heaps.
What Interviewers Usually Probe
- Candidate demonstrates understanding of heap-based solutions for median tracking.
- Candidate should be able to explain the trade-off of using two heaps versus other approaches.
- Watch for the candidate's ability to efficiently manage heap balancing and handle edge cases.
Common Pitfalls or Variants
Common pitfalls
- Failing to properly balance the heaps after adding a number can result in incorrect median calculations.
- Not handling edge cases where the number of elements is small (e.g., only one element).
- Overcomplicating the solution by using other data structures that don’t offer efficient median tracking.
Follow-up variants
- Implementing median calculation with a different data structure like a balanced binary search tree.
- Optimizing for space complexity by reducing the number of stored elements in memory.
- Handling a continuous stream of data in real-time with additional constraints.
FAQ
What is the primary data structure used in Find Median from Data Stream?
The primary data structure used is a combination of two heaps: a max-heap for the lower half of the data and a min-heap for the upper half.
How does the two-heap approach ensure efficient median calculation?
The two-heap approach ensures efficient median calculation by maintaining balanced heaps, where the root of each heap provides the required median values.
What is the time complexity for finding the median in this problem?
Finding the median takes O(1) time because it involves accessing the root elements of the heaps.
What is the space complexity of the solution?
The space complexity is O(n) as the solution requires storing the elements in two heaps.
How does GhostInterview help with solving the Find Median from Data Stream problem?
GhostInterview assists by providing step-by-step guidance on efficient median tracking, helping solve the problem within given constraints.
Solution
Solution 1: Min Heap and Max Heap (Priority Queue)
We can use two heaps to maintain all the elements, a min heap $\textit{minQ}$ and a max heap $\textit{maxQ}$, where the min heap $\textit{minQ}$ stores the larger half, and the max heap $\textit{maxQ}$ stores the smaller half.
class MedianFinder:
def __init__(self):
self.minq = []
self.maxq = []
def addNum(self, num: int) -> None:
heappush(self.minq, -heappushpop(self.maxq, -num))
if len(self.minq) - len(self.maxq) > 1:
heappush(self.maxq, -heappop(self.minq))
def findMedian(self) -> float:
if len(self.minq) == len(self.maxq):
return (self.minq[0] - self.maxq[0]) / 2
return self.minq[0]
# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()Continue Topic
two pointers
Practice more edge cases under the same topic.
arrow_forwardauto_awesomeContinue Pattern
Two-pointer scanning with invariant tracking
Expand the same solving frame across more problems.
arrow_forwardsignal_cellular_altSame Difficulty Track
Hard
Stay on this level to stabilize interview delivery.
arrow_forward