What is the main strategy for Online Majority Element In Subarray?

The main strategy is to preprocess element positions and use binary search or segment trees to efficiently verify potential majority elements per query.

Can we solve this problem without extra space?

Not efficiently; tracking element occurrences with maps or trees is essential to support multiple fast queries.

Why is binary search over candidate positions used?

Binary search allows counting occurrences of a candidate in a subarray in logarithmic time, avoiding linear scans.

How does the threshold affect candidate selection?

Only elements that appear at least threshold times in the subarray are valid; all others are discarded during the query check.

Is random sampling a reliable approach here?

Yes, for large arrays it can efficiently detect the majority element with high probability, especially when combined with position maps.

#1157

Hard

auto_awesome二分·搜索·答案·空间

LeetCode 题解工作台

子数组中占绝大多数的元素

设计一个数据结构，有效地找到给定子数组的多数元素。子数组的多数元素是在子数组中出现 threshold 次数或次数以上的元素。实现 MajorityChecker 类: MajorityChecker(int[] arr) 会用给定的数组 arr 对 MajorityChecker 初始…

数组二分查找设计树状数组线段树

题目描述

设计一个数据结构，有效地找到给定子数组的 多数元素 。

子数组的 多数元素 是在子数组中出现 threshold 次数或次数以上的元素。

实现 MajorityChecker 类:

MajorityChecker(int[] arr) 会用给定的数组 arr 对 MajorityChecker 初始化。
int query(int left, int right, int threshold) 返回子数组中的元素 arr[left...right] 至少出现 threshold 次数，如果不存在这样的元素则返回 -1。

示例 1：

输入:
["MajorityChecker", "query", "query", "query"]
[[[1, 1, 2, 2, 1, 1]], [0, 5, 4], [0, 3, 3], [2, 3, 2]]
输出：
[null, 1, -1, 2]

解释：
MajorityChecker majorityChecker = new MajorityChecker([1,1,2,2,1,1]);
majorityChecker.query(0,5,4); // 返回 1
majorityChecker.query(0,3,3); // 返回 -1
majorityChecker.query(2,3,2); // 返回 2

提示：

1 <= arr.length <= 2 * 10⁴
1 <= arr[i] <= 2 * 10⁴
0 <= left <= right < arr.length
threshold <= right - left + 1
2 * threshold > right - left + 1
调用 query 的次数最多为 10⁴

lightbulb

解题思路

方法一：线段树 + 摩尔投票 + 二分查找

我们注意到，题目需要我们找出特定区间内可能的众数，考虑使用线段树来维护每个区间内的候选众数以及其出现的次数。

我们定义线段树的每个节点为 Node，每个节点包含如下属性：

l：节点的左端点，下标从 $1$ 开始。
r：节点的右端点，下标从 $1$ 开始。
x：节点的候选众数。
cnt：节点的候选众数出现的次数。

线段树主要有以下几个操作：

build(u, l, r)：建立线段树。
pushup(u)：用子节点的信息更新父节点的信息。
query(u, l, r)：查询区间和。

在主函数的初始化方法中，我们先创建一个线段树，并且用哈希表 $d$ 记录每个元素在数组中的所有下标。

在 query(left, right, threshold) 方法中，我们直接调用线段树的 query 方法，得到候选众数 $x$ 。然后使用二分查找，找到 $x$ 在数组中第一个大于等于 $left$ 的下标 $l$ ，以及第一个大于 $right$ 的下标 $r$ 。如果 $r - l \ge threshold$ ，则返回 $x$ ，否则返回 $-1$ 。

时间复杂度方面，初始化方法的时间复杂度为 $O(n)$ ，查询方法的时间复杂度为 $O(\log n)$ 。空间复杂度为 $O(n)$ 。其中 $n$ 为数组的长度。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

class Node:
    __slots__ = ("l", "r", "x", "cnt")

    def __init__(self):
        self.l = self.r = 0
        self.x = self.cnt = 0


class SegmentTree:
    def __init__(self, nums):
        self.nums = nums
        n = len(nums)
        self.tr = [Node() for _ in range(n << 2)]
        self.build(1, 1, n)

    def build(self, u, l, r):
        self.tr[u].l, self.tr[u].r = l, r
        if l == r:
            self.tr[u].x = self.nums[l - 1]
            self.tr[u].cnt = 1
            return
        mid = (l + r) >> 1
        self.build(u << 1, l, mid)
        self.build(u << 1 | 1, mid + 1, r)
        self.pushup(u)

    def query(self, u, l, r):
        if self.tr[u].l >= l and self.tr[u].r <= r:
            return self.tr[u].x, self.tr[u].cnt
        mid = (self.tr[u].l + self.tr[u].r) >> 1
        if r <= mid:
            return self.query(u << 1, l, r)
        if l > mid:
            return self.query(u << 1 | 1, l, r)
        x1, cnt1 = self.query(u << 1, l, r)
        x2, cnt2 = self.query(u << 1 | 1, l, r)
        if x1 == x2:
            return x1, cnt1 + cnt2
        if cnt1 >= cnt2:
            return x1, cnt1 - cnt2
        else:
            return x2, cnt2 - cnt1

    def pushup(self, u):
        if self.tr[u << 1].x == self.tr[u << 1 | 1].x:
            self.tr[u].x = self.tr[u << 1].x
            self.tr[u].cnt = self.tr[u << 1].cnt + self.tr[u << 1 | 1].cnt
        elif self.tr[u << 1].cnt >= self.tr[u << 1 | 1].cnt:
            self.tr[u].x = self.tr[u << 1].x
            self.tr[u].cnt = self.tr[u << 1].cnt - self.tr[u << 1 | 1].cnt
        else:
            self.tr[u].x = self.tr[u << 1 | 1].x
            self.tr[u].cnt = self.tr[u << 1 | 1].cnt - self.tr[u << 1].cnt


class MajorityChecker:
    def __init__(self, arr: List[int]):
        self.tree = SegmentTree(arr)
        self.d = defaultdict(list)
        for i, x in enumerate(arr):
            self.d[x].append(i)

    def query(self, left: int, right: int, threshold: int) -> int:
        x, _ = self.tree.query(1, left + 1, right + 1)
        l = bisect_left(self.d[x], left)
        r = bisect_left(self.d[x], right + 1)
        return x if r - l >= threshold else -1


# Your MajorityChecker object will be instantiated and called as such:
# obj = MajorityChecker(arr)
# param_1 = obj.query(left,right,threshold)

speed

复杂度分析

指标	值
时间	and space complexity depend on the approach: frequency maps with binary search give O(k log n) per query, segment trees give O(log n) per query, and random sampling achieves expected O(log n) with extra space for index maps.
空间	Depends on the final approach

psychology

面试官常问的追问

外企场景

question_mark
Can you handle multiple online queries efficiently without scanning each subarray?
question_mark
What data structures allow you to count element frequencies quickly within a range?
question_mark
How can binary search be applied over the candidate element positions to validate thresholds?

warning

常见陷阱

外企场景

error
Assuming a linear scan per query is fast enough for large arrays.
error
Forgetting that the majority element must meet the threshold, not just be most frequent.
error
Not handling the case when no element satisfies the threshold and returning -1 incorrectly.

swap_horiz

进阶变体

外企场景

arrow_right_alt
Find the majority element for dynamic subarrays where elements can be updated between queries.
arrow_right_alt
Return all elements that appear more than a fraction of the subarray size rather than a fixed threshold.
arrow_right_alt
Optimize for streaming input where queries arrive in real time and the array is too large to store completely.

help

常见问题

外企场景

继续练习

#731 我的日程安排表 II #729 我的日程安排表 I #1649 通过指令创建有序数组