How does binary search help in solving the Tweet Counts Per Frequency problem?

Binary search helps by narrowing down the possible answer space for tweet counts within a specific time period, minimizing unnecessary checks.

What is the best way to store tweet timestamps for efficient retrieval?

The best way is to use a hash table, where tweet names are keys, and their corresponding timestamps are stored in ordered lists for fast access.

How do time intervals impact the solution's complexity?

The partitioning of time intervals into smaller chunks based on frequency can increase the complexity if not handled efficiently, requiring careful attention to how binary search and data structures are used.

What are the main challenges when implementing this problem?

The main challenges include correctly handling time chunking based on the frequency, efficiently counting tweets using binary search, and managing high numbers of tweets and function calls.

How does GhostInterview assist with solving this problem?

GhostInterview provides a structured approach, breaking down the binary search, hash table usage, and time chunk handling, helping you focus on optimal solutions.

#1348

Medium

auto_awesome二分·搜索·答案·空间

LeetCode 题解工作台

推文计数

一家社交媒体公司正试图通过分析特定时间段内出现的推文数量来监控其网站上的活动。这些时间段可以根据特定的频率（每分钟、每小时或每一天）划分为更小的时间段。例如，周期 [10,10000] （以秒为单位）将被划分为以下频率的时间块 : 每分钟 (60秒块)： [10,69] …

哈希表二分查找设计排序有序集合

题目描述

一家社交媒体公司正试图通过分析特定时间段内出现的推文数量来监控其网站上的活动。这些时间段可以根据特定的频率（ 每分钟 、每小时 或 每一天 ）划分为更小的 时间段 。

例如，周期 [10,10000] （以秒为单位）将被划分为以下频率的 时间块 :

每分钟 (60秒块)： [10,69], [70,129], [130,189], ..., [9970,10000]
每小时 (3600秒块)：[10,3609], [3610,7209], [7210,10000]
每天 (86400秒块)： [10,10000]

注意，最后一个块可能比指定频率的块大小更短，并且总是以时间段的结束时间结束(在上面的示例中为 10000 )。

设计和实现一个API来帮助公司进行分析。

实现 TweetCounts 类:

TweetCounts() 初始化 TweetCounts 对象。
存储记录时间的 tweetName (以秒为单位)。
List<integer> getTweetCountsPerFrequency(String freq, String tweetName, int startTime, int endTime) 返回一个整数列表，表示给定时间 [startTime, endTime] （单位秒）和频率频率中，每个 时间块 中带有 tweetName 的 tweet 的数量。
- freq 是 “minute” 、 “hour” 或 “day” 中的一个，分别表示 每分钟 、 每小时 或 每一天 的频率。

示例：

输入：
["TweetCounts","recordTweet","recordTweet","recordTweet","getTweetCountsPerFrequency","getTweetCountsPerFrequency","recordTweet","getTweetCountsPerFrequency"]
[[],["tweet3",0],["tweet3",60],["tweet3",10],["minute","tweet3",0,59],["minute","tweet3",0,60],["tweet3",120],["hour","tweet3",0,210]]

输出：
[null,null,null,null,[2],[2,1],null,[4]]

解释：
TweetCounts tweetCounts = new TweetCounts();
tweetCounts.recordTweet("tweet3", 0);
tweetCounts.recordTweet("tweet3", 60);
tweetCounts.recordTweet("tweet3", 10);                             // "tweet3" 发布推文的时间分别是 0, 10 和 60 。
tweetCounts.getTweetCountsPerFrequency("minute", "tweet3", 0, 59); // 返回 [2]。统计频率是每分钟（60 秒），因此只有一个有效时间间隔 [0,60> - > 2 条推文。
tweetCounts.getTweetCountsPerFrequency("minute", "tweet3", 0, 60); // 返回 [2,1]。统计频率是每分钟（60 秒），因此有两个有效时间间隔 1) [0,60> - > 2 条推文，和 2) [60,61> - > 1 条推文。 
tweetCounts.recordTweet("tweet3", 120);                            // "tweet3" 发布推文的时间分别是 0, 10, 60 和 120 。
tweetCounts.getTweetCountsPerFrequency("hour", "tweet3", 0, 210);  // 返回 [4]。统计频率是每小时（3600 秒），因此只有一个有效时间间隔 [0,211> - > 4 条推文。

提示：

0 <= time, startTime, endTime <= 10⁹
0 <= endTime - startTime <= 10⁴
recordTweet 和 getTweetCountsPerFrequency，最多有 10⁴ 次操作。

lightbulb

解题思路

方法一：哈希表 + 有序列表

我们用哈希表 data 记录每个用户的推文时间，用有序列表记录每个用户的所有推文时间。

对于 recordTweet 操作，我们将推文时间加入到用户的推文时间列表中。

对于 getTweetCountsPerFrequency 操作，我们先计算出时间间隔 f，然后遍历用户的推文时间列表，统计每个时间间隔内的推文数量。

时间复杂度，对于 recordTweet 操作，总的时间复杂度 $O(n \times \log n)$ ；对于 getTweetCountsPerFrequency 操作，总的时间复杂度 $O(q \times (t + \log n))$ 。其中 $n$ , $q$ 和 $t$ 分别表示插入的推文数量，查询的次数和时间间隔的长度。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

class TweetCounts:
    def __init__(self):
        self.d = {"minute": 60, "hour": 3600, "day": 86400}
        self.data = defaultdict(SortedList)

    def recordTweet(self, tweetName: str, time: int) -> None:
        self.data[tweetName].add(time)

    def getTweetCountsPerFrequency(
        self, freq: str, tweetName: str, startTime: int, endTime: int
    ) -> List[int]:
        f = self.d[freq]
        tweets = self.data[tweetName]
        t = startTime
        ans = []
        while t <= endTime:
            l = tweets.bisect_left(t)
            r = tweets.bisect_left(min(t + f, endTime + 1))
            ans.append(r - l)
            t += f
        return ans


# Your TweetCounts object will be instantiated and called as such:
# obj = TweetCounts()
# obj.recordTweet(tweetName,time)
# param_2 = obj.getTweetCountsPerFrequency(freq,tweetName,startTime,endTime)

speed

复杂度分析

指标	值
时间	Depends on the final approach
空间	Depends on the final approach

psychology

面试官常问的追问

外企场景

question_mark
Understanding of binary search over a range of values and its application to time intervals.
question_mark
Ability to design a system with hash tables for efficient data storage and retrieval.
question_mark
Clear grasp of how to partition time periods into chunks and handle edge cases (e.g., last chunk being smaller).

warning

常见陷阱

外企场景

error
Failure to properly partition time ranges based on the given frequency, leading to incorrect counts.
error
Inefficient handling of large data sets or too many function calls, resulting in high time complexity.
error
Incorrect or inefficient implementation of binary search for counting tweets in a given time period.

swap_horiz

进阶变体

外企场景

arrow_right_alt
Different time frequencies (minute, hour, day) can change the complexity of the chunking and binary search logic.
arrow_right_alt
Handling extremely large numbers of timestamps or a high number of function calls may require additional optimizations.
arrow_right_alt
Designing the system for a dynamic range of time periods rather than fixed ones, which may require extra handling of edge cases.

help

常见问题

外企场景

继续练习

#2251 花期内花的数目 #1346 检查整数及其两倍数是否存在 #1418 点菜展示表