LeetCode Problem Workspace
H-Index
Determine a researcher's h-index by analyzing citations using array sorting and counting techniques efficiently and accurately.
3
Topics
6
Code langs
3
Related
Practice Focus
Medium · Array plus Sorting
Answer-first summary
Determine a researcher's h-index by analyzing citations using array sorting and counting techniques efficiently and accurately.
Ace coding interviews with Interview AiBoxInterview AiBox guidance for Array plus Sorting
To find the h-index, sort the citation array in ascending order and count from the highest citation downward. Track the maximum h such that at least h papers have h or more citations. This approach leverages array sorting and counting patterns to handle citation distributions reliably in linear or log-linear time.
Problem Statement
You are given an integer array citations where citations[i] represents the number of citations for the ith paper of a researcher. Calculate the researcher's h-index, defined as the largest number h such that the researcher has at least h papers with at least h citations each.
For example, given citations = [3,0,6,1,5], the researcher has 5 papers, with citation counts of 3, 0, 6, 1, 5. The h-index is 3 because there are 3 papers with at least 3 citations each, and the remaining two papers have no more than 3 citations each.
Examples
Example 1
Input: citations = [3,0,6,1,5]
Output: 3
[3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2
Input: citations = [1,3,1]
Output: 1
Example details omitted.
Constraints
- n == citations.length
- 1 <= n <= 5000
- 0 <= citations[i] <= 1000
Solution Approach
Sort and Compare
Sort the citations array in ascending order and iterate from highest to lowest to find the maximum h where citations[i] >= h. This direct approach uses the array plus sorting pattern to simplify counting high-citation papers.
Counting Sort Optimization
Use a counting array to record the frequency of each citation count. Then traverse from the maximum possible citation downward, summing frequencies until the cumulative count meets or exceeds the index. This reduces the problem to linear time and avoids unnecessary sorting overhead.
Binary Search on Sorted Array
After sorting citations ascendingly, apply binary search to find the smallest index where citations[index] >= n - index, with n as the array length. This approach efficiently leverages the sorted array structure and is ideal when handling large datasets.
Complexity Analysis
| Metric | Value |
|---|---|
| Time | Depends on the final approach |
| Space | Depends on the final approach |
Sorting takes O(n log n) time with O(1) extra space if in-place, while counting sort provides O(n + m) time with O(m) space where m is the max citation. Binary search adds O(log n) to the sorting step. Space complexity depends on chosen approach, either O(1) or O(n) for auxiliary arrays.
What Interviewers Usually Probe
- Expect an initial sort-based solution and probe understanding of h-index properties.
- Clarify whether counting sort optimization is feasible given citation constraints.
- Check if candidate can handle edge cases like all zeros or all identical citation counts.
Common Pitfalls or Variants
Common pitfalls
- Miscounting papers with exactly h citations or assuming strict greater-than.
- Failing to consider empty or single-element arrays.
- Ignoring the trade-off between sorting and linear counting when max citation is small.
Follow-up variants
- Compute h-index when citations are provided in descending order without sorting.
- Handle streaming citation updates and dynamically maintain the h-index.
- Calculate h-index for multiple researchers simultaneously using counting arrays.
FAQ
What is the H-Index problem pattern on LeetCode?
It follows the Array plus Sorting pattern, where you sort citation counts or use counting arrays to compute the maximum h-index.
Can I solve H-Index without sorting?
Yes, using a counting sort or bucket array approach lets you determine the h-index in linear time without a full sort.
What edge cases should I consider?
Empty arrays, all zeros, all papers with the same citation count, or citations exceeding array length are key cases to handle.
What is the time complexity using counting sort?
Counting sort approach runs in O(n + maxCitation) time, which is linear if maxCitation is reasonable compared to n.
How do I verify my h-index result is correct?
Check that at least h papers have at least h citations and the remaining papers have no more than h citations to confirm correctness.
Solution
Solution 1: Sorting
We can sort the array `citations` in descending order. Then we enumerate the value $h$ from large to small, if there is an $h$ value satisfying $citations[h-1] \geq h$, it means that there are at least $h$ papers that have been cited at least $h$ times, just return $h$ directly. If we cannot find such an $h$ value, it means that all the papers have not been cited, return $0$.
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
for h in range(len(citations), 0, -1):
if citations[h - 1] >= h:
return h
return 0Solution 2: Counting + Sum
We can use an array $cnt$ of length $n+1$, where $cnt[i]$ represents the number of papers with the reference count of $i$. We traverse the array `citations` and treat the papers with the reference count greater than $n$ as papers with a reference count of $n$. Then we use the reference count as the index and add $1$ to the corresponding element of $cnt$ for each paper. In this way, we have counted the number of papers for each reference count.
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
for h in range(len(citations), 0, -1):
if citations[h - 1] >= h:
return h
return 0Solution 3: Binary Search
We notice that if there is a $h$ value that satisfies at least $h$ papers are cited at least $h$ times, then for any $h'<h$, at least $h'$ papers are cited at least $h'$ times. Therefore, we can use the binary search method to find the largest $h$ such that at least $h$ papers are cited at least $h$ times.
class Solution:
def hIndex(self, citations: List[int]) -> int:
citations.sort(reverse=True)
for h in range(len(citations), 0, -1):
if citations[h - 1] >= h:
return h
return 0Continue Practicing
Continue Topic
array
Practice more edge cases under the same topic.
arrow_forwardauto_awesomeContinue Pattern
Array plus Sorting
Expand the same solving frame across more problems.
arrow_forwardsignal_cellular_altSame Difficulty Track
Medium
Stay on this level to stabilize interview delivery.
arrow_forward