LeetCode Problem Workspace

Statistics from a Large Sample

Calculate minimum, maximum, mean, median, and mode from a large sample represented by an array of counts.

category

3

Topics

code_blocks

5

Code langs

hub

3

Related

Practice Focus

Medium · Array plus Math

bolt

Answer-first summary

Calculate minimum, maximum, mean, median, and mode from a large sample represented by an array of counts.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for Array plus Math

Try AiBox Copilotarrow_forward

This problem requires calculating statistical measures from a sample represented by a count array. You need to determine the minimum, maximum, mean, median, and mode based on the array's values. The main challenge is calculating the median efficiently given the sample size and distribution.

Problem Statement

You are given a large sample of integers in the range [0, 255]. Instead of directly storing the integers, the sample is represented by an array count where count[k] is the number of times that the integer k appears in the sample. The task is to compute various statistical measures based on this representation.

Return the following statistics as an array of floating-point numbers: [minimum, maximum, mean, median, mode]. The solution must work within a 10^-5 tolerance for accuracy. Your implementation should handle the large sample size efficiently, with particular attention to calculating the median.

Examples

Example 1

Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

Output: [1.00000,3.00000,2.37500,2.50000,3.00000]

The sample represented by count is [1,2,2,2,3,3,3,3]. The minimum and maximum are 1 and 3 respectively. The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375. Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5. The mode is 3 as it appears the most in the sample.

Example 2

Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

Output: [1.00000,4.00000,2.18182,2.00000,1.00000]

The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4]. The minimum and maximum are 1 and 4 respectively. The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182). Since the size of the sample is odd, the median is the middle element 2. The mode is 1 as it appears the most in the sample.

Constraints

  • count.length == 256
  • 0 <= count[i] <= 109
  • 1 <= sum(count) <= 109
  • The mode of the sample that count represents is unique.

Solution Approach

Mean Calculation

To calculate the mean, iterate through the count array. For each index k, multiply k by the value of count[k] and sum these values. Divide the sum by the total count of elements in the sample.

Median Calculation

The median is the middle value of the sorted sample. If the sample size is odd, the median is the middle element. If it’s even, the median is the average of the two middle elements. To find the median efficiently, traverse the count array and identify the position of the middle element(s).

Mode Calculation

The mode is the most frequent value in the sample. Find the index k that has the highest count value in the count array. This will be the mode.

Complexity Analysis

Metric Value
Time Depends on the final approach
Space Depends on the final approach

The time complexity of this solution depends on the method used for median calculation, but typically it is O(n) where n is the length of the count array (256 in this case). The space complexity is O(1) because we only need a few variables to keep track of statistics, regardless of the input size.

What Interviewers Usually Probe

  • Can the candidate efficiently compute the median from a large sample?
  • Does the candidate handle large inputs and edge cases correctly?
  • Is the solution optimized for both time and space complexity?

Common Pitfalls or Variants

Common pitfalls

  • Miscalculating the median by not properly handling both odd and even sample sizes.
  • Forgetting to properly sum the elements in the count array to calculate the mean.
  • Mistaking the mode for another statistic (e.g., assuming it's the maximum value instead of the most frequent).

Follow-up variants

  • Handling edge cases like a sample with only one unique value or a very small sample size.
  • Modifying the problem to calculate only a subset of the statistics, e.g., just the mean and mode.
  • Optimizing for large-scale datasets where memory and time efficiency become critical.

FAQ

What is the time complexity of the 'Statistics from a Large Sample' problem?

The time complexity typically depends on the approach used for median calculation, but a good approach runs in O(n), where n is the length of the count array (256).

How can I calculate the median in this problem?

The median is found by either selecting the middle element directly if the sample size is odd or averaging the two middle elements if the sample size is even.

How should I handle large input sizes for this problem?

Focus on optimizing both time and space complexity. Ensure that your solution handles the large input sizes without excessive memory usage or slow execution times.

What is the most common mistake in this problem?

The most common mistake is miscalculating the median, especially for even-sized samples, or failing to properly sum the elements for the mean.

How do I find the mode in this problem?

The mode is the most frequent number in the sample. Traverse the count array and identify the index with the highest value, which corresponds to the mode.

terminal

Solution

Solution 1

#### Python3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Solution:
    def sampleStats(self, count: List[int]) -> List[float]:
        def find(i: int) -> int:
            t = 0
            for k, x in enumerate(count):
                t += x
                if t >= i:
                    return k

        mi, mx = inf, -1
        s = cnt = 0
        mode = 0
        for k, x in enumerate(count):
            if x:
                mi = min(mi, k)
                mx = max(mx, k)
                s += k * x
                cnt += x
                if x > count[mode]:
                    mode = k

        median = (
            find(cnt // 2 + 1) if cnt & 1 else (find(cnt // 2) + find(cnt // 2 + 1)) / 2
        )
        return [mi, mx, s / cnt, median, mode]
Statistics from a Large Sample Solution: Array plus Math | LeetCode #1093 Medium