LeetCode Problem Workspace

String Compression II

Solve String Compression II with dynamic programming that tracks deletions, run boundaries, and digit-length jumps in compressed counts.

category

2

Topics

code_blocks

1

Code langs

hub

3

Related

Practice Focus

Hard · State transition dynamic programming

bolt

Answer-first summary

Solve String Compression II with dynamic programming that tracks deletions, run boundaries, and digit-length jumps in compressed counts.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for State transition dynamic programming

Try AiBox Copilotarrow_forward

String Compression II is a hard dynamic programming problem because deleting one character can change both run grouping and encoded digit length later. The standard solution uses state transition DP to decide whether to delete the current character or keep extending a run, while accounting for count thresholds like 1 to 2, 9 to 10, and 99 to 100 that increase compressed length.

Problem Statement

In String Compression II, you are given a lowercase string and may delete up to k characters before applying run-length encoding. A run contributes the letter plus its count only when the count is at least 2, so a single character stays as just the letter. Your goal is to choose deletions that make the final encoded string as short as possible.

The tricky part is that local deletions can create a much better global merge. In the example "aaabcccd" with k = 2, removing "b" and "d" joins the useful runs into "a3c3" with length 4, while other deletions look reasonable but do not compress as well. That is why this problem is not greedy and instead fits state transition dynamic programming.

Examples

Example 1

Input: s = "aaabcccd", k = 2

Output: 4

Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length of the compressed string to 5, for instance delete 2 'a' then we will have s = "abcccd" which compressed is abc3d. Therefore, the optimal way is to delete 'b' and 'd', then the compressed version of s will be "a3c3" of length 4.

Example 2

Input: s = "aabbaa", k = 2

Output: 2

If we delete both 'b' characters, the resulting compressed string would be "a4" of length 2.

Example 3

Input: s = "aaaaaaaaaaa", k = 0

Output: 3

Since k is zero, we cannot delete anything. The compressed string is "a11" of length 3.

Constraints

  • 1 <= s.length <= 100
  • 0 <= k <= s.length
  • s contains only lowercase English letters.

Solution Approach

Define DP by position and remaining deletions

Let dp[i][k] represent the minimum compressed length for the suffix starting at index i when you can still delete k characters. From each state, either delete s[i] and move to dp[i+1][k-1], or keep s[i] and try to build a run of that same character by scanning forward. This captures the core trade-off in String Compression II: spend deletions now to improve a later run boundary.

Build one run while counting required deletions

When you keep s[i], extend a window from i to j and count how many characters match s[i] versus how many must be deleted to make s[i..j] one clean run. If keptCount is the number of matching characters and removed is the number of non-matching characters inside that window, then this choice costs encodeLen(keptCount) plus dp[j+1][k-removed]. This transition is the key pattern because it evaluates every possible run endpoint instead of committing too early.

Handle encoding length jumps correctly

The encoded contribution is 1 for count 1, 2 for counts 2 through 9, 3 for counts 10 through 99, and 4 for count 100. Those thresholds are the main failure mode: adding one more kept character only changes length when the count crosses 1, 9, or 99. A memoized DFS or bottom-up DP both work well here, and with n at most 100, the usual O(n^2k) style solution is fast enough.

Complexity Analysis

Metric Value
Time Depends on the final approach
Space Depends on the final approach

For each index i and remaining deletion budget k, the DP may scan forward through later positions to form a run, so the typical time complexity is O(n^2k). The memo table stores states by index and remaining deletions, giving O(nk) space, plus recursion stack if you implement it with DFS memoization.

What Interviewers Usually Probe

  • They expect dynamic programming as soon as you notice that deleting inside a segment can improve a later merged run.
  • They are checking whether you understand that compression length changes only at count thresholds such as 1, 9, and 99.
  • They want a state definition that avoids storing the whole compressed string and instead measures only the minimum resulting length.

Common Pitfalls or Variants

Common pitfalls

  • Using a greedy rule like deleting the least frequent character first misses cases where removing separators creates a much shorter merged run.
  • Forgetting that a single character contributes length 1, not 2, leads to wrong transitions for small runs.
  • Not charging deletions for non-matching characters inside the current window breaks the run-forming DP transition.

Follow-up variants

  • Return the compressed string itself after at most k deletions instead of only its minimum length.
  • Change the encoding rule so single characters also store a count, which alters every threshold and DP cost.
  • Limit deletions to exactly k characters, which changes base cases and may force harmful removals near the end.

FAQ

What is the main pattern behind String Compression II?

The core pattern is state transition dynamic programming. At each index, you choose between deleting the current character or keeping it and extending a same-character run while paying deletions for mismatches inside that segment.

Why does String Compression II need DP instead of greedy compression?

A greedy choice can look good locally but ruin a better merge later. Deleting separator characters can combine distant runs into one cheaper encoded block, so the best answer depends on future structure.

What are the important count thresholds in this problem?

The compressed length increases when the run count crosses 1, 9, and 99. That means growing a run from 1 to 2, 9 to 10, or 99 to 100 changes the encoded size, while many other count increases do not.

What state should I memoize for LeetCode 1531?

A common memoization state is index plus remaining deletions, written as dp(i, k). From there, scan forward to form a run starting at i, track how many mismatches must be deleted, and combine that run cost with the best answer for the suffix.

What is the usual complexity for an accepted solution?

The standard accepted approach runs in O(n^2k) time and O(nk) space for a string of length n. That fits the constraint of n up to 100 and is why a full run-forming DP is practical here.

terminal

Solution

Solution 1

#### Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
class Solution {
    public int getLengthOfOptimalCompression(String s, int k) {
        // dp[i][k] := the length of the optimal compression of s[i..n) with at most
        // k deletion
        dp = new int[s.length()][k + 1];
        Arrays.stream(dp).forEach(A -> Arrays.fill(A, K_MAX));
        return compression(s, 0, k);
    }

    private static final int K_MAX = 101;
    private int[][] dp;

    private int compression(final String s, int i, int k) {
        if (k < 0) {
            return K_MAX;
        }
        if (i == s.length() || s.length() - i <= k) {
            return 0;
        }
        if (dp[i][k] != K_MAX) {
            return dp[i][k];
        }
        int maxFreq = 0;
        int[] count = new int[128];
        // Make letters in s[i..j] be the same.
        // Keep the letter that has the maximum frequency in this range and remove
        // the other letters.
        for (int j = i; j < s.length(); ++j) {
            maxFreq = Math.max(maxFreq, ++count[s.charAt(j)]);
            dp[i][k] = Math.min(
                dp[i][k], getLength(maxFreq) + compression(s, j + 1, k - (j - i + 1 - maxFreq)));
        }
        return dp[i][k];
    }

    // Returns the length to compress `maxFreq`.
    private int getLength(int maxFreq) {
        if (maxFreq == 1) {
            return 1; // c
        }
        if (maxFreq < 10) {
            return 2; // [1-9]c
        }
        if (maxFreq < 100) {
            return 3; // [1-9][0-9]c
        }
        return 4; // [1-9][0-9][0-9]c
    }
}
String Compression II Solution: State transition dynamic programming | LeetCode #1531 Hard