LeetCode Problem Workspace

HTML Entity Parser

Transform a string containing HTML entities into their character equivalents using a hash table for efficient parsing.

category

2

Topics

code_blocks

5

Code langs

hub

3

Related

Practice Focus

Medium · Hash Table plus String

bolt

Answer-first summary

Transform a string containing HTML entities into their character equivalents using a hash table for efficient parsing.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for Hash Table plus String

Try AiBox Copilotarrow_forward

Start by mapping all standard HTML entities to their corresponding characters in a hash table. Then, iterate through the input string, detecting occurrences of '&' and matching them against the table. Replace recognized entities while leaving unrecognized sequences intact, ensuring accurate and efficient conversion without affecting other text.

Problem Statement

Given a string containing HTML entities, implement a parser that converts all valid entities into their corresponding characters. The parser should recognize standard entities such as ", ', &, >, <, and ⁄.

For input text, replace each valid entity while preserving any unrecognized sequences exactly as they appear. Ensure your parser handles long strings up to 10^5 characters and supports all ASCII characters, optimizing for correct and efficient parsing using a hash table plus string matching approach.

Examples

Example 1

Input: text = "&amp; is an HTML entity but &ambassador; is not."

Output: "& is an HTML entity but &ambassador; is not."

The parser will replace the & entity by &

Example 2

Input: text = "and I quote: &quot;...&quot;"

Output: "and I quote: \"...\""

Example details omitted.

Constraints

  • 1 <= text.length <= 105
  • The string may contain any possible characters out of all the 256 ASCII characters.

Solution Approach

Build an HTML Entity Map

Create a hash table where keys are valid HTML entity strings and values are their corresponding characters. This allows O(1) lookup for each entity, crucial for efficiently parsing large strings with multiple occurrences.

Iterate and Detect Entities

Scan the input string from left to right. Each time you encounter '&', attempt to match the longest possible substring ending with ';' in the hash table. Replace the substring with the mapped character if found; otherwise, keep it unchanged.

Construct the Result String

Append parsed characters or original substrings to a result buffer while iterating. This ensures that the output string maintains original text outside entities and builds efficiently without repeated string concatenation.

Complexity Analysis

Metric Value
Time Depends on the final approach
Space Depends on the final approach

Time complexity depends on string length and number of entities, roughly O(n) with hash table lookups. Space complexity is O(1) for the map of entities plus O(n) for the result string.

What Interviewers Usually Probe

  • Look for a solution that avoids nested loops for each character; the '&' detection is key.
  • Expect the candidate to use a hash table for constant-time entity lookup.
  • Check handling of edge cases where sequences resemble entities but are invalid.

Common Pitfalls or Variants

Common pitfalls

  • Failing to match the longest valid entity when multiple could start at the same '&'.
  • Replacing substrings incorrectly and altering parts of the text that are not valid entities.
  • Not handling large input efficiently, causing O(n^2) behavior due to repeated string concatenation.

Follow-up variants

  • Parse a string with only numeric HTML entities like & or & instead of named entities.
  • Extend the parser to handle nested or repeated entities in a single text segment.
  • Modify the parser to work on a stream of characters instead of a preloaded string, maintaining O(1) lookup.

FAQ

What is the main approach to solve HTML Entity Parser efficiently?

Use a hash table mapping all valid HTML entities to characters, then iterate the string detecting '&' and replacing recognized entities.

How do I handle sequences that look like entities but are invalid?

Leave them unchanged in the output string; only replace entries that exist in the hash table of valid entities.

Can this parser handle very long strings up to 10^5 characters?

Yes, by using a single pass iteration and appending to a buffer, the parser maintains O(n) performance.

Which pattern is primarily used in HTML Entity Parser?

The problem follows the Hash Table plus String pattern, combining constant-time lookup with sequential string parsing.

What is a common mistake when implementing this parser?

Failing to match the longest valid entity at each '&' or altering text outside valid entities can cause incorrect output.

terminal

Solution

Solution 1: Hash Table + Simulation

We can use a hash table to store the corresponding character for each character entity. Then, we traverse the string, and when we encounter a character entity, we replace it with the corresponding character.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Solution:
    def entityParser(self, text: str) -> str:
        d = {
            '&quot;': '"',
            '&apos;': "'",
            '&amp;': "&",
            "&gt;": '>',
            "&lt;": '<',
            "&frasl;": '/',
        }
        i, n = 0, len(text)
        ans = []
        while i < n:
            for l in range(1, 8):
                j = i + l
                if text[i:j] in d:
                    ans.append(d[text[i:j]])
                    i = j
                    break
            else:
                ans.append(text[i])
                i += 1
        return ''.join(ans)

Solution 2

#### TypeScript

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Solution:
    def entityParser(self, text: str) -> str:
        d = {
            '&quot;': '"',
            '&apos;': "'",
            '&amp;': "&",
            "&gt;": '>',
            "&lt;": '<',
            "&frasl;": '/',
        }
        i, n = 0, len(text)
        ans = []
        while i < n:
            for l in range(1, 8):
                j = i + l
                if text[i:j] in d:
                    ans.append(d[text[i:j]])
                    i = j
                    break
            else:
                ans.append(text[i])
                i += 1
        return ''.join(ans)
HTML Entity Parser Solution: Hash Table plus String | LeetCode #1410 Medium