Why Naive Chunking Breaks RAG: How to Explain Chunk Strategy in Interviews
Many candidates say they chunk by fixed length, but cannot explain headings, tables, lists, cross-page paragraphs, or overlap. This guide shows how to talk about chunking strategy like someone who has actually built it.
- Technical Deep Dive
- Product Updates
One of the fastest ways to sound shallow in a RAG interview is to answer this question too quickly:
How do you chunk your documents?
At first glance, it sounds like a small implementation detail.
But the moment the interviewer pushes further, it becomes a much deeper test:
- What happens when a heading is separated from its body?
- What if a table gets cut in half?
- What if list items lose the leading sentence that gives them meaning?
- Why is your overlap set to that value?
- How do you know your chunks are actually answerable units?
That is why so many people start strong with:
We use fixed-size chunking at 512 tokens.
and then lose control of the answer.
Chunking is never only about splitting long text into smaller pieces. It decides:
- whether retrieval returns complete meaning or fragments
- whether the LLM receives answerable evidence or broken context
- whether your knowledge base behaves like a useful system or a noisy archive
For Interview AiBox, chunk quality matters even more because users face follow-up pressure, not just one-off questions. Broken chunks usually become obvious by the second or third turn.
Start with the real principle: chunk answerable units, not just equal sizes
Why fixed-length chunking often falls short
Fixed-length chunking has obvious advantages:
- simple
- fast
- easy to implement
That makes it a perfectly reasonable baseline.
The problem is that it has no idea where the semantic boundaries are.
If a policy paragraph gets split exactly at "the following cases are excluded," retrieval may return only the coverage half while the exclusion half lands in the next chunk.
That means:
- retrieval appears to hit
- generation still sounds fluent
- the answer is directionally wrong
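A minimal sketch makes the failure concrete. The splitter below chunks by a fixed character count (token-based splitting behaves the same way); the example text and the size of 60 are illustrative:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive fixed-length chunking: cut every `size` characters,
    with no awareness of sentence or clause boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

policy = ("Coverage applies to all hospital stays. "
          "The following cases are excluded: war, nuclear radiation.")
chunks = fixed_size_chunks(policy, 60)
# The cut lands mid-sentence: the first chunk ends with
# "The following cases" and the exclusions land in the second chunk.
```

A query about coverage can now retrieve the first chunk alone, and the generated answer will fluently omit the exclusions.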
In interview settings, that gets worse because users ask follow-up questions:
- why was it designed this way?
- what are the exceptions?
- what trade-offs did you consider?
If the context was fragmented early, later turns become noticeably weaker.
Strong chunking strategies usually start with document structure
Many people begin the explanation with token counts.
In real systems, structure is often the more important starting point.
Documents already contain natural semantic boundaries:
- headings
- subheadings
- paragraphs
- lists
- tables
- code blocks
- question-answer pairs
If those boundaries are preserved, retrieval and answer generation become much more stable.
So a stronger explanation usually sounds like this:
1. Detect structure first
First identify:
- headings
- body paragraphs
- list items
- tables and code blocks
The goal is not visual formatting. The goal is to identify the places that should not be split casually.
2. Split by semantic unit second
If one section is short enough, keep it intact.
If it is too long, look for subheadings, paragraph boundaries, or list groupings before splitting further.
3. Balance length last
Only after semantic integrity is protected do you optimize for chunk size and context-window fit.
That order matters.
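The three steps above can be sketched as follows. This assumes markdown-style `#` headings for boundary detection; real documents usually need a PDF or HTML parser, and the 400-character budget is an illustrative stand-in for a token budget:

```python
def chunk_by_structure(doc: str, max_len: int = 400) -> list[str]:
    """Structure first: split on headings, keep short sections intact,
    and only fall back to paragraph splits when a section is too long."""
    sections, current = [], []
    for line in doc.splitlines():
        if line.startswith("#") and current:   # heading = hard boundary
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        if len(section) <= max_len:            # short enough: keep intact
            chunks.append(section)
        else:                                  # too long: split on paragraphs
            heading, _, body = section.partition("\n")
            for para in body.split("\n\n"):
                # let the heading travel with each sub-chunk for context
                chunks.append(f"{heading}\n{para}")
    return chunks

doc = (
    "# Travel\n"
    "Hotel budget is capped at 500 per night.\n"
    "# Policy Details\n"
    + "Coverage applies to standard hospital stays. " * 10
    + "\n\n"
    + "The following cases are excluded from coverage. " * 10
)
chunks = chunk_by_structure(doc)
# The short section stays intact; the long one splits on the paragraph
# boundary, and both sub-chunks inherit the "# Policy Details" heading.
```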
Headings, lists, and tables are where interviewers usually push hardest
Interviewers like these cases because they quickly reveal whether you have handled real-world documents.
1. Headings should not become orphan chunks
When a heading gets separated from its body, the body chunk loses an important layer of context.
If "Travel Reimbursement" is detached from the paragraph that says "hotel budget is capped at 500," the body chunk becomes harder to retrieve and harder to interpret correctly.
That affects:
- retrieval matching
- rerank judgment
- answer traceability
So strong systems usually let headings travel with their content rather than becoming tiny standalone chunks.
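One simple post-processing pass for this, sketched under the assumption that any chunk shorter than some threshold is a detached heading (the 40-character threshold is illustrative):

```python
def attach_orphan_headings(chunks: list[str], min_len: int = 40) -> list[str]:
    """Merge chunks too short to stand alone (typically a bare heading)
    into the chunk that follows them, so no heading becomes an orphan."""
    merged: list[str] = []
    pending = ""
    for chunk in chunks:
        if len(chunk) < min_len:       # likely a detached heading
            pending += chunk + "\n"
        else:
            merged.append(pending + chunk)
            pending = ""
    if pending:                        # trailing short chunk: keep as-is
        merged.append(pending.rstrip())
    return merged

result = attach_orphan_headings([
    "## Travel Reimbursement",
    "Hotel budget is capped at 500 per night for domestic trips.",
])
# The heading is merged into the body chunk instead of standing alone.
```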
2. List items often lose meaning without the lead sentence
For example:
The following cases are excluded:
- war
- nuclear radiation
- nuclear explosion
If every list item becomes an independent chunk, the system preserves the terms but loses the meaning that these are exclusions.
That is why list lead-ins often need to stay attached to their items, or at least be inherited by each subchunk.
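The "inherited by each subchunk" option can be sketched like this: when a long list must be split, the lead-in sentence is repeated in every sub-chunk (the two-items-per-chunk limit is only for illustration):

```python
def split_list_with_lead_in(lead_in: str, items: list[str],
                            max_items: int = 2) -> list[str]:
    """Split a long list into chunks, repeating the lead-in sentence in
    each one so items keep their meaning (e.g. that they are exclusions)."""
    chunks = []
    for i in range(0, len(items), max_items):
        group = items[i:i + max_items]
        chunks.append(lead_in + "\n" + "\n".join(f"- {it}" for it in group))
    return chunks

chunks = split_list_with_lead_in(
    "The following cases are excluded:",
    ["war", "nuclear radiation", "nuclear explosion"])
# Every sub-chunk now carries "excluded", not just the bare terms.
```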
3. Tables are about relationships, not just text density
When tables are chunked badly, what gets lost is not a few tokens. It is the relationship between fields.
For an LLM, a flattened line like "Plan A 500000 5000 Plan B 300000 3000" is far less useful than a representation that still preserves column meaning.
So table handling usually focuses on:
- keeping headers when possible
- preserving row-column relationships
- ensuring that each split table part still contains enough context to be interpreted
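A minimal version of the header-repetition idea, assuming the table arrives as rows of cells and using a pipe-delimited rendering purely for illustration:

```python
def split_table(rows: list[list[str]], max_rows: int = 2) -> list[str]:
    """Split a table into parts, repeating the header row in each part
    so every row keeps its column meaning."""
    header, body = rows[0], rows[1:]
    parts = []
    for i in range(0, len(body), max_rows):
        part_rows = [header] + body[i:i + max_rows]
        parts.append("\n".join(" | ".join(r) for r in part_rows))
    return parts

table = [["Plan", "Coverage", "Premium"],
         ["Plan A", "500000", "5000"],
         ["Plan B", "300000", "3000"],
         ["Plan C", "200000", "2000"]]
parts = split_table(table)
# Both parts start with "Plan | Coverage | Premium", so "Plan C | 200000
# | 2000" stays interpretable even when retrieved on its own.
```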
Overlap is not better when it is larger. It is better when it respects boundaries.
Many people hear overlap and assume more is safer.
That is not the right mental model.
Overlap exists to preserve the continuity most likely to be lost across chunk boundaries.
If overlap is too small, continuity breaks.
If overlap is too large, new problems appear:
- too much duplicated information
- larger storage and retrieval cost
- more confusion for rerank because similar chunks start looking nearly identical
So the stronger explanation is not:
We use 100 tokens of overlap.
It is:
We keep enough overlap to preserve continuity, but we try to align it with complete sentences, lead-in phrases, or clean semantic boundaries rather than cutting through the middle of text mechanically.
That sounds much closer to real engineering judgment.
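One way to sketch boundary-aligned overlap: take roughly the target amount from the end of the previous chunk, then extend backwards to the nearest sentence start. The regex-based sentence detection and the character-based target are simplifying assumptions:

```python
import re

def sentence_aligned_overlap(prev_chunk: str, target: int = 100) -> str:
    """Return an overlap of roughly `target` characters from the end of
    the previous chunk, extended backwards to the nearest sentence start
    so the overlap never begins mid-sentence."""
    cut = max(len(prev_chunk) - target, 0)
    sentence_starts = [m.end() for m in re.finditer(r"[.!?]\s+", prev_chunk)]
    aligned = [s for s in sentence_starts if s <= cut]
    # If no sentence boundary falls before the cut, keep the whole chunk
    # rather than start the overlap mid-sentence.
    return prev_chunk[aligned[-1]:] if aligned else prev_chunk

prev = ("Coverage applies broadly. "
        "The following cases are excluded: war and radiation.")
overlap = sentence_aligned_overlap(prev, target=30)
# The overlap expands to the full second sentence instead of starting
# 30 characters from the end, mid-clause.
```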
Why we care more about answerable units than standard-size chunks
In Interview AiBox, chunk value is not measured by neatness. It is measured by whether the chunk can support an answer.
A useful chunk usually has several qualities:
- one clear topic
- high self-containment
- enough heading or source context to stand on its own
- limited dependence on many adjacent chunks
- direct usefulness as answer evidence once retrieved
That is why we think of chunks as answerable units.
Examples include:
- one complete project story
- one complete Q&A pair
- one meaningful table segment
- one coherent system-design submodule explanation
When chunks are answerable on their own, the retrieval and generation layers carry much less burden.
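These qualities can be partially checked mechanically. The heuristics below are crude and illustrative only (real systems validate answerability with retrieval evals, not string checks):

```python
def looks_answerable(chunk: str) -> bool:
    """Crude, illustrative heuristics for whether a chunk can stand
    alone as answer evidence."""
    words = chunk.split()
    first_line = chunk.lstrip().splitlines()[0].lower()
    # Has some framing context: a heading or a lead-in sentence.
    has_context = first_line.startswith("#") or ":" in first_line
    # Starts with a dangling pronoun: depends on a neighbouring chunk.
    dangling = first_line.startswith(("it ", "this ", "they ", "these "))
    # Long enough to carry a claim, short enough to stay on one topic.
    return has_context and 15 <= len(words) <= 300 and not dangling

good = ("# Payment System Results\n"
        + "The service processed two million payments per day. " * 4)
bad = "It reduced latency."
# good passes all checks; bad is a dangling fragment.
```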
Chunk metadata often determines the real upper bound
Many teams treat a chunk as text plus vector and stop there.
That is rarely enough.
In practice, a chunk often needs metadata that helps with:
- filtering
- reranking
- source tracing
- adjacent-context retrieval
Common fields include:
- document ID
- page or section number
- section path
- content type
- previous and next chunk references
- whether the chunk represents key policy, key project evidence, or high-value material
This matters a lot in interview products because users often need not just "related content," but:
- the newest version
- the version tailored to a certain role
- stronger evidence of results
- chunks that are better suited for follow-up answers
Without metadata captured at ingestion time, many of those controls cannot be bolted on later at serving time.
How to explain chunking in interviews without sounding generic
If the interviewer asks:
How do you chunk documents in your RAG system?
A weak answer is:
We chunk at 512 tokens with some overlap.
A stronger answer usually follows this structure:
1. Explain why fixed-length alone is not enough
It does not understand natural structures like headings, tables, lists, or cross-page paragraphs, so it can break one answerable unit into multiple fragments.
2. Explain the main strategy
Start with structure detection, split by semantic units such as sections, lists, tables, or Q&A pairs, then do length balancing and overlap as the last step.
3. Explain how special elements are handled
Headings stay attached to bodies, list items keep their lead-in meaning, and tables preserve headers and field relationships where possible.
4. Explain why this matches the product goal
The goal is not to produce beautifully uniform chunks. The goal is to retrieve evidence that can actually support an answer under follow-up pressure.
If you add one more sentence like this:
In Interview AiBox, we optimize for answerable units because users are not asking one-off FAQ questions. They are navigating multi-turn interview pressure.
the answer starts sounding much more like product experience than theory.
FAQ
Is fixed-length chunking always bad?
No. It is a reasonable baseline, especially when the document structure is simple. But complex layouts, tables, and follow-up-heavy scenarios usually need stronger structure awareness.
What is the right overlap size?
There is no universal number. The key is preserving continuity without creating too much duplicated information.
Why do interviewers ask about chunking so often?
Because chunking is one of the fastest ways to tell whether someone has worked with real knowledge bases. Saying "512 tokens" is rarely enough once headings, tables, lists, and edge cases enter the conversation.
Next Steps
- For the broader knowledge foundation, read Our Interview-Grade RAG Architecture
- To see how chunking affects recall quality directly, read Knowledge Base Recall Quality Improved by 93%
- To understand why not every query should enter retrieval, read Why Sending Every Query Into Retrieval Breaks RAG
- For the system-design framing of the whole pipeline, read RAG System Design Interview Guide