8 min read · Interview AiBox Team

Why Sending Every Query Into Retrieval Breaks RAG: Query Understanding and Routing for Interviews

Many candidates can explain retrieval and reranking, but not why different queries should take different paths. This guide breaks down query understanding, entity extraction, routing, and fallback design in interview-ready language.

  • Technical Deep Dive
  • Product Updates

Many candidates sound confident when they explain RAG until they say one sentence that feels harmless:

The user asks a question, we retrieve relevant documents, then send the context to the LLM.

That answer is not fully wrong. It is just too shallow to survive follow-up pressure.

What if the user asks:

  • "Can you calculate the payout for plan A?"
  • "What happened to the case we discussed yesterday?"
  • "What changed in the newest version of the claims process?"

Should all of those go through the same retrieval path?

Clearly not.

That is why this article focuses on one module many people skip:

query understanding and routing.

Its job is not only to make retrieval smarter. Its job is to decide:

  • what kind of task this query actually is
  • whether retrieval is the right path at all
  • which index or source pool to search
  • whether time, source, role, or project constraints must be extracted first
  • how the system should fall back when the first route fails

For Interview AiBox, this matters even more because live interview questions mix project follow-ups, behavioral prompts, system-design prompts, calculations, and structured lookups in the same session.

Start with the main chain: route before you retrieve

Why a single retrieval path fails in real systems

This failure shows up so often because many candidates still think of RAG as one universal entry point.

Real product queries are not that uniform.

The same chat box may receive:

  • fact queries asking for definitions, policies, or workflows
  • calculation queries asking for payouts, totals, or formulas
  • structured lookups asking for a specific case, record, or metric
  • filtered queries asking about "yesterday," "latest," or "from source X"
  • out-of-scope prompts that may need refusal or redirection

If all of them go straight into retrieval, you get two embarrassing outcomes:

1. Things that should be calculated get "explained"

The user wants a number. The system returns policy text.

2. Things that should be filtered get mixed together

The user wants the latest version. Retrieval returns old and new material together, and the final answer sounds smooth but is outdated.

That is why stronger candidates do not describe the chain as:

query, embedding, vector search, answer

They say something closer to:

we first decide what kind of task the query represents, then decide whether retrieval is appropriate, and only then choose the retrieval path and constraints.

Intent classification is really a scheduling layer

When interviewers ask about query understanding, they often want to know whether you see the system as a multi-capability entry point instead of a retrieval-only pipeline.

1. Rules are great for obvious high-frequency patterns

Some expressions carry very strong route signals:

  • "calculate"
  • "how much"
  • "average"
  • "ratio"
  • "latest"
  • "yesterday"

Rules work well here because they are:

  • fast
  • explainable
  • cheap

They are a strong first layer for the obvious cases.

2. Lightweight classifiers improve robustness

Users do not always phrase things the way your rules expect.

Someone may ask "How much would this payout roughly be?" without explicitly saying "calculate," but the task is still numerical.

That is where a lightweight classifier helps. It can judge based on semantics rather than exact keywords.
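A real deployment would use sentence embeddings or a small trained model; as a stand-in, here is a pure-Python nearest-centroid sketch that at least shows the interface and the confidence score. The labels and training phrases are invented for illustration:

```python
from collections import Counter
import math

# Tiny invented training set: a few phrases per intent label.
TRAIN = {
    "calculate": ["calculate the payout", "how much would this cost", "what is the average"],
    "retrieve": ["explain the claims process", "what is the policy on refunds"],
}

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One centroid per intent: the summed word counts of its examples.
CENTROIDS = {
    label: sum((_vec(t) for t in texts), Counter())
    for label, texts in TRAIN.items()
}

def classify(query: str) -> tuple[str, float]:
    """Return (best_label, confidence), where confidence is the top cosine score."""
    scores = {label: _cosine(_vec(query), c) for label, c in CENTROIDS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A query with no vocabulary overlap scores 0.0 against every centroid; that low confidence is exactly the signal the next layer uses to escalate.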

3. LLMs are better as escalation, not the default

Many teams route every query through an LLM classifier because it is convenient in demos.

In production, a more practical stack often looks like this:

  1. rules first
  2. classifier second
  3. LLM only for low-confidence cases

That gives you a better balance of:

  • latency
  • cost
  • stability

This matters a lot in Interview AiBox because live interviews are very sensitive to extra delay.
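Put together, the rules-first, classifier-second, LLM-last stack is just a confidence cascade. The interface below is hypothetical; the rule table, classifier, and LLM call are injected as stand-in functions:

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative; tuned per system

def route(query, rule_route, classify, llm_classify):
    """Layered routing: cheap layers answer first, the LLM only sees leftovers."""
    # Layer 1: rules are fast, explainable, cheap.
    hit = rule_route(query)
    if hit is not None:
        return hit

    # Layer 2: a lightweight classifier handles paraphrases the rules miss.
    label, confidence = classify(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label

    # Layer 3: escalate only low-confidence queries to the LLM classifier.
    return llm_classify(query)
```

The design choice worth naming in an interview: latency and cost are bounded because most traffic never reaches layer 3.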

Entity extraction is what makes the constraints real

Intent classification answers "which path?" but not "under what constraints?"

Many user queries contain hidden parameters that completely change the correct result.

For example:

"What ranking did that chemistry-sector story from yesterday have in Exclusive News?"

This query contains at least:

  • a time constraint: yesterday
  • a source constraint: Exclusive News
  • a topic constraint: chemistry sector
  • a target field: ranking

If you do not extract those elements and simply search the full sentence, retrieval may find generic chemistry-sector content instead of the specific record the user meant.

Common entities worth extracting include:

  • time
  • source
  • project name
  • company name
  • tech stack
  • numeric parameters
  • role or job direction

Once extracted, those fields can drive filtering, route choice, and context assembly.

Good routing is less about being clever and more about avoiding the wrong path

People often overcomplicate routing in interviews.

From an engineering point of view, the main principle is simple:

send the query to the path least likely to produce the wrong kind of answer.

A stable routing strategy often looks like this:

1. Knowledge questions go to retrieval

This covers:

  • process explanations
  • policy lookup
  • project background
  • technical decision explanations

Here the system enters the familiar RAG path: hybrid retrieval, rerank, context assembly, generation.

2. Calculation questions go to tools or calculators

This covers:

  • amount calculation
  • quota or capacity calculation
  • simple formula evaluation

If you force these through retrieval, you get explanation text instead of the answer.
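Dispatching to a calculator can mean evaluating a small, whitelisted arithmetic expression instead of generating prose. A minimal sketch using Python's ast module; the allowed operator set is illustrative:

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate a whitelisted arithmetic expression, e.g. a payout formula."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval"))

# e.g. a base amount times a coverage rate: safe_eval("1200 * 0.85")
```

The user asking "how much" gets a number back, and anything outside the whitelist (names, calls, attribute access) fails loudly instead of being guessed.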

3. Structured lookups go to databases or APIs

This covers:

  • current case status
  • time-window metrics
  • one specific record or event

The biggest risk here is not ignorance. It is fabrication that sounds plausible.

4. When uncertain, default conservatively

This is one of the most useful engineering instincts.

If the classifier is not confident, do not aggressively route to a specialized path that may be wrong. Prefer conservative default retrieval, or run multiple safe paths in parallel.

Wrong routing is often more damaging than a slightly slower but still acceptable default path.

Why query routing matters even more in interview products

Generic knowledge assistants mostly answer questions.

Interview products must also decide what kind of answer should be assembled right now.

For example:

  • Is this a project follow-up or a behavioral prompt?
  • Does the user need factual evidence or speaking structure?
  • Should the system pull project metrics, STAR material, or system-design trade-offs?
  • Is the goal to explain the architecture or to lead with the result?

That is why in Interview AiBox we treat query understanding as a scene-identification layer, not just a query parser.

For example:

  • if the query looks like a project follow-up, we prioritize project facts, metrics, ownership, and trade-offs
  • if it looks like a behavioral question, we prioritize STAR material, collaboration episodes, and influence signals
  • if it looks like system design, we prioritize architecture frames, scaling logic, and trade-off templates

So interview-focused RAG is not only about retrieving better. It is about understanding what the candidate needs to say next.

Fallback design often matters more than headline accuracy

Many candidates spend too much time talking about intent accuracy and not enough time talking about fallback behavior.

You will never route every query perfectly the first time.

The more mature question is:

what happens when the first routing guess is not reliable?

Strong systems usually prepare several fallback layers:

1. Low-confidence fallback to default retrieval

If intent is unclear, at least preserve a safe and usable answer path.

2. Multi-path execution for important cases

For higher-value questions, you can run:

  • retrieval on the original query
  • retrieval on a rewritten query
  • retrieval with extracted metadata constraints

Then merge the candidate set before reranking.
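Merging those parallel result sets is mostly deduplication before the reranker sees them. In this sketch the retrieval paths are stand-ins that return (doc_id, score) pairs:

```python
def merge_candidates(*result_lists):
    """Union results from several retrieval paths, keeping each doc's best score."""
    best = {}
    for results in result_lists:
        for doc_id, score in results:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # The merged pool goes to the reranker, which produces the final order.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

In practice, scores from different retrievers are often not on the same scale, which is one more reason the merged pool is reranked rather than trusted as-is.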

3. Tool failure fallback

If the calculator, SQL path, or API tool fails, the system should degrade gracefully instead of returning a raw error or fabricated number.
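Graceful degradation around a tool call can be a narrow try/except that falls back to the retrieval path plus an honest caveat. The function names here are hypothetical stand-ins:

```python
def answer_with_tool(query, run_tool, retrieve_and_generate):
    """Try the specialized tool; on failure, degrade to retrieval with a caveat."""
    try:
        return run_tool(query)
    except Exception:
        # Never surface a raw traceback or invent a number; fall back to the
        # safe retrieval path and say what could not be computed.
        answer = retrieve_and_generate(query)
        return f"{answer}\n\n(Note: the exact figure could not be computed.)"
```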

These decisions do not make demos look flashy. They make production systems survive messy inputs.

How to explain this in interviews without sounding theoretical

If the interviewer asks:

"How do you handle different query types in your RAG system?"

A strong answer can follow this order:

1. Start with why one path is not enough

Fact questions should retrieve. Numerical questions should calculate. Structured lookups should query structured systems. Time- and source-constrained questions should extract entities before retrieval.

2. Explain the layered classifier

Rules handle obvious high-frequency cases, a lightweight classifier improves robustness, and the LLM only helps on low-confidence edge cases.

3. Explain why entity extraction matters

Time, source, project name, and role constraints are not just metadata. They directly affect routing, filtering, and final context choice.

4. End with fallback design

When confidence is low, default conservatively. When tools fail, degrade safely. That prevents one bad route decision from collapsing the whole answer.

If you add one more sentence like this:

In Interview AiBox, we also identify whether the user is in a project follow-up, behavioral, or system-design moment, because that changes what evidence and phrasing support we retrieve.

your answer immediately sounds more like product work and less like a generic RAG tutorial.

FAQ

Do you always need an LLM for query understanding?

No. Many high-frequency cases can be handled by rules and lightweight classifiers. LLMs are most useful for ambiguous or low-confidence inputs.

Why not let retrieval handle time and source constraints by itself?

Pure semantic retrieval is weak at explicit filtering. Extracting structured constraints first usually gives more stable and explainable behavior.

Can routing become too complicated?

Yes. The goal is not to add many paths. The goal is to build a layered system where cheap routes handle easy cases and more expensive logic is only used when necessary.

Updated: Mar 20, 2026
