Preamble#
Query understanding constitutes the single most consequential stage in any agentic retrieval pipeline. Every downstream operation—retrieval precision, tool selection, memory access, response synthesis, and verification—is bounded by the fidelity with which the system interprets the user's communicative intent, decomposes it into actionable sub-problems, enriches it with contextual and ontological knowledge, and routes constituent sub-queries to the correct execution tier. A system that retrieves against raw query strings treats language as a lookup key. A system that understands queries treats language as a compressed program requiring decompilation before execution.
This chapter formalizes query understanding as a multi-stage cognitive pipeline: a typed, inspectable, and measurable transformation from an ambiguous natural-language surface form into a structured, provenance-tagged, schema-routed execution plan. We develop the mathematical foundations, taxonomic models, decomposition strategies, enrichment protocols, and quality metrics that elevate query processing from heuristic pattern matching to a principled engineering discipline.
7.1 Query Understanding as a Cognitive Pipeline, Not String Matching#
7.1.1 The Fundamental Inadequacy of String-Level Processing#
Traditional information retrieval systems operate on a reductive assumption: the query string is a sufficient representation of the user's information need. This assumption fails categorically in agentic contexts for three structural reasons:
- Compression Loss. Natural language is a lossy compression of intent. The query "How does the new policy affect our deployment?" encodes domain context (which policy?), temporal reference (new relative to what?), scope (which deployment?), and an implicit causal reasoning requirement—none of which are recoverable from lexical or even embedding-level similarity alone.
- Contextual Dependency. In multi-turn interactions, queries carry anaphoric references, elliptical constructions, and presuppositions that require resolution against conversational history, user state, and session memory.
- Heterogeneous Execution Requirements. A single surface query may require simultaneous retrieval from structured databases, unstructured document stores, code repositories, live APIs, and episodic memory—each with distinct schemas, latency profiles, and authority levels.
7.1.2 The Pipeline Abstraction#
We define query understanding as a typed transformation pipeline that maps a raw input tuple to an executable query plan:
QU : (q, H, M, S) → P
where:
- q is the raw query string (or multi-modal input).
- H is the conversational history (query-response-timestamp triples).
- M is the accessible memory state (working, session, episodic, semantic, procedural layers).
- S is the system state (available tools, active schemas, user profile, authorization scope).
- P is a structured query execution plan containing resolved intents, decomposed sub-queries, enrichment annotations, routing directives, and verification predicates.
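The input tuple and plan described above can be sketched as typed records. This is a minimal illustration, not a prescribed schema; all field names (`Turn`, `QueryInput`, `SubQuery`, `QueryPlan`, `understand`) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    """One (query, response, timestamp) triple from the history H."""
    query: str
    response: str
    timestamp: float

@dataclass
class QueryInput:
    """The raw input tuple (q, H, M, S)."""
    q: str               # raw query string
    history: list        # conversational history H, a list of Turn
    memory: dict         # memory state M (layered stores)
    system: dict         # system state S (tools, schemas, authorization)

@dataclass
class SubQuery:
    text: str
    intent: str
    route: str           # routing directive (source identifier)

@dataclass
class QueryPlan:
    """The structured execution plan P."""
    intents: list
    sub_queries: list
    provenance: dict = field(default_factory=dict)

def understand(inp: QueryInput) -> QueryPlan:
    # Placeholder: a real pipeline would run the stages of §7.1.3.
    return QueryPlan(intents=["lookup"],
                     sub_queries=[SubQuery(inp.q, "lookup", "default")])
```

The point of the sketch is the type signature: every stage consumes and produces structured objects, never bare strings.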
7.1.3 Pipeline Stages#
The pipeline comprises the following ordered stages, each producing a typed intermediate representation:
| Stage | Input | Output | Section |
|---|---|---|---|
| Intent Classification | q, H | Intent vector I with confidence | §7.2 |
| Pragmatic Analysis | q, I, H, M | Pragmatic frame F_p | §7.3 |
| Cognitive Load Estimation | q, F_p, I | Complexity score κ, reasoning mode ρ | §7.4 |
| Query Rewriting & Expansion | q, F_p, κ | Rewritten query set Q′ | §7.5 |
| Decomposition | q, I, F_p, κ, ρ | Sub-query DAG G_sq | §7.6 |
| Schema-Aware Routing | G_sq, S | Routed execution plan | §7.7 |
| Clarification Detection | All prior outputs | Clarification request or proceed signal | §7.9 |
7.1.4 Formal Invariant#
The pipeline must satisfy the semantic preservation invariant: the union of information needs addressed by the decomposed, routed sub-queries must be a superset of the original intent:
⋃_{v ∈ V(G_sq)} Need(v) ⊇ Need(q)
Any pipeline stage that reduces coverage below this bound constitutes a comprehension fault and must trigger either clarification (§7.9) or repair (via the bounded agent loop's critique-repair cycle).
7.1.5 Design Principles#
- Typed Intermediates. Every stage produces a structured, schema-validated output—never free-text annotations.
- Provenance Tracking. Every enrichment, expansion, or rewrite carries a provenance tag identifying its source (ontology, memory, model inference, user history).
- Token Budget Awareness. The pipeline operates under a global token budget B_tok; enrichment is bounded by retrieval utility, not unbounded expansion.
- Fail-Safe Semantics. If any stage fails or times out, the pipeline degrades gracefully to the best available intermediate representation rather than propagating the raw query q.
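The fail-safe principle can be sketched as a stage runner that keeps the last validated representation when a stage fails, instead of falling back to the raw string. The stage names and the simulated timeout are illustrative:

```python
def run_pipeline(q, stages):
    """Fail-safe stage runner: each stage maps the current representation to
    a richer one; on failure or timeout, degrade to the best intermediate
    produced so far rather than propagating the raw query."""
    rep = {"query": q, "stage": "raw"}
    for name, fn in stages:
        try:
            rep = {**fn(rep), "stage": name}
        except Exception:
            break  # graceful degradation: keep the last good representation
    return rep

def failing_pragmatics(rep):
    # Simulated stage failure (e.g., a timeout in pragmatic analysis).
    raise TimeoutError("pragmatic analysis timed out")

stages = [
    ("intent", lambda r: {**r, "intent": "lookup"}),
    ("pragmatics", failing_pragmatics),
    ("routing", lambda r: {**r, "route": "default"}),
]
result = run_pipeline("why did the migration fail?", stages)
# Execution stops after "intent"; no routing directive is attached.
```

Downstream consumers then inspect `result["stage"]` to know which pipeline depth was actually reached.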
7.2 Intent Classification: Taxonomic, Hierarchical, and Open-Domain Intent Models#
7.2.1 Intent as a Structured Object#
Intent classification transforms the surface query into a structured intent representation. We define an intent as a typed record:
I = (t, c, k, π, δ)
where:
- t is the intent type drawn from a taxonomy 𝒯.
- c ∈ [0, 1] is the classification confidence.
- k is the intent class (the fine-grained label within the taxonomy).
- π is a vector of extracted parameters (entities, constraints, temporal references, scope qualifiers).
- δ is the derivation mode indicating how the intent was determined (e.g., explicit vs. inferred).
7.2.2 Taxonomic Intent Models#
A flat taxonomy defines 𝒯 as a finite set of mutually exclusive intent labels. This is sufficient for narrow-domain systems but fails when intents are compositional.
Classification via softmax over the taxonomy:
P(τ_i | q) = softmax(W · E(q))_i
where E is an encoder producing a d-dimensional query embedding E(q) ∈ ℝ^d and W ∈ ℝ^{|𝒯|×d} is a learned projection.
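The softmax classification above reduces to a matrix product and a normalization. A minimal sketch, where the three-label taxonomy, the random projection `W`, and the query vector stand in for a trained encoder and head:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_flat(v_q, W, taxonomy):
    """Flat taxonomic classification: softmax over the logits W @ v_q,
    returning the top label and its probability."""
    p = softmax(W @ v_q)
    i = int(np.argmax(p))
    return taxonomy[i], float(p[i])

taxonomy = ["lookup", "compare", "recommend"]
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # |T| x d projection (illustrative weights)
v_q = rng.normal(size=4)         # d-dimensional query embedding
label, conf = classify_flat(v_q, W, taxonomy)
```

The returned probability is exactly the confidence c of the intent record, and is what temperature scaling (§7.2.7) later calibrates.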
Limitation: Flat taxonomies cannot represent composite intents such as "compare X and Y, then recommend based on our deployment constraints," which simultaneously invokes comparison, recommendation, and constraint-satisfaction intents.
7.2.3 Hierarchical Intent Models#
A hierarchical taxonomy organizes intents as a tree or DAG where parent nodes represent coarse intent categories and leaf nodes represent fine-grained intents.
Hierarchical classification proceeds top-down:
P(τ_L | q) = ∏_{l=1}^{L} P(τ_l | τ_{l−1}, q)
where τ_0 is the root and τ_L is the selected leaf intent.
Advantages:
- Enables coarse-to-fine resolution under latency constraints (early exit at intermediate levels).
- Permits structural sharing of parameters across related intents.
- Supports graceful degradation: if fine-grained classification is uncertain, the system operates at the parent level.
7.2.4 Open-Domain Intent Models#
In agentic systems interacting with arbitrary domains, a closed taxonomy is insufficient. Open-domain intent models must handle previously unseen intent types.
Approach 1: Intent as Natural-Language Description. Rather than classifying into a fixed set, the system generates a structured natural-language intent description:
I_open = GenerateOpenIntent(q, H)
matching the open-domain fallback in Algorithm 7.1.
Approach 2: Intent Embedding in Continuous Space. Map intents to a continuous embedding space where similarity corresponds to functional equivalence:
f : I → ℝ^k, with ‖f(I_a) − f(I_b)‖ small when I_a and I_b are functionally equivalent.
New intents are recognized by their distance from known intent cluster centroids. If min_j ‖f(I) − μ_j‖ > θ_novel, the intent is flagged as novel, triggering clarification or dynamic taxonomy extension.
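The centroid-distance novelty test is straightforward to implement. A sketch with toy two-dimensional centroids and an illustrative threshold:

```python
import numpy as np

def detect_novel_intent(v_intent, centroids, theta_novel):
    """Flag an intent embedding as novel when its distance to every known
    cluster centroid exceeds theta_novel. Returns (is_novel, nearest
    centroid index, distance to it)."""
    dists = np.linalg.norm(centroids - v_intent, axis=1)
    j = int(np.argmin(dists))
    return bool(dists[j] > theta_novel), j, float(dists[j])

# Two known intent clusters (illustrative centroids in a 2-d space).
centroids = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
novel, nearest, d = detect_novel_intent(np.array([0.9, 0.1]),
                                        centroids, theta_novel=0.5)
# Close to the first centroid: recognized as a known intent, not novel.
```

A point far from both centroids (e.g., `[3.0, 3.0]`) would trip the threshold and trigger clarification or taxonomy extension.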
7.2.5 Multi-Intent Detection#
Agentic queries frequently encode multiple simultaneous intents. We model multi-intent detection as a multi-label classification problem:
ŷ = σ(W · E(q)), ŷ ∈ [0, 1]^{|𝒯|}
where σ is the element-wise sigmoid function. An intent τ_i is active if ŷ_i > θ_i, where θ_i is a per-intent calibrated threshold.
Multi-intent queries require decomposition (§7.6) to ensure each intent is served by an appropriate retrieval pathway.
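The sigmoid-and-threshold step can be sketched directly; the toy taxonomy, logits, and uniform thresholds below are illustrative:

```python
import numpy as np

def detect_intents(logits, thresholds, taxonomy):
    """Multi-label intent detection: element-wise sigmoid over the logits,
    then per-intent calibrated thresholds. Returns the active intents with
    their probabilities."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [(name, float(p))
            for name, p, th in zip(taxonomy, probs, thresholds)
            if p > th]

taxonomy = ["compare", "recommend", "constrain"]
active = detect_intents(logits=[2.0, 0.1, -3.0],
                        thresholds=[0.5, 0.5, 0.5],
                        taxonomy=taxonomy)
# "compare" (σ(2.0) ≈ 0.88) and "recommend" (σ(0.1) ≈ 0.52) fire;
# "constrain" (σ(−3.0) ≈ 0.05) does not.
```

Unlike the softmax case, the probabilities need not sum to one, which is exactly what lets several intents fire at once.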
7.2.6 Pseudo-Algorithm: Hierarchical Multi-Intent Classification#
ALGORITHM 7.1: HierarchicalMultiIntentClassify(q, H, T_H)
───────────────────────────────────────────────────────────
Input:
q — raw query string
H — conversational history
T_H — hierarchical intent taxonomy (V, E, depth L)
Output:
I_set — set of (intent, confidence, derivation_mode) triples
1. q_ctx ← ContextualizeQuery(q, H) // resolve anaphora, ellipsis
2. v_q ← Encode(q_ctx) // dense embedding ∈ ℝ^d
3. I_set ← ∅
4. frontier ← {root(T_H)}
5. WHILE frontier ≠ ∅ DO
6. node ← Pop(frontier)
7. children ← Children(node, T_H)
8. IF children = ∅ THEN // leaf node
9. score ← IntentScore(v_q, node)
10. IF score > θ_leaf THEN
11. derivation ← DetermineDerivation(q, node)
12. I_set ← I_set ∪ {(node.intent, score, derivation)}
13. END IF
14. ELSE
15. FOR EACH child IN children DO
16. score ← IntentScore(v_q, child)
17. IF score > θ_branch THEN
18. frontier ← frontier ∪ {child}
19. END IF
20. END FOR
21. END IF
22. END WHILE
23. IF |I_set| = 0 THEN // open-domain fallback
24. I_open ← GenerateOpenIntent(q_ctx, H)
25. I_set ← {(I_open, confidence(I_open), "inferred")}
26. END IF
27. I_set ← DeduplicateAndMerge(I_set)
28. RETURN I_set
7.2.7 Intent Confidence Calibration#
Raw model scores are typically poorly calibrated. We apply temperature scaling post-hoc:
P̂(τ_i | q) = softmax(z / T)_i
where z is the logit vector and T > 0 is a scalar temperature learned on a held-out calibration set by minimizing negative log-likelihood. A well-calibrated intent classifier satisfies:
P(prediction correct | confidence = c) = c for all c ∈ [0, 1]
This property is critical for downstream routing: if the system cannot reliably distinguish between intents, it must request clarification rather than committing to a low-confidence route.
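Temperature scaling fits a single scalar, so even a coarse grid search suffices. A sketch under illustrative assumptions: the four-example calibration set is synthetic, with peaked logits and one example the model gets wrong, so the fitted temperature comes out above 1 (softening the scores):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def calibrate_temperature(logits, labels, grid=None):
    """Post-hoc temperature scaling: choose the scalar T minimizing
    negative log-likelihood on a held-out calibration set. Grid search
    stands in for gradient-based fitting."""
    grid = grid if grid is not None else np.arange(0.5, 5.01, 0.1)

    def nll(T):
        return -sum(float(np.log(softmax(z / T)[y] + 1e-12))
                    for z, y in zip(logits, labels))

    return float(min(grid, key=nll))

# Overconfident toy classifier: identical peaked logits on four calibration
# examples, but the model's argmax is wrong on the last one.
logits = [np.array([4.0, 0.0, 0.0])] * 4
labels = [0, 0, 0, 1]
T = calibrate_temperature(logits, labels)
```

Because the scores are too peaked relative to accuracy, the fitted T exceeds 1; dividing the logits by T flattens the distribution toward the classifier's true hit rate.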
7.3 Psycholinguistic Analysis: Pragmatic Inference, Gricean Maxims, Presupposition Resolution#
7.3.1 Beyond Semantics: The Pragmatic Layer#
Semantic analysis extracts what the words mean. Pragmatic analysis extracts what the speaker means by using those words in this context. This distinction is not philosophical decoration—it is the difference between retrieving a definition of "scalability" and understanding that the user is asking how to resolve a performance bottleneck they are currently experiencing.
We formalize the pragmatic frame as:
F_p = (a, P, G, E)
where:
- a is the identified speech act (request, assertion, question, directive, commissive).
- P is the set of presuppositions the query takes for granted.
- G is the set of conversational implicatures derivable via pragmatic reasoning.
- E is the inferred response expectation (format, depth, evidence requirements).
7.3.2 Gricean Maxim Analysis#
H.P. Grice's cooperative principle posits that speakers implicitly follow four maxims. When a query appears to violate a maxim, the system should infer an implicature:
| Maxim | Definition | Violation Signal | Implicature Example |
|---|---|---|---|
| Quantity | Be as informative as required, no more | Over-specification or under-specification | Under-specified query → user assumes shared context |
| Quality | Do not say what you believe to be false | Hedging, uncertainty markers | "I think the API changed?" → request for confirmation |
| Relation | Be relevant | Apparent topic shift | Topic shift → implicit connection the system must discover |
| Manner | Be clear, orderly | Vague, ambiguous phrasing | Deliberate vagueness → user is exploring, not retrieving |
Formal implicature extraction:
Given a query q and a maxim violation detector V_m for each maxim m:
G = {DeriveImplicature(q, m, H) : m ∈ {Quantity, Quality, Relation, Manner}, V_m(q, H) = true}
7.3.3 Presupposition Resolution#
Presuppositions are propositions that the query takes as given. Unresolved presuppositions cause retrieval against false premises.
Example: "Why did the migration fail?" presupposes (a) a migration occurred, (b) it failed.
Presupposition extraction and validation:
P_raw = ExtractPresuppositions(q, H) = {p_1, …, p_n}
Each p_i must be verified against available evidence:
status(p_i) = Verify(p_i, RetrieveEvidence(p_i, M)) ∈ {supported, contradicted, unverifiable}
Contradicted presuppositions must trigger either:
- A corrective clarification to the user ("The migration actually succeeded; are you asking about the subsequent deployment failure?").
- A presupposition repair in the query plan (replace failed presupposition with corrected premise before retrieval).
7.3.4 Speech Act Classification#
We classify speech acts using an adaptation of Searle's taxonomy:
a ∈ {assertive, directive, commissive, expressive, declarative}
The speech act determines the functional shape of the expected response:
| Speech Act | System Response Shape |
|---|---|
| Assertive question | Evidence-backed factual answer |
| Directive | Action execution + confirmation |
| Commissive | Commitment tracking + follow-up scheduling |
| Expressive | Acknowledgment + contextual assistance |
| Declarative | State change + verification |
7.3.5 Pseudo-Algorithm: Pragmatic Frame Construction#
ALGORITHM 7.2: ConstructPragmaticFrame(q, I_set, H, M)
───────────────────────────────────────────────────────
Input:
q — raw query string
I_set — classified intents from Algorithm 7.1
H — conversational history
M — memory state (session + episodic)
Output:
F_p — pragmatic frame (speech_act, presuppositions, implicatures, expectations)
1. // Speech act classification
2. a_speech ← ClassifySpeechAct(q, H)
3. // Presupposition extraction and validation
4. P_raw ← ExtractPresuppositions(q, H)
5. P_validated ← ∅
6. FOR EACH p IN P_raw DO
7. evidence ← RetrieveEvidence(p, M, timeout=50ms)
8. status ← Verify(p, evidence)
9. P_validated ← P_validated ∪ {(p, status, evidence.provenance)}
10. END FOR
11. // Gricean maxim violation detection
12. G_imp ← ∅
13. FOR EACH maxim IN {Quantity, Quality, Relation, Manner} DO
14. IF DetectViolation(q, maxim, H) THEN
15. g ← DeriveImplicature(q, maxim, H, I_set)
16. G_imp ← G_imp ∪ {(g, maxim, confidence(g))}
17. END IF
18. END FOR
19. // Response expectation inference
20. E_exp ← InferExpectation(a_speech, I_set, H, UserProfile(M))
21. // E_exp includes: format, depth, evidence_required, action_required
22. // Check for contradicted presuppositions
23. contradictions ← {(p, s, e) ∈ P_validated : s = "contradicted"}
24. IF |contradictions| > 0 THEN
25. F_p.requires_clarification ← TRUE
26. F_p.contradiction_details ← contradictions
27. END IF
28. F_p ← (a_speech, P_validated, G_imp, E_exp)
29. RETURN F_p
7.3.6 Pragmatic Inference Under Uncertainty#
When pragmatic inference is uncertain (e.g., the system cannot determine whether "Can you show me the logs?" is a capability question or a directive), we maintain a pragmatic distribution:
P(F_p^{(i)} | q, H), i = 1, …, n
If the entropy of this distribution exceeds a threshold:
H = −Σ_i P(F_p^{(i)} | q, H) log P(F_p^{(i)} | q, H) > θ_H
the system must either (a) hedge its response to cover the top-k interpretations or (b) trigger active clarification (§7.9).
7.4 Cognitive Load Modeling: Estimating Task Complexity, Ambiguity, and Required Reasoning Depth#
7.4.1 Motivation#
Not all queries require the same computational investment. A simple factual lookup should not trigger multi-step decomposition and parallel retrieval fan-out. Conversely, a complex analytical query should not be answered with a single retrieval pass. Cognitive load modeling estimates the computational and reasoning resources required to adequately serve a query, enabling adaptive pipeline configuration.
7.4.2 Complexity Dimensions#
We model cognitive load as a multi-dimensional score:
κ = (κ_lex, κ_sem, κ_struct, κ_reason, κ_scope)
| Dimension | Measures | Range |
|---|---|---|
| κ_lex | Lexical complexity (vocabulary rarity, technical density) | [0, 1] |
| κ_sem | Semantic ambiguity (polysemy, vagueness) | [0, 1] |
| κ_struct | Structural complexity (syntactic depth, embedded clauses) | [0, 1] |
| κ_reason | Reasoning depth (number of inference steps required) | [0, 1] |
| κ_scope | Information scope (number of distinct knowledge domains) | [0, 1] |
7.4.3 Aggregate Complexity Score#
The aggregate cognitive load is computed as a weighted combination:
κ_agg = w_lex·κ_lex + w_sem·κ_sem + w_struct·κ_struct + w_reason·κ_reason + w_scope·κ_scope
subject to Σ_i w_i = 1 and w_i ≥ 0.
The weights w_i are calibrated empirically against human difficulty ratings on a held-out query set.
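The weighted aggregation and the threshold-based mode selection of the next subsection fit in a few lines. The weights, scores, and thresholds below are illustrative, not calibrated values:

```python
def aggregate_load(kappa, weights):
    """Aggregate cognitive load: κ_agg = Σ w_i · κ_i with Σ w_i = 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[d] * kappa[d] for d in kappa)

def select_mode(k_agg, thresholds=(0.25, 0.5, 0.75)):
    """Map the aggregate score to a reasoning mode by counting how many
    thresholds it reaches (equivalent to the IF/ELIF chain of Alg. 7.3)."""
    modes = ["direct-retrieval", "single-step-reasoning",
             "multi-step-decomposition", "deliberative-analysis"]
    return modes[sum(k_agg >= t for t in thresholds)]

kappa = {"lex": 0.2, "sem": 0.6, "struct": 0.3, "reason": 0.8, "scope": 0.4}
weights = {"lex": 0.1, "sem": 0.25, "struct": 0.15, "reason": 0.3, "scope": 0.2}
k_agg = aggregate_load(kappa, weights)   # 0.535 for these toy values
mode = select_mode(k_agg)                # falls in the third band
```

A query scoring high on reasoning depth but low elsewhere still lands in a decomposition mode, which is exactly the adaptive behavior §7.4.1 motivates.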
7.4.4 Reasoning Mode Selection#
Based on the cognitive load profile, the system selects an appropriate reasoning mode ρ:
ρ ∈ {direct-retrieval, single-step-reasoning, multi-step-decomposition, deliberative-analysis}
Each mode configures the downstream pipeline differently:
| Mode | Decomposition | Retrieval | Verification | Token Budget |
|---|---|---|---|---|
| Direct-retrieval | None | Single-pass | Confidence check | Minimal |
| Single-step-reasoning | None | Multi-source | Evidence match | Moderate |
| Multi-step-decomposition | DAG (§7.6) | Per-sub-query | Per-sub-result | Standard |
| Deliberative-analysis | Full DAG + critique | Iterative | Multi-round verification | Extended |
7.4.5 Ambiguity Quantification#
Semantic ambiguity is estimated via the entropy of the embedding neighborhood:
κ_sem = −(1 / log k) Σ_{i=1}^{k} p_i log p_i, with p_i = s_i / Σ_j s_j
where s_i are the cosine similarities between q's embedding and its k-nearest neighbors in the retrieval index, and p_i are the normalized similarities. High entropy indicates the query is equidistant from many semantically distinct documents: a signal of ambiguity.
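The normalized neighborhood entropy is a one-liner over the similarity vector. A sketch with synthetic similarity profiles standing in for real k-NN results:

```python
import numpy as np

def ambiguity_entropy(sims):
    """κ_sem: entropy of the normalized k-NN similarities, divided by
    log(k) so the score lies in [0, 1]. A uniform profile (query
    equidistant from all neighbors) yields the maximum, ≈ 1."""
    s = np.asarray(sims, dtype=float)
    p = s / s.sum()
    return float(-(p * np.log(p + 1e-12)).sum() / np.log(len(s)))

# Ambiguous query: equally similar to 20 distinct documents.
k_sem_uniform = ambiguity_entropy([0.8] * 20)
# Focused query: one dominant neighbor, the rest marginal.
k_sem_peaked = ambiguity_entropy([0.95] + [0.05] * 19)
```

Comparing the two profiles shows the intended behavior: the uniform neighborhood scores near 1 and would push the pipeline toward multi-HyDE (§7.5.1.3) or clarification, while the peaked one scores substantially lower.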
7.4.6 Reasoning Depth Estimation#
We estimate κ_reason by analyzing the logical structure of the query:
κ_reason = (α·n_E + β·n_R + γ·n_C + δ·n_X) / D_max
where n_E, n_R, n_C, n_X are counts of distinct entities, required relational inferences, conditional branches, and comparative evaluations extracted from the query, and α, β, γ, δ are empirically calibrated coefficients.
7.4.7 Pseudo-Algorithm: Cognitive Load Estimation#
ALGORITHM 7.3: EstimateCognitiveLoad(q, F_p, I_set, index)
───────────────────────────────────────────────────────────
Input:
q — contextualized query
F_p — pragmatic frame
I_set — classified intents
index — retrieval index for ambiguity estimation
Output:
κ — (κ_lex, κ_sem, κ_struct, κ_reason, κ_scope)
κ_agg — aggregate score
ρ — selected reasoning mode
1. // Lexical complexity
2. tokens ← Tokenize(q)
3. κ_lex ← Mean({Rarity(t) : t ∈ tokens}) + TechnicalDensity(tokens)
4. κ_lex ← Clamp(κ_lex, 0, 1)
5. // Semantic ambiguity via embedding neighborhood entropy
6. v_q ← Encode(q)
7. neighbors ← KNN(index, v_q, k=20)
8. sims ← CosineSimilarities(v_q, neighbors)
9. sims_norm ← Normalize(sims)
10. κ_sem ← -Sum(sims_norm * Log(sims_norm)) / Log(k)
11. // Structural complexity via parse depth
12. tree ← SyntacticParse(q)
13. κ_struct ← Depth(tree) / D_max_struct + EmbeddedClauseCount(tree) / C_max
14. // Reasoning depth
15. entities ← ExtractEntities(q, F_p)
16. relations ← ExtractRelations(q, F_p)
17. conditionals ← CountConditionals(q)
18. comparisons ← CountComparisons(q)
19. κ_reason ← (|entities|·α + |relations|·β + conditionals·γ + comparisons·δ) / D_max
20. // Information scope
21. domains ← IdentifyDomains(q, I_set)
22. κ_scope ← |domains| / max_domains
23. // Aggregate
24. κ_agg ← w_lex·κ_lex + w_sem·κ_sem + w_struct·κ_struct + w_reason·κ_reason + w_scope·κ_scope
25. // Mode selection
26. IF κ_agg < θ₁ THEN ρ ← "direct-retrieval"
27. ELIF κ_agg < θ₂ THEN ρ ← "single-step-reasoning"
28. ELIF κ_agg < θ₃ THEN ρ ← "multi-step-decomposition"
29. ELSE ρ ← "deliberative-analysis"
30. RETURN (κ_lex, κ_sem, κ_struct, κ_reason, κ_scope), κ_agg, ρ
7.5 Query Rewriting and Expansion#
Query rewriting transforms the raw query into one or more semantically enriched variants that improve retrieval recall and precision. This is not cosmetic reformulation—it is a critical recall amplification mechanism that bridges the vocabulary and conceptual gap between user language and corpus language.
7.5.1 Hypothetical Document Embedding (HyDE) Generation#
7.5.1.1 Conceptual Foundation#
HyDE inverts the retrieval problem: instead of matching the query against documents, the system generates a hypothetical document that would ideally answer the query, then retrieves real documents similar to this hypothetical answer.
Formal definition:
d_hyp = LLM(PromptHyDE(q)), v_hyde = Encode(d_hyp), D_retrieved = kNN(index, v_hyde, k)
7.5.1.2 Why HyDE Works#
The embedding of a well-formed answer paragraph occupies a different (often more precise) region of embedding space than a short question. HyDE exploits this observation:
sim(Encode(d_hyp), Encode(d_rel)) > sim(Encode(q), Encode(d_rel)) on average over relevant documents d_rel
under the assumption that the hypothetical answer shares more lexical and structural features with real answers than the original question does.
7.5.1.3 Multi-HyDE for Ambiguous Queries#
For queries with high semantic ambiguity (κ_sem > θ_ambig), generate multiple hypothetical documents to cover distinct interpretations:
{d_hyp^(1), …, d_hyp^(m)} = LLM(PromptHyDE(q); temperature = T_diverse)
where T_diverse is an elevated temperature parameter encouraging diversity. The final retrieval set is the union of results from all hypothetical embeddings:
D_retrieved = ⋃_{j=1}^{m} kNN(index, Encode(d_hyp^(j)), k)
7.5.1.4 HyDE Quality Control#
A hypothetical document may hallucinate facts. This is acceptable because HyDE uses the document only as an embedding probe, not as an answer. However, a wildly off-topic hypothesis degrades retrieval. We apply a relevance gate:
accept d_hyp iff CosineSim(Encode(q), Encode(d_hyp)) > θ_HyDE_min
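The gate plus the raw-query fallback of Algorithm 7.4 can be sketched as follows; the 2-d vectors stand in for a real encoder's embeddings, and `theta_min` is an illustrative threshold:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hyde_probes(q_vec, hypothesis_vecs, theta_min=0.3):
    """Relevance gate for HyDE: keep a hypothetical-document embedding only
    if it stays close enough to the query embedding; if every hypothesis is
    rejected, fall back to the raw query embedding (provenance-tagged)."""
    kept = [(v, "HyDE") for v in hypothesis_vecs
            if cosine(q_vec, v) > theta_min]
    return kept if kept else [(q_vec, "fallback_raw")]

q_vec = np.array([1.0, 0.0])
probes = hyde_probes(q_vec, [np.array([0.9, 0.4]),    # on-topic: kept
                             np.array([-1.0, 0.1])])  # off-topic: rejected
```

Because each probe carries a provenance tag, downstream stages can weight or audit HyDE-derived results separately from raw-query results.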
7.5.1.5 Pseudo-Algorithm: HyDE Generation#
ALGORITHM 7.4: HyDEGenerate(q, F_p, κ_sem)
────────────────────────────────────────────
Input:
q — contextualized query
F_p — pragmatic frame
κ_sem — semantic ambiguity score
Output:
V_hyde — set of hypothetical document embeddings with provenance
1. num_hypotheses ← IF κ_sem > θ_ambig THEN m_max ELSE 1
2. temperature ← IF num_hypotheses > 1 THEN T_diverse ELSE T_standard
3. V_hyde ← ∅
4. FOR j = 1 TO num_hypotheses DO
5. prompt ← CompileHyDEPrompt(q, F_p, j)
6. // "Write a detailed passage that answers the following question: {q}"
7. d_hyp_j ← LLM.Generate(prompt, temperature=temperature, max_tokens=256)
8. v_hyp_j ← Encode(d_hyp_j)
9.
10. // Relevance gate
11. IF CosineSim(Encode(q), v_hyp_j) > θ_HyDE_min THEN
12. V_hyde ← V_hyde ∪ {(v_hyp_j, provenance="HyDE", source_query=q)}
13. ELSE
14. Log("HyDE hypothesis rejected: low relevance", j)
15. END IF
16. END FOR
17. IF |V_hyde| = 0 THEN
18. V_hyde ← {(Encode(q), provenance="fallback_raw", source_query=q)}
19. END IF
20. RETURN V_hyde
7.5.2 Synonym Expansion, Ontological Enrichment, and Domain Terminology Mapping#
7.5.2.1 Synonym Expansion#
Synonym expansion augments the query with lexical variants to improve recall against documents using different terminology for the same concept.
Controlled expansion via ontology lookup:
Expand(q) = q ∪ {t′ : t ∈ Terms(q), (t, t′) ∈ O_dom}
where O_dom is a domain-scoped synonym ontology. The domain constraint prevents incorrect expansions (e.g., "Java" → "coffee" in a software engineering context).
7.5.2.2 Ontological Enrichment#
Beyond synonyms, ontological enrichment adds hypernyms (generalization), hyponyms (specialization), meronyms (part-of), and related concepts from a domain knowledge graph KG:
Enrich(q) = {(t′, r, w_r) : t ∈ Terms(q), (t, r, t′) ∈ KG}
where r is the relation type and w_r is a weight reflecting the utility of the relation for retrieval enrichment.
Enrichment budget: To prevent query explosion, limit the total enrichment to B_enrich additional terms:
|Enrich(q)| ≤ B_enrich
selecting the top-B_enrich enrichments by weight.
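Budgeted selection is a sort-and-truncate over weighted candidates. The terms, relation types, and weights below are illustrative:

```python
def budgeted_enrichment(candidates, budget):
    """Select the top-B enrichment terms by relation weight so expansion
    stays within the enrichment budget. Each candidate is a
    (term, relation, weight) triple."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return ranked[:budget]

# Hypothetical knowledge-graph neighbors of a query term, with utility weights.
candidates = [
    ("container",     "hypernym", 0.6),
    ("pod",           "hyponym",  0.9),
    ("scheduler",     "related",  0.4),
    ("control-plane", "meronym",  0.7),
]
selected = budgeted_enrichment(candidates, budget=2)
# Keeps the two highest-weight enrichments: "pod", then "control-plane".
```

The discarded low-weight terms are exactly the ones most likely to dilute retrieval scoring if expansion were unbounded.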
7.5.2.3 Domain Terminology Mapping#
In enterprise contexts, users and documents may use different terminologies for identical concepts. A terminology mapping layer maintains bidirectional mappings:
M_term : T_user ↔ T_corpus
mapping user vocabulary to corpus vocabulary and back.
These mappings are populated from:
- Organizational glossaries.
- Automatically mined term co-occurrence patterns.
- Human-curated correction memories (§7.11, Theory of Mind).
7.5.2.4 Weighted Query Formulation#
After expansion and enrichment, the rewritten query is a weighted bag of terms:
q′ = {(t, w_orig) : t ∈ Terms(q)} ∪ {(t′, w_syn) : t′ ∈ Expand(q)} ∪ {(t″, w_onto) : t″ ∈ Enrich(q)}
where original terms receive weight w_orig, synonyms receive w_syn, and ontological enrichments receive w_onto, with w_orig > w_syn > w_onto calibrated to prevent enrichment terms from dominating retrieval scoring.
7.5.3 Ellipsis Resolution and Anaphora Tracking in Multi-Turn Queries#
7.5.3.1 The Multi-Turn Problem#
In multi-turn interactions, queries are frequently incomplete:
- Anaphora: "What about its latency?" — "its" refers to a system mentioned three turns ago.
- Ellipsis: "And for production?" — omits the entire predicate; the full query is "What is the deployment configuration for production?"
- Deictic reference: "Show me that error" — "that" refers to an error displayed in the UI or mentioned in a prior response.
7.5.3.2 Formal Resolution#
We define resolution as a function that produces a self-contained query from a context-dependent one:
Resolve : (q_raw, H, M_session) → q_resolved
Resolution via co-reference chain construction:
Let C = {c_1, …, c_m} be the set of co-reference chains extracted from H. For each unresolved reference r in q_raw:
antecedent(r) = argmax_{a ∈ A(r)} [λ_rec·recency(a) + λ_sal·salience(a) + λ_sem·SemanticFit(a, q_raw, r)]
where A(r) is the set of candidate antecedents for reference r, and the scoring function balances recency (more recent mentions preferred), salience (topically central entities preferred), and semantic fit (the antecedent must be semantically compatible with the query's predicate structure).
7.5.3.3 Pseudo-Algorithm: Multi-Turn Query Resolution#
ALGORITHM 7.5: ResolveMultiTurnQuery(q_raw, H, M_session)
──────────────────────────────────────────────────────────
Input:
q_raw — current turn query (potentially incomplete)
H — conversational history [(q₁,r₁,t₁), ..., (qₙ,rₙ,tₙ)]
M_session — session memory (entities, topics, focal objects)
Output:
q_resolved — fully self-contained query string
references — list of resolved (reference, antecedent, confidence) triples
1. // Detect unresolved references
2. refs ← DetectAnaphoraAndEllipsis(q_raw)
3. IF refs = ∅ AND NOT IsElliptical(q_raw) THEN
4. RETURN q_raw, []
5. END IF
6. // Build entity salience model from history
7. entity_scores ← {}
8. FOR i = |H| DOWNTO max(1, |H| - window_size) DO
9. entities_i ← ExtractEntities(H[i].query ⊕ H[i].response)
10. FOR EACH e IN entities_i DO
11. recency ← DecayFunction(|H| - i)
12. salience ← MentionCount(e, H) * TopicalCentrality(e, H)
13. entity_scores[e] ← λ_rec · recency + λ_sal · salience
14. END FOR
15. END FOR
16. // Resolve each reference
17. references ← []
18. q_resolved ← q_raw
19. FOR EACH ref IN refs DO
20. candidates ← FilterByType(entity_scores, ref.expected_type)
21. FOR EACH (e, score) IN candidates DO
22. score ← score + λ_sem · SemanticFit(e, q_raw, ref.position)
23. END FOR
24. best ← ArgMax(candidates, by=score)
25. confidence ← Softmax(candidates.scores)[best]
26. IF confidence > θ_resolve THEN
27. q_resolved ← Substitute(q_resolved, ref, best.entity)
28. references ← references ∪ {(ref, best.entity, confidence)}
29. ELSE
30. // Cannot resolve confidently; mark for clarification
31. q_resolved ← MarkAmbiguous(q_resolved, ref)
32. references ← references ∪ {(ref, NULL, confidence)}
33. END IF
34. END FOR
35. // Handle ellipsis: reconstruct omitted predicate
36. IF IsElliptical(q_raw) THEN
37. template ← FindMostRecentPredicateTemplate(H)
38. q_resolved ← MergeEllipticalQuery(q_resolved, template)
39. END IF
40. RETURN q_resolved, references
7.6 Query Decomposition Strategies#
7.6.1 Decomposition as Structural Transformation#
Complex queries encode multiple information needs with varying dependency structures. Decomposition transforms a monolithic query into a sub-query graph G_sq = (V, E) where:
- Each vertex v_i ∈ V is a self-contained sub-query with its own intent, scope, and routing target.
- Each edge (v_i, v_j) ∈ E represents a data dependency (the result of v_i is required as input to v_j).
The topology of G_sq determines the decomposition strategy.
7.6.2 Parallel-Decomposition: Independent Sub-Queries for Fan-Out Retrieval#
7.6.2.1 Definition#
Parallel decomposition applies when the original query comprises multiple independent information needs:
q ≡ {q_1, q_2, …, q_n} with E = ∅ (no data dependencies between sub-queries)
Example: "What are the current CPU metrics for the staging cluster, and what does the latest RFC say about the new auth protocol?"
This decomposes into two independent sub-queries:
- Retrieve CPU metrics → routed to metrics API.
- Retrieve RFC content → routed to document store.
7.6.2.2 Execution Model#
All sub-queries execute concurrently with independent deadlines:
T_parallel = max_i T(q_i)
This is optimal when sub-queries target different sources with independent latency profiles.
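Fan-out with per-sub-query deadlines maps naturally onto a thread pool. A sketch in which the handlers and sleep times are toy stand-ins for real source calls:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutTimeout
import time

def fan_out(sub_queries, deadline_s):
    """Parallel decomposition: execute independent sub-queries concurrently.
    Total latency tracks the slowest sub-query (max_i T(q_i)); a sub-query
    that misses its deadline yields None instead of blocking the others."""
    def run(sq):
        time.sleep(sq["latency"])              # simulated source latency
        return sq["name"], f"result:{sq['name']}"

    with ThreadPoolExecutor(max_workers=len(sub_queries)) as ex:
        futures = {ex.submit(run, sq): sq for sq in sub_queries}
        out = {}
        for fut, sq in futures.items():
            try:
                name, res = fut.result(timeout=deadline_s)
                out[name] = res
            except FutTimeout:
                out[sq["name"]] = None         # deadline miss, partial result
        return out

subs = [{"name": "metrics", "latency": 0.01},   # e.g., metrics API
        {"name": "rfc",     "latency": 0.02}]   # e.g., document store
results = fan_out(subs, deadline_s=1.0)
```

Both sub-queries complete well inside the deadline here; shrinking `deadline_s` below a sub-query's latency demonstrates the partial-result path.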
7.6.2.3 Independence Verification#
Before committing to parallel execution, verify independence:
Entities(q_i) ∩ Entities(q_j) = ∅ and Predicates(q_i) ∩ Predicates(q_j) = ∅ for all i ≠ j
If entities or predicates overlap, the sub-queries may have hidden dependencies requiring sequential or conditional treatment.
7.6.3 Sequential-Decomposition: Dependency-Ordered Sub-Query Chains#
7.6.3.1 Definition#
Sequential decomposition applies when sub-queries form a chain of dependencies:
q_1 → q_2 → … → q_n, where q_{i+1} requires result(q_i)
Example: "Find the service that had the highest error rate last week, then retrieve its deployment history, and identify which config change caused the regression."
This decomposes into:
- q₁: Identify the highest-error-rate service → metrics query.
- q₂: Retrieve deployment history for result(q₁) → ops database.
- q₃: Identify the causal config change from result(q₂) → analytical reasoning.
7.6.3.2 Execution Model#
Each sub-query receives the result of its predecessor as context:
result_i = Execute(q_i, context = result_{i−1}), T_sequential = Σ_{i=1}^{n} T(q_i)
7.6.3.3 Failure Propagation#
In sequential chains, failure at step i prevents all downstream steps. The system must implement:
- Partial result reporting: Return results from q_1, …, q_{i−1} with an explicit failure annotation at q_i.
- Retry with relaxed constraints: Attempt q_i with broader retrieval parameters.
- User escalation: If q_i fails after retry, request human guidance before proceeding.
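The partial-result path can be sketched as a chain runner that preserves completed work instead of discarding it on failure. The three step functions mirror the example chain above and are, of course, toy stand-ins:

```python
def run_chain(steps, ctx=None):
    """Sequential decomposition: each step consumes its predecessor's
    result. On failure, return the partial results plus an explicit failure
    annotation rather than discarding completed work."""
    results, prev = [], ctx
    for name, fn in steps:
        try:
            prev = fn(prev)
            results.append((name, prev))
        except Exception as exc:
            return {"partial": results, "failed_at": name, "error": str(exc)}
    return {"partial": results, "failed_at": None, "error": None}

def find_service(_):
    return "checkout-svc"                       # q1: metrics query

def fetch_history(svc):
    return [f"{svc}@v41", f"{svc}@v42"]         # q2: ops database

def diagnose(history):
    raise RuntimeError("insufficient evidence") # q3: analysis fails

report = run_chain([("q1", find_service),
                    ("q2", fetch_history),
                    ("q3", diagnose)])
```

The annotated report is what enables the retry and user-escalation policies: the caller knows exactly which step failed and what evidence was already gathered.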
7.6.4 Conditional-Decomposition: Branch-on-Evidence Sub-Query Trees#
7.6.4.1 Definition#
Conditional decomposition applies when the next sub-query depends on the content of a prior result:
G_sq = (V, E, Π)
where Π assigns a Boolean predicate π_e to each edge. An edge e = (v_i, v_j) is traversed only if π_e(result(v_i)) = true.
Example: "Check if the service is using the legacy auth module. If so, find the migration guide. If not, verify it's using the new auth SDK and check for known issues."
This decomposes into a tree:
v₁: Check auth module type
├── [legacy=true] → v₂: Find migration guide
└── [legacy=false] → v₃: Verify new auth SDK
        └── v₄: Check known issues
7.6.4.2 Execution Model#
Conditional decomposition requires an evaluation step between sub-queries:
execute v_i → evaluate π_e(result(v_i)) → execute the selected child
The total latency is:
T_conditional = Σ_{v on executed path} [T(v) + T_eval]
where T_eval is the time to evaluate the branching predicate.
7.6.4.3 Speculative Execution#
To reduce latency, the system may speculatively execute both branches of a conditional node in parallel and discard the unused branch's results:
T_speculative = T(v_i) + max(T(v_left), T(v_right))
This trades compute cost for latency reduction. The decision to speculate is governed by:
Speculate(v_i) ⟺ c_extra ≤ B_cost_slack ∧ H(π_e) > θ_spec
The second condition ensures speculation is worthwhile only when the branch outcome is genuinely uncertain.
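The two-condition decision rule can be sketched with binary entropy as the uncertainty measure. The threshold value and cost figures are illustrative:

```python
import math

def should_speculate(p_branch, extra_cost, cost_slack, theta_h=0.9):
    """Speculatively execute both branches only when (a) the extra compute
    fits the remaining cost budget and (b) the branch outcome is genuinely
    uncertain, measured by the binary entropy of the branch probability
    (which already lies in [0, 1] when using log base 2)."""
    h = 0.0
    for p in (p_branch, 1.0 - p_branch):
        if p > 0:
            h -= p * math.log2(p)
    return extra_cost <= cost_slack and h > theta_h

# Branch outcome is nearly certain: entropy ≈ 0.19, so skip speculation.
near_certain = should_speculate(0.97, extra_cost=1, cost_slack=10)
# Branch outcome is a near coin flip: entropy ≈ 0.99, so speculate.
coin_flip = should_speculate(0.55, extra_cost=1, cost_slack=10)
```

When the predicate's outcome is predictable, executing only the likely branch wins on cost with little expected latency penalty; when it is a coin flip, speculation buys the full `T_eval + T(v_chosen)` savings.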
7.6.5 Unified Decomposition Framework#
ALGORITHM 7.6: DecomposeQuery(q, I_set, F_p, κ, ρ, S)
──────────────────────────────────────────────────────
Input:
q — resolved, enriched query
I_set — classified intents
F_p — pragmatic frame
κ — cognitive load vector
ρ — reasoning mode
S — system state (available tools, schemas)
Output:
G_sq — sub-query DAG (vertices, edges, predicates, routing hints)
1. IF ρ = "direct-retrieval" THEN
2. // No decomposition needed
3. G_sq ← SingleNode(q, I_set[0], routing=DefaultRoute(q, S))
4. RETURN G_sq
5. END IF
6. // Extract decomposition candidates
7. intents ← I_set
8. entities ← ExtractEntities(q)
9. relations ← ExtractRelations(q, entities)
10. conditions ← ExtractConditionals(q)
11. // Determine decomposition topology
12. IF |intents| > 1 AND AllIndependent(intents, entities) THEN
13. strategy ← "parallel"
14. ELIF |conditions| > 0 THEN
15. strategy ← "conditional"
16. ELIF HasChainedDependencies(relations) THEN
17. strategy ← "sequential"
18. ELSE
19. strategy ← "parallel" // default: independent sub-queries
20. END IF
21. // Generate sub-queries
22. SWITCH strategy:
23. CASE "parallel":
24. sub_queries ← GenerateParallelSubQueries(q, intents, entities)
25. G_sq ← DAG(nodes=sub_queries, edges=∅)
26.
27. CASE "sequential":
28. chain ← OrderByDependency(intents, relations)
29. sub_queries ← GenerateSequentialSubQueries(q, chain)
30. edges ← {(sq_i, sq_{i+1}) : i = 1..|sub_queries|-1}
31. G_sq ← DAG(nodes=sub_queries, edges=edges)
32.
33. CASE "conditional":
34. tree ← BuildConditionalTree(q, intents, conditions, entities)
35. sub_queries ← tree.nodes
36. edges ← tree.edges // includes predicate functions
37. G_sq ← DAG(nodes=sub_queries, edges=edges, predicates=tree.predicates)
38. // Annotate each sub-query with routing hints
39. FOR EACH sq IN G_sq.nodes DO
40. sq.routing ← InferRoutingHints(sq, S)
41. sq.deadline ← AssignDeadline(sq, ρ, strategy)
42. sq.verification ← SelectVerificationStrategy(sq, κ)
43. END FOR
44. // Validate: semantic preservation invariant
45. coverage ← ComputeCoverage(G_sq, q, I_set)
46. IF coverage < θ_coverage THEN
47. missing ← IdentifyMissingIntents(G_sq, I_set)
48. G_sq ← AddSubQueries(G_sq, missing)
49. END IF
50. RETURN G_sq
7.6.6 Decomposition Depth Bounding#
Unbounded decomposition leads to exponential sub-query proliferation. We enforce:
|V(G_sq)| ≤ N_max and depth(G_sq) ≤ D_depth
where N_max is the maximum number of sub-queries (typically 8–12 for production systems) and D_depth is the maximum sequential depth (typically 4–5). These bounds are informed by the token budget:
N_max · t̄_sq ≤ B_tok − B_synth
where t̄_sq is the expected token cost per sub-query and B_synth is the token budget reserved for final answer synthesis.
7.7 Schema-Aware Query Routing: Matching Sub-Queries to Source Type, Latency Tier, and Authority Level#
7.7.1 The Routing Problem#
After decomposition, each sub-query must be directed to the most appropriate data source. This is a multi-objective assignment problem that must optimize for:
- Schema compatibility: Can the source answer this type of question?
- Authority level: How trustworthy is this source for this domain?
- Latency tier: Can the source respond within the sub-query's deadline?
- Cost: What is the computational/monetary cost of querying this source?
- Freshness: Does the source have sufficiently recent data?
7.7.2 Source Registry#
The system maintains a typed source registry $R_S = \{(s, \sigma, \alpha, \lambda, c, \phi)\}$:
| Symbol | Meaning | Type |
|---|---|---|
| $s$ | Source identifier | String |
| $\sigma$ | Schema descriptor (query types supported, entity domains) | Structured |
| $\alpha$ | Authority score | $[0, 1]$ |
| $\lambda$ | Latency profile (P50, P90, P99) | Distribution |
| $c$ | Cost per query | $\mathbb{R}_{\ge 0}$ |
| $\phi$ | Freshness (data staleness bound) | Duration |
7.7.3 Routing Score Function#
For each sub-query $sq$ and candidate source $s \in R_S$, compute a routing score:
$$\text{Score}(sq, s) = w_\sigma\, \text{SchemaMatch}(\sigma, sq) + w_\alpha\, \alpha + w_\lambda\, \Pr[\lambda \le \text{deadline}(sq)] - w_c\, c - w_\phi\, \text{Staleness}(\phi, sq)$$
where the non-negative weights $w_\sigma, w_\alpha, w_\lambda, w_c, w_\phi$ encode the relative importance of schema compatibility, authority, latency compliance, cost, and freshness for the deployment.
7.7.4 Optimal Routing Assignment#
The routing problem is formulated as a constrained assignment over indicator variables $x_{sq,s} \in \{0, 1\}$:
$$\max_{x} \sum_{sq \in V(G_{sq})} \sum_{s \in R_S} \text{Score}(sq, s)\, x_{sq,s}$$
subject to:
$$\sum_{s} x_{sq,s} \ge 1 \;\; \forall sq, \qquad \sum_{sq} \sum_{s} c_s\, x_{sq,s} \le B_{cost}, \qquad x_{sq,s} = 0 \text{ whenever } \Pr[\lambda_s \le \text{deadline}(sq)] < \theta_\lambda$$
For small numbers of sub-queries and sources, as is typical in practice, this is solvable by greedy assignment or linear relaxation within microsecond latency budgets.
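The greedy variant can be sketched compactly. The following Python is a hedged illustration of the selection loop in Algorithm 7.7; the dict-based source representation, the `redundancy` key, and the `score_fn` callback are assumptions of this sketch, not a prescribed interface:

```python
def greedy_route(subqueries, sources, score_fn, cost_budget):
    """Assign each sub-query its best affordable sources, greedily by score.

    `sources` is a list of dicts with 'name' and 'cost' keys; `score_fn(sq, src)`
    returns a routing score (higher is better) and a non-positive value when
    the schema does not match.
    """
    routing, remaining = {}, cost_budget
    for sq in subqueries:
        candidates = sorted(
            (s for s in sources if score_fn(sq, s) > 0),
            key=lambda s: score_fn(sq, s), reverse=True)
        chosen = []
        for s in candidates[: sq.get("redundancy", 1)]:
            if s["cost"] <= remaining:  # skip sources the budget cannot cover
                chosen.append(s["name"])
                remaining -= s["cost"]
        routing[sq["id"]] = chosen or ["FALLBACK"]  # degraded execution path
    return routing, remaining

sources = [{"name": "kb", "cost": 1.0}, {"name": "web", "cost": 3.0}]
score = lambda sq, s: {"kb": 0.9, "web": 0.5}[s["name"]]
routing, rem = greedy_route(
    [{"id": "sq1"}, {"id": "sq2", "redundancy": 2}], sources, score, 4.0)
```

Note that, as in the pseudocode, a sub-query with no affordable source falls through to an explicit fallback rather than failing silently.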
7.7.5 Multi-Source Redundancy#
For high-stakes sub-queries (where retrieval failure is costly), the system may route to multiple sources and reconcile results:
$$\text{route}(sq) = \text{TopK}\big(\{\, s \in R_S : \text{Score}(sq, s) > 0 \,\},\; k = \text{RedundancyFactor}(sq)\big)$$
7.7.6 Pseudo-Algorithm: Schema-Aware Routing#
ALGORITHM 7.7: RouteSubQueries(G_sq, R_S, B_cost)
──────────────────────────────────────────────────
Input:
G_sq — sub-query DAG from decomposition
R_S — source registry
B_cost — total cost budget
Output:
routing — map from sub-query → list of (source, priority, deadline)
1. routing ← {}
2. cost_remaining ← B_cost
3. FOR EACH sq IN TopologicalSort(G_sq) DO
4. candidates ← []
5. FOR EACH (s, σ, α, λ, c, φ) IN R_S DO
6. IF SchemaMatch(σ, sq) > 0 THEN
7. score ← ComputeRoutingScore(sq, s, σ, α, λ, c, φ)
8. candidates ← candidates ∪ {(s, score, c)}
9. END IF
10. END FOR
11.
12. // Sort by score descending
13. candidates ← SortDescending(candidates, by=score)
14.
15. // Select top-k based on redundancy factor
16. k ← RedundancyFactor(sq)
17. selected ← []
18. FOR i = 1 TO min(k, |candidates|) DO
19. IF cost_remaining ≥ candidates[i].cost THEN
20. selected ← selected ∪ {candidates[i]}
21. cost_remaining ← cost_remaining - candidates[i].cost
22. END IF
23. END FOR
24.
25. IF |selected| = 0 THEN
26. // No affordable source; flag for degraded execution
27. selected ← [{source=FALLBACK, priority=LOW, deadline=sq.deadline}]
28. Log("WARNING: Sub-query routed to fallback", sq)
29. END IF
30.
31. routing[sq] ← selected
32. END FOR
33. RETURN routing
7.8 Multi-Modal Query Understanding: Interpreting Mixed Text, Image, Code, and Data Table Inputs#
7.8.1 Multi-Modal Query Representation#
Modern agentic systems receive inputs spanning multiple modalities. A multi-modal query is represented as:
$$q_{mm} = \{(m_i, \tau_i, x_i)\}_{i=1}^{n}$$
where $\tau_i \in \{\text{text}, \text{image}, \text{code}, \text{table}\}$ is the modality type and $x_i$ is the raw content.
7.8.2 Modality-Specific Encoding#
Each modality requires a specialized encoder to produce a unified semantic representation:
- Text: Transformer-based language model encoder.
- Image: Vision transformer (ViT) or CLIP visual encoder.
- Code: Code-specialized encoder (e.g., CodeBERT, StarCoder embeddings) with AST-aware tokenization.
- Table: Structural encoder preserving row-column relationships; schema-aware embedding.
7.8.3 Cross-Modal Fusion#
The multi-modal query embedding is computed via cross-modal attention:
$$v_{fused} = \text{CrossAttention}\big(Q = v_{text},\; K = V = \{v_m : m \ne \text{text}\}\big)$$
where text serves as the query modality and non-text inputs serve as context modalities. This preserves the primacy of the textual query intent while grounding it in visual, structural, or code evidence.
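A minimal single-head sketch of this fusion step, assuming plain NumPy vectors; learned projections, multi-head structure, and the residual-style blend at the end are simplifying assumptions of this illustration:

```python
import numpy as np

def cross_modal_fuse(v_text: np.ndarray, context: list[np.ndarray]) -> np.ndarray:
    """Fuse a text embedding with non-text embeddings via single-head attention.

    Text is the query; image/code/table embeddings act as keys and values,
    so the result stays anchored to the textual intent.
    """
    if not context:
        return v_text
    K = np.stack(context)                        # (n_ctx, d)
    scores = K @ v_text / np.sqrt(v_text.size)   # scaled dot-product
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over context items
    attended = weights @ K                       # weighted context summary
    return (v_text + attended) / 2.0             # blend text with grounded context

fused = cross_modal_fuse(np.ones(4), [np.zeros(4), np.ones(4)])
```

The fused vector stays closer to the text embedding than a plain mean-pool would, which is the property the prose above asks for.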
7.8.4 Modality-Specific Intent Extraction#
Different modalities contribute different types of information to intent resolution:
| Modality | Contribution to Intent |
|---|---|
| Text | Explicit intent statement, constraints, questions |
| Image | Visual evidence (error screenshots, architecture diagrams, UI states) |
| Code | Implementation context, error locations, API signatures |
| Table | Data context, value ranges, anomaly patterns |
The system must extract modality-specific intent sets and reconcile them into a single set:
$$I_{mm} = \text{Reconcile}\big(\{\, I_\tau : \tau \in \text{modalities}(q_{mm}) \,\}\big)$$
Conflict resolution: When modalities suggest conflicting intents (e.g., text says "this works fine" but the screenshot shows an error), the system flags the conflict and prioritizes the higher-evidence-weight modality or requests clarification.
7.8.5 Pseudo-Algorithm: Multi-Modal Query Understanding#
ALGORITHM 7.8: MultiModalQueryUnderstand(q_mm, H, M, S)
───────────────────────────────────────────────────────
Input:
q_mm — multi-modal query: list of (modality, type, content)
H — conversational history
M — memory state
S — system state
Output:
q_unified — unified query representation
I_mm — multi-modal intent set
1. embeddings ← {}
2. modality_intents ← {}
3. FOR EACH (m, type, content) IN q_mm DO
4. // Modality-specific encoding
5. v_i ← Encode(content, encoder=SelectEncoder(type))
6. embeddings[type] ← embeddings[type] ∪ {v_i}
7.
8. // Modality-specific intent extraction
9. SWITCH type:
10. CASE "text":
11. intents_i ← TextIntentClassify(content, H)
12. CASE "image":
13. description ← ImageCaption(content)
14. entities_visual ← VisualEntityExtract(content)
15. intents_i ← InferIntentFromVisual(description, entities_visual)
16. CASE "code":
17. ast ← ParseAST(content)
18. errors ← DetectCodeIssues(content, ast)
19. intents_i ← InferIntentFromCode(content, ast, errors)
20. CASE "table":
21. schema ← InferTableSchema(content)
22. anomalies ← DetectAnomalies(content, schema)
23. intents_i ← InferIntentFromTable(schema, anomalies)
24. modality_intents[type] ← intents_i
25. END FOR
26. // Cross-modal fusion
27. IF "text" IN embeddings THEN
28. v_fused ← CrossAttention(embeddings["text"], AllOtherEmbeddings(embeddings))
29. ELSE
30. v_fused ← MeanPool(AllEmbeddings(embeddings))
31. END IF
32. // Reconcile intents across modalities
33. I_mm ← ReconcileIntents(modality_intents)
34. conflicts ← DetectConflicts(modality_intents)
35. IF |conflicts| > 0 THEN
36. I_mm ← AnnotateConflicts(I_mm, conflicts)
37. END IF
38. q_unified ← (v_fused, I_mm, embeddings, modality_intents)
39. RETURN q_unified, I_mm
7.9 Clarification Detection and Active Query Refinement Protocols#
7.9.1 When to Ask, Not Answer#
A system that always attempts to answer is not robust—it is reckless. Clarification detection determines when the system's confidence in its interpretation is insufficient to justify execution, and triggers a structured interaction to resolve the ambiguity.
7.9.2 Clarification Triggers#
We define a clarification trigger function as a disjunction of signals drawn from every pipeline stage:
$$\text{Trigger}(q) = \big[H(I) > \theta_H\big] \lor \big[\exists p : \text{contradicted}(p)\big] \lor \big[\text{Amb}(q) > \theta_A\big] \lor \big[\text{unresolved}(q) \ne \emptyset\big] \lor \big[|I| > N_I\big] \lor \big[\text{coverage}(G_{sq}) < \theta_{coverage}\big] \lor \big[\exists sq : \text{route}(sq) = \emptyset\big]$$
Each trigger corresponds to a specific pipeline failure mode:
| Trigger | Pipeline Stage | Failure Mode |
|---|---|---|
| High intent entropy | §7.2 | Cannot determine what the user wants |
| Contradicted presupposition | §7.3 | Query premises are false |
| High semantic ambiguity | §7.4 | Query maps to too many interpretations |
| Unresolved reference | §7.5.3 | Cannot determine referent in multi-turn |
| Too many intents | §7.2 | Query is too complex to serve atomically |
| Low coverage | §7.6 | Decomposition lost information |
| No viable route | §7.7 | No source can serve a sub-query |
7.9.3 Clarification Quality: Minimal, Discriminative, Actionable#
A good clarification request satisfies three properties:
- Minimal: Asks the fewest questions necessary to resolve the ambiguity.
- Discriminative: Each possible answer leads to a distinct execution path.
- Actionable: The expected answers map directly to pipeline parameters.
Formal minimality constraint:
$$n^{*} = \min\{\, n : H(I \mid a_1, \ldots, a_n) < \theta_{resolved} \,\}$$
This is the minimum number of clarifying questions $n^{*}$ such that the residual intent entropy, conditioned on the expected answers $a_1, \ldots, a_n$, drops below the resolution threshold.
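The minimality constraint suggests a greedy loop: keep asking the question that most reduces residual entropy until the threshold is met. A simplified Python sketch, which assumes the per-question residual entropies are already estimated (a real system would derive them from the intent distribution and candidate answers):

```python
import math

def intent_entropy(probs):
    """Shannon entropy (bits) of an intent probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_questions(questions, intent_probs, threshold):
    """Greedily pick clarifying questions until residual entropy < threshold.

    `questions` maps question -> estimated residual entropy after asking it.
    """
    asked = []
    residual = intent_entropy(intent_probs)
    # Ask highest-impact (lowest residual entropy) questions first.
    for q, h_after in sorted(questions.items(), key=lambda kv: kv[1]):
        if residual < threshold:
            break
        asked.append(q)
        residual = min(residual, h_after)  # questions never increase entropy here
    return asked, residual

asked, residual = select_questions(
    {"Which environment?": 0.4, "Which service?": 0.9},
    intent_probs=[0.5, 0.25, 0.25], threshold=0.5)
```

Here one question suffices: the starting entropy of 1.5 bits drops below the 0.5-bit threshold after the first answer, so the second question is never asked, satisfying minimality.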
7.9.4 Clarification Generation Strategy#
ALGORITHM 7.9: GenerateClarification(q, I_set, F_p, G_sq, triggers)
───────────────────────────────────────────────────────────────────
Input:
q — processed query
I_set — classified intents
F_p — pragmatic frame
G_sq — sub-query DAG
triggers — set of active clarification triggers
Output:
clarification — structured clarification request, or NULL if none needed
1. IF |triggers| = 0 THEN RETURN NULL
2. questions ← []
3. // Prioritize triggers by impact
4. sorted_triggers ← SortByImpact(triggers)
5. FOR EACH trigger IN sorted_triggers DO
6. SWITCH trigger.type:
7. CASE "high_intent_entropy":
8. // Generate discriminative question
9. top_intents ← TopK(I_set, k=3, by=confidence)
10. q_clar ← FormatDisambiguation(top_intents)
11. // e.g., "Are you asking about X, Y, or Z?"
12. questions ← questions ∪ {q_clar}
13.
14. CASE "contradicted_presupposition":
15. (p, status, evidence) ← trigger.details
16. q_clar ← FormatPresuppositionCorrection(p, evidence)
17. // e.g., "The migration actually succeeded. Did you mean..."
18. questions ← questions ∪ {q_clar}
19.
20. CASE "unresolved_reference":
21. (ref, candidates) ← trigger.details
22. q_clar ← FormatReferenceDisambiguation(ref, candidates)
23. questions ← questions ∪ {q_clar}
24.
25. CASE "no_viable_route":
26. sq ← trigger.sub_query
27. q_clar ← FormatScopeNarrowing(sq)
28. // e.g., "I don't have access to X. Can you provide..."
29. questions ← questions ∪ {q_clar}
30.
31. // Enforce minimality: stop if remaining entropy is low
32. estimated_residual ← EstimateResidualEntropy(I_set, questions)
33. IF estimated_residual < θ_resolved THEN BREAK
34. END FOR
35. clarification ← {
36. questions: questions,
37. context: SummarizeUnderstanding(q, I_set, F_p),
38. options: GenerateOptions(questions), // structured choices where possible
39. fallback_action: DescribeDefaultBehavior(I_set, G_sq)
40. }
41. RETURN clarification
7.9.5 Graceful Degradation When Clarification is Not Possible#
In asynchronous or batch contexts where interactive clarification is unavailable, the system must degrade gracefully:
- Conservative interpretation: Select the highest-confidence intent and explicitly state the assumption.
- Multi-interpretation hedging: Execute the top-$k$ interpretations and present results organized by interpretation.
- Partial execution: Execute only the sub-queries with confident routing, and annotate gaps.
7.10 Cognitive Reasoning Integration: Deductive, Inductive, Abductive, and Analogical Inference Modes#
7.10.1 Reasoning as a First-Class Pipeline Component#
Query understanding is not purely a classification and retrieval problem. Many queries require reasoning over the query itself before retrieval can begin. We formalize four reasoning modes that the pipeline may invoke during query understanding.
7.10.2 Deductive Reasoning#
Definition: From general premises and the specific query, derive necessary conclusions.
Application in query understanding: If organizational policy states "All production deployments require security review" and the user asks "Deploy service X to production," the system deductively infers that a security review check is a prerequisite, even though the user did not mention it.
Formal representation: given premises $\Gamma$ (policies, domain rules) and the query facts, derive every necessary consequence. For the example above:
$$\{\, \forall d.\ \text{prod}(d) \Rightarrow \text{review}(d),\ \text{prod}(X) \,\} \vdash \text{review}(X)$$
7.10.3 Inductive Reasoning#
Definition: From observed patterns, infer general principles that inform query interpretation.
Application: If the user has asked about service latency five times in the past week, each time followed by a deployment change, the system inductively infers that the current latency question is likely a pre-deployment check.
7.10.4 Abductive Reasoning#
Definition: From observed effects, infer the most likely explanatory cause.
Application: The user says "The dashboard is showing weird numbers." Abductive reasoning generates hypotheses about what might be wrong (data pipeline failure, metric misconfiguration, timezone mismatch) and structures retrieval to investigate each hypothesis.
This is structurally equivalent to maximum a posteriori (MAP) inference over a hypothesis space.
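A toy illustration of this MAP ranking in Python; the hypothesis names, priors, and per-observation likelihood tables are invented for the dashboard example and stand in for model-derived estimates:

```python
def rank_hypotheses(hypotheses, observations):
    """Rank explanatory hypotheses by unnormalized posterior: prior x likelihood.

    Each hypothesis carries a prior and a likelihood table; observations
    absent from the table get a small default likelihood.
    """
    def posterior(h):
        score = h["prior"]
        for obs in observations:
            score *= h["likelihood"].get(obs, 0.01)
        return score
    return sorted(hypotheses, key=posterior, reverse=True)

hyps = [
    {"name": "pipeline_failure", "prior": 0.2,
     "likelihood": {"stale_metrics": 0.9, "gaps_in_series": 0.8}},
    {"name": "timezone_mismatch", "prior": 0.3,
     "likelihood": {"stale_metrics": 0.1, "gaps_in_series": 0.2}},
]
ranked = rank_hypotheses(hyps, ["stale_metrics", "gaps_in_series"])
```

The top-ranked hypothesis then drives retrieval: the system investigates the most probable cause first, with the remainder as fallback investigation paths.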
7.10.5 Analogical Reasoning#
Definition: Map the current query to a structurally similar past query and transfer the solution strategy.
Application: If the system previously solved "Why is service A slow?" by checking database connection pools, and now receives "Why is service B slow?", analogical reasoning suggests starting with the same diagnostic pathway.
Structural mapping formalism:
$$\text{sim}(q, q_{past}) = \beta\, s_{struct}(q, q_{past}) + (1 - \beta)\, s_{out}(q, q_{past})$$
where $s_{struct}$ measures structural (relational) similarity and $s_{out}$ measures similarity of expected outcomes.
7.10.6 Reasoning Mode Selection#
The cognitive load estimator (§7.4) and the intent classifier (§7.2) jointly determine which reasoning modes to activate:
$$\mathcal{R}(q) = \{\, r \in \mathcal{M} : \text{relevance}_r(q, \kappa, I) > \theta_r \,\}$$
where $\mathcal{M} = \{\text{deductive}, \text{inductive}, \text{abductive}, \text{analogical}\}$.
| Query Pattern | Primary Mode | Secondary Mode |
|---|---|---|
| Rule-governed task | Deductive | — |
| Recurring diagnostic | Inductive | Analogical |
| Unexplained phenomenon | Abductive | Inductive |
| Novel problem resembling past case | Analogical | Abductive |
| Complex multi-constraint | Deductive | Abductive |
7.10.7 Pseudo-Algorithm: Reasoning Integration#
ALGORITHM 7.10: IntegrateReasoningModes(q, I_set, F_p, κ, M)
─────────────────────────────────────────────────────────────
Input:
q — resolved query
I_set — intent set
F_p — pragmatic frame
κ — cognitive load
M — full memory state (episodic, semantic, procedural)
Output:
q_augmented — query augmented with reasoning-derived context
hypotheses — ranked list of reasoning-generated hypotheses
1. active_modes ← SelectReasoningModes(q, κ, I_set)
2. reasoning_outputs ← []
3. IF "deductive" IN active_modes THEN
4. rules ← RetrieveApplicableRules(q, M.procedural)
5. deductions ← ApplyRules(q, rules, M.semantic)
6. reasoning_outputs ← reasoning_outputs ∪ {("deductive", deductions)}
7. END IF
8. IF "inductive" IN active_modes THEN
9. patterns ← FindPatterns(q, M.episodic, window=30_days)
10. generalizations ← Generalize(patterns)
11. reasoning_outputs ← reasoning_outputs ∪ {("inductive", generalizations)}
12. END IF
13. IF "abductive" IN active_modes THEN
14. observations ← ExtractObservations(q, F_p)
15. hypotheses_abd ← GenerateHypotheses(observations, M.semantic, k=5)
16. ranked ← RankByPosterior(hypotheses_abd, observations)
17. reasoning_outputs ← reasoning_outputs ∪ {("abductive", ranked)}
18. END IF
19. IF "analogical" IN active_modes THEN
20. similar_cases ← FindAnalogousCases(q, M.episodic, threshold=θ_analogy)
21. strategies ← TransferStrategies(similar_cases, q)
22. reasoning_outputs ← reasoning_outputs ∪ {("analogical", strategies)}
23. END IF
24. // Merge reasoning outputs into augmented query
25. q_augmented ← AugmentQuery(q, reasoning_outputs)
26. hypotheses ← ConsolidateHypotheses(reasoning_outputs)
27. RETURN q_augmented, hypotheses
7.11 Theory of Mind Modeling: Inferring User Knowledge State, Expertise Level, and Unstated Goals#
7.11.1 The Necessity of User Modeling#
Two users asking the identical query may need fundamentally different responses. A junior developer asking "What is a connection pool?" needs an explanation. A senior engineer asking the same question in the context of a production incident needs the connection pool configuration for the specific service that is failing. Theory of Mind (ToM) modeling enables the system to infer what the user knows, what they need, and what they have left unstated.
7.11.2 User State Model#
We define the user state as a structured representation:
$$U = (\xi,\; K_u,\; G_{stated},\; G_{unstated},\; P_{pref},\; C_{ctx})$$
| Component | Definition | Source |
|---|---|---|
| $\xi$ | Expertise vector across domains | Profile, interaction history |
| $K_u$ | Estimated knowledge set (concepts the user knows) | Prior interactions, role |
| $G_{stated}$ | Explicitly stated goals | Current query |
| $G_{unstated}$ | Inferred unstated goals | Pragmatic analysis, patterns |
| $P_{pref}$ | Response preferences (format, depth, verbosity) | History, explicit settings |
| $C_{ctx}$ | Current operational context (task, deadline, stress) | Session signals |
7.11.3 Expertise Estimation#
We estimate expertise per domain $d$ via a Bayesian update:
$$P(\xi_d \mid q_{1:t}) \propto P(q_t \mid \xi_d)\, P(\xi_d \mid q_{1:t-1})$$
The likelihood $P(q_t \mid \xi_d)$ is modeled by observing:
- Vocabulary sophistication: Use of domain-specific terminology increases the estimated expertise $\xi_d$.
- Query specificity: Specific, well-scoped queries indicate higher expertise.
- Follow-up patterns: Users who ask progressive, building questions are more expert than those who repeat basics.
- Tool usage: Expert users invoke advanced tools; novices use basic ones.
Bayesian update rule (simplified):
$$\xi_d \leftarrow \xi_d + \eta\, \big(g_d(q) - \xi_d\big)$$
where $\eta$ is a learning rate and $g_d(q)$ extracts a per-domain expertise indicator from the current query.
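The simplified update rule maps directly to code. A sketch mirroring the ξ[d] ← ξ[d] + η · (signal − ξ[d]) step of Algorithm 7.11; the 0.5 uniform prior for unseen domains and η = 0.2 are illustrative defaults:

```python
def update_expertise(xi, domain_signals, eta=0.2):
    """Exponential moving-average expertise update, one step per observed query.

    `xi` maps domain -> current estimate in [0, 1]; `domain_signals` maps
    domain -> expertise indicator extracted from the query (g_d(q)).
    """
    for d, signal in domain_signals.items():
        prior = xi.get(d, 0.5)           # uniform prior for unseen domains
        xi[d] = prior + eta * (signal - prior)
    return xi

xi = update_expertise({"devops": 0.5}, {"devops": 1.0, "ml": 0.0})
```

Because the update is an exponential moving average, a single expert-sounding query nudges the estimate rather than overwriting it, which keeps the model robust to one-off vocabulary borrowing.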
7.11.4 Unstated Goal Inference#
Unstated goals are inferred from the gap between what the user asked and what a user with their expertise and context would likely need:
$$G_{unstated} = \text{TypicalGoals}(\xi, C_{ctx}, I) \setminus G_{stated}$$
Example: A user with high DevOps expertise asking about deployment configuration in a production context implicitly needs rollback procedures, even if they did not ask for them.
7.11.5 Response Calibration#
The user state model calibrates the response along multiple axes: technical depth (driven by $\xi$), prerequisite coverage (driven by $K_u$ and the detected gaps), and format, depth, and verbosity (driven by $P_{pref}$).
7.11.6 Knowledge Gap Detection#
The system identifies concepts the query depends on that the user may not know:
$$\text{Gaps}(q, U) = \text{Prereq}(q) \setminus K_u$$
If $\text{Gaps}(q, U) \ne \emptyset$, the system may:
- For novice users: Proactively explain prerequisite concepts.
- For expert users: Skip prerequisites and provide direct answers.
- For uncertain expertise: Provide layered responses (summary + details).
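The branching above can be sketched as follows; the expertise thresholds (0.3 and 0.7) and the strategy labels are illustrative assumptions of this sketch:

```python
def knowledge_gaps(prerequisites: set[str], known: set[str]) -> set[str]:
    """Concepts the answer depends on that the user likely does not know."""
    return prerequisites - known

def response_plan(gaps: set[str], expertise: float) -> str:
    """Pick a response strategy from the gap set and the expertise estimate."""
    if not gaps:
        return "direct"
    if expertise < 0.3:
        return "explain_prerequisites"   # novice: teach the missing concepts
    if expertise > 0.7:
        return "direct"                  # expert: gaps are likely stale estimates
    return "layered"                     # uncertain: summary plus expandable detail

plan = response_plan(
    knowledge_gaps({"connection_pool", "timeout"}, {"timeout"}), expertise=0.5)
```

The "layered" branch is the safe default under uncertainty: experts can skip the summary, novices can expand the detail, and neither is poorly served.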
7.11.7 Pseudo-Algorithm: Theory of Mind Construction#
ALGORITHM 7.11: BuildUserModel(q, H, M, UserProfile)
────────────────────────────────────────────────────
Input:
q — current query
H — conversation history
M — memory (session, episodic)
UserProfile — stored user profile (if available)
Output:
U — user state model
1. // Initialize from profile or defaults
2. IF UserProfile EXISTS THEN
3. ξ ← UserProfile.expertise_vector
4. P_pref ← UserProfile.preferences
5. ELSE
6. ξ ← UniformPrior(|D|, value=0.5)
7. P_pref ← DefaultPreferences()
8. END IF
9. // Update expertise from current session
10. FOR EACH (q_i, r_i) IN H DO
11. FOR EACH domain d IN IdentifyDomains(q_i) DO
12. signal ← ComputeExpertiseSignal(q_i, d)
13. ξ[d] ← ξ[d] + η · (signal - ξ[d])
14. END FOR
15. END FOR
16. // Current query expertise signal
17. FOR EACH domain d IN IdentifyDomains(q) DO
18. signal ← ComputeExpertiseSignal(q, d)
19. ξ[d] ← ξ[d] + η · (signal - ξ[d])
20. END FOR
21. // Estimate knowledge set
22. K_u ← EstimateKnowledgeSet(ξ, H, M.episodic)
23. // Extract stated goals
24. G_stated ← ExtractExplicitGoals(q, F_p)
25. // Infer unstated goals
26. G_typical ← TypicalGoalSet(ξ, C_ctx, I_set)
27. G_unstated ← G_typical \ G_stated
28. // Detect operational context
29. C_ctx ← InferContext(q, H, M.session)
30. // Context includes: task_type, deadline_pressure, incident_mode
31. // Detect knowledge gaps
32. prerequisites ← ComputePrerequisites(q, I_set)
33. gaps ← prerequisites \ K_u
34. U ← (ξ, K_u, G_stated, G_unstated, P_pref, C_ctx, gaps)
35. RETURN U
7.12 Query Understanding Quality Metrics: Precision of Decomposition, Routing Accuracy, Enrichment Lift#
7.12.1 Measurement Imperative#
A query understanding pipeline that cannot be measured cannot be improved. We define a comprehensive metrics framework covering every pipeline stage, enabling CI/CD-integrated evaluation, regression detection, and continuous optimization.
7.12.2 Intent Classification Metrics#
Precision, Recall, F1 per intent class $c$:
$$P_c = \frac{TP_c}{TP_c + FP_c}, \qquad R_c = \frac{TP_c}{TP_c + FN_c}, \qquad F1_c = \frac{2\, P_c R_c}{P_c + R_c}$$
Multi-intent accuracy (subset accuracy):
$$\text{Acc}_{subset} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big[\hat{I}_i = I_i\big]$$
where $\hat{I}_i$ and $I_i$ are the predicted and ground-truth intent sets for query $i$. This is a strict metric: partial matches score zero.
Hamming loss (relaxed multi-label metric):
$$\text{HL} = \frac{1}{N\, |\mathcal{C}|} \sum_{i=1}^{N} \big|\hat{I}_i \,\triangle\, I_i\big|$$
where $\mathcal{C}$ is the intent label set and $\triangle$ denotes symmetric difference.
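Both multi-label metrics are short to implement. A sketch with intents represented as Python sets; the example intent labels are invented:

```python
def subset_accuracy(pred: list[set], gold: list[set]) -> float:
    """Strict multi-intent accuracy: the whole predicted set must match."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def hamming_loss(pred: list[set], gold: list[set], n_labels: int) -> float:
    """Relaxed metric: average symmetric difference per available label slot."""
    total = sum(len(p ^ g) for p, g in zip(pred, gold))
    return total / (len(gold) * n_labels)

pred = [{"diagnose"}, {"deploy", "verify"}]
gold = [{"diagnose"}, {"deploy"}]
# One exact match out of two; one spurious label out of eight label slots.
```

On this example subset accuracy is 0.5 while Hamming loss is only 0.125, illustrating why the strict metric should be read alongside the relaxed one.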
7.12.3 Decomposition Quality Metrics#
Semantic Preservation Score (SPS):
$$\text{SPS}(G_{sq}, q) = \frac{\big|\big(\bigcup_{sq \in V(G_{sq})} \text{intents}(sq)\big) \cap I(q)\big|}{|I(q)|}$$
A perfect decomposition achieves $\text{SPS} = 1$. Any value below 1.0 indicates information loss.
Decomposition Granularity Index (DGI):
$$\text{DGI} = \frac{|V(G_{sq})|}{|V(G^{*}_{sq})|}$$
$\text{DGI} < 1$ indicates under-decomposition (sub-queries too coarse). $\text{DGI} > 1$ indicates over-decomposition (unnecessary fragmentation). Optimal: $\text{DGI} \approx 1$.
Dependency Correctness:
$$\text{DepCorr} = F_1\big(E(G_{sq}),\, E(G^{*}_{sq})\big)$$
measuring whether the identified dependencies (edges) between sub-queries match the ground-truth dependency structure $G^{*}_{sq}$.
7.12.4 Routing Accuracy Metrics#
Source Match Rate (SMR):
$$\text{SMR} = \frac{1}{|V(G_{sq})|} \sum_{sq} \mathbb{1}\big[\text{route}(sq) = \text{route}^{*}(sq)\big]$$
Latency Compliance Rate (LCR):
$$\text{LCR} = \frac{1}{|V(G_{sq})|} \sum_{sq} \mathbb{1}\big[\text{latency}(sq) \le \text{deadline}(sq)\big]$$
Cost Efficiency:
$$\text{CE} = \frac{\sum_{sq} \text{quality}(sq)}{\sum_{sq} \text{cost}(sq)}$$
measuring quality-per-unit-cost of the routing decisions.
7.12.5 Enrichment Lift Metrics#
Retrieval Recall Lift from Enrichment:
$$\text{Lift} = \frac{\text{Recall}@k(q_{enriched})}{\text{Recall}@k(q_{raw})} - 1$$
Positive values indicate that enrichment improved retrieval coverage.
Precision Preservation:
$$\text{PP} = \frac{\text{Precision}@k(q_{enriched})}{\text{Precision}@k(q_{raw})}$$
Enrichment must improve recall without substantially degrading precision. Target: $\text{PP} \ge 1 - \epsilon$ for a small tolerance $\epsilon$.
HyDE Effectiveness:
$$\Delta\text{nDCG}@k = \text{nDCG}@k(q_{HyDE}) - \text{nDCG}@k(q_{raw})$$
measuring the normalized discounted cumulative gain improvement from HyDE-based retrieval versus raw query retrieval.
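The lift computation mirrors line 25 of Algorithm 7.12. A sketch assuming simple recall@k over lists of document IDs; the IDs and result lists are invented for illustration:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def enrichment_lift(raw_results, enriched_results, relevant, k=10):
    """Relative recall improvement of the enriched query over the raw query."""
    r_raw = recall_at_k(raw_results, relevant, k)
    r_enr = recall_at_k(enriched_results, relevant, k)
    if r_raw == 0.0:
        # Raw retrieval found nothing; any enriched hit is an unbounded lift.
        return float("inf") if r_enr > 0 else 0.0
    return r_enr / r_raw - 1.0

# Raw query finds 1 of 3 relevant docs; enriched query finds all 3.
lift = enrichment_lift(["d1", "d4"], ["d1", "d2", "d3"], {"d1", "d2", "d3"}, k=3)
```

Guarding the zero-recall case matters in practice: hard queries where raw retrieval fails entirely are exactly where enrichment earns its cost, and a naive ratio would divide by zero there.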
7.12.6 End-to-End Query Understanding Quality#
Query Understanding Score (QUS):
We define an aggregate metric that captures the end-to-end quality of the pipeline:
$$\text{QUS} = w_1\, F1_{intent} + w_2\, \text{SPS} + w_3\, \text{SMR} + w_4\, \widetilde{\text{Lift}} - w_5\, \text{FPC}$$
where:
- $w_1, \ldots, w_5$ are non-negative stage weights,
- $\widetilde{\text{Lift}}$ is the enrichment lift normalized to $[0, 1]$, and
- $\text{FPC}$ is the fraction of clarifications that were not actually needed (false-positive clarifications).
7.12.7 Operational Metrics#
Beyond quality, production systems must track operational health:
| Metric | Definition | Target |
|---|---|---|
| Pipeline Latency P50/P95/P99 | End-to-end query understanding time | P95 < 200ms |
| Token Consumption | Tokens used by rewriting + HyDE + reasoning | < 15% of total budget |
| Clarification Rate | Fraction of queries requiring clarification | < 10% |
| Fallback Rate | Fraction of queries degrading to raw retrieval | < 5% |
| Decomposition Rate | Fraction of queries decomposed into sub-queries | Monitored (not targeted) |
7.12.8 Continuous Evaluation Pipeline#
ALGORITHM 7.12: QueryUnderstandingEvalPipeline(eval_set, pipeline)
─────────────────────────────────────────────────────────────────
Input:
eval_set — list of (query, ground_truth) pairs
ground_truth = {intents, decomposition, routing, enrichment_targets}
pipeline — the query understanding pipeline under evaluation
Output:
report — comprehensive quality report with per-stage metrics
1. metrics ← InitializeMetricsAccumulator()
2. FOR EACH (q, gt) IN eval_set DO
3. // Run pipeline
4. start ← Now()
5. result ← pipeline.Process(q)
6. latency ← Now() - start
7.
8. // Intent metrics
9. metrics.intent_predictions ← Append(result.I_set)
10. metrics.intent_ground_truth ← Append(gt.intents)
11.
12. // Decomposition metrics
13. sps ← ComputeSPS(q, result.G_sq, gt.decomposition)
14. dgi ← ComputeDGI(result.G_sq, gt.decomposition)
15. dep_corr ← ComputeDepCorrectness(result.G_sq, gt.decomposition)
16. metrics.decomposition ← Append(sps, dgi, dep_corr)
17.
18. // Routing metrics
19. smr ← ComputeSMR(result.routing, gt.routing)
20. metrics.routing ← Append(smr)
21.
22. // Enrichment lift (requires retrieval execution)
23. recall_raw ← MeasureRecall(q, gt.enrichment_targets)
24. recall_enriched ← MeasureRecall(result.q_enriched, gt.enrichment_targets)
25. lift ← (recall_enriched / recall_raw) - 1
26. metrics.enrichment_lift ← Append(lift)
27.
28. // Operational
29. metrics.latencies ← Append(latency)
30. metrics.token_usage ← Append(result.token_count)
31. metrics.clarification_triggered ← Append(result.clarification ≠ NULL)
32. END FOR
33. // Compute aggregate metrics
34. report ← {
35. intent_f1: ComputeF1(metrics.intent_predictions, metrics.intent_ground_truth),
36. intent_subset_acc: ComputeSubsetAccuracy(...),
37. avg_sps: Mean(metrics.decomposition.sps),
38. avg_dgi: Mean(metrics.decomposition.dgi),
39. avg_smr: Mean(metrics.routing),
40. avg_enrichment_lift: Mean(metrics.enrichment_lift),
41. latency_p50: Percentile(metrics.latencies, 50),
42. latency_p95: Percentile(metrics.latencies, 95),
43. latency_p99: Percentile(metrics.latencies, 99),
44. clarification_rate: Mean(metrics.clarification_triggered)
45. }
46. report.qus ← ComputeQUS(report)
47. // Regression detection
48. previous ← LoadPreviousReport()
49. IF previous ≠ NULL THEN
50. regressions ← DetectRegressions(report, previous, thresholds)
51. IF |regressions| > 0 THEN
52. report.regressions ← regressions
53. report.status ← "REGRESSION_DETECTED"
54. END IF
55. END IF
56. PersistReport(report)
57. RETURN report
7.12.9 Evaluation-Driven Improvement Loop#
The metrics framework is not passive. It drives an improvement loop:
- Failed traces (low SPS, incorrect routing, unnecessary clarifications) are captured and normalized into regression test cases.
- Pattern analysis identifies systematic failure modes (e.g., "sequential decomposition consistently misses the third dependency").
- Policy updates are derived from failure patterns and integrated into the pipeline's procedural memory.
- A/B testing of pipeline variants (e.g., new HyDE prompts, updated ontologies) is evaluated against the benchmark set.
- Quality gates in CI/CD enforce that no pipeline change may degrade QUS below the established baseline.
The closed loop from evaluation to improvement is what transforms a query understanding pipeline from a static component into an evolving, self-improving system.
Chapter Summary#
This chapter formalized query understanding as a typed cognitive pipeline that transforms raw, ambiguous, context-dependent user inputs into structured, provenance-tagged, schema-routed execution plans. The key architectural contributions are:
| Contribution | Section | Core Idea |
|---|---|---|
| Pipeline formalization | §7.1 | Query understanding as a typed, multi-stage transformation from ambiguous surface form to structured execution plan |
| Hierarchical multi-intent detection | §7.2 | Open-domain, calibrated, multi-label intent classification |
| Pragmatic frame construction | §7.3 | Gricean analysis, presupposition validation, speech act classification |
| Cognitive load estimation | §7.4 | Multi-dimensional complexity scoring for adaptive pipeline configuration |
| HyDE and enrichment | §7.5 | Hypothesis-driven retrieval, ontological expansion, multi-turn resolution |
| Decomposition DAGs | §7.6 | Parallel, sequential, and conditional sub-query graphs |
| Schema-aware routing | §7.7 | Multi-objective source assignment under latency and cost constraints |
| Multi-modal understanding | §7.8 | Cross-modal fusion and modality-specific intent extraction |
| Clarification protocols | §7.9 | Minimal, discriminative, actionable clarification with graceful degradation |
| Cognitive reasoning | §7.10 | Deductive, inductive, abductive, and analogical inference integration |
| Theory of mind | §7.11 | User expertise estimation, unstated goal inference, knowledge gap detection |
| Quality metrics | §7.12 | SPS, DGI, SMR, enrichment lift, QUS, and CI-integrated evaluation |
The unifying principle is that query understanding is the highest-leverage intervention point in any agentic system. Every downstream operation—retrieval, tool use, reasoning, synthesis, verification—is bounded by the quality of query understanding. A system that invests in this stage proportionally outperforms one that invests the same compute budget elsewhere in the pipeline. The formal metrics framework (§7.12) ensures this investment is measurable, reproducible, and continuously improving.