Preamble#
Agent architectures are not implementation accidents; they are structural commitments that determine what an agent can represent, how it reasons, when it acts, and how it fails. Every production agentic system—whether built on large language models (LLMs), classical planners, or hybrid stacks—implicitly or explicitly instantiates one or more of the formal architectures catalogued in this chapter. The difference between a brittle demonstration and a reliable, scalable agentic application lies precisely in whether the engineering team has made these architectural commitments explicit, typed, bounded, and verifiable.
This chapter provides an exhaustive formal treatment of twelve foundational agent architectures, each analyzed along five axes:
- Formal definition — mathematical objects, state spaces, transition semantics.
- Structural decomposition — components, interfaces, data flow.
- Mapping to LLM-based agentic systems — how each architecture instantiates concretely in modern context-engineered, tool-augmented, memory-layered agent stacks.
- Production implications — reliability, observability, fault tolerance, token efficiency, latency.
- Composition properties — how the architecture interacts with other architectures in layered or multi-agent configurations.
The reader should emerge with the capacity to select, combine, and formally justify architectural decisions for any agentic system at enterprise scale.
3.1 BDI (Belief–Desire–Intention) Architecture Applied to LLM Agents#
3.1.1 Philosophical and Formal Foundations#
The Belief–Desire–Intention architecture originates in Michael Bratman's theory of practical reasoning (1987) and was formalized computationally by Rao and Georgeff (1991, 1995). Its central insight is that rational agency requires not only beliefs about the world and desires regarding future states, but a distinct commitment mechanism—intentions—that constrains future deliberation, filters out irrelevant options, and stabilizes behavior over time.
Formal BDI Tuple#
A BDI agent is defined as a tuple:

$$\mathcal{A}_{\mathrm{BDI}} = \langle B, D, I, \Pi, \mathit{brf}, \mathit{del}, \mathit{filter}, \mathit{plan} \rangle$$

where:

| Symbol | Definition |
|---|---|
| $B$ | Belief base: a set of propositions representing the agent's current model of the world, including uncertainty. |
| $D$ | Desire set: a set of goal states the agent finds desirable, not necessarily mutually consistent. |
| $I$ | Intention stack: an ordered, committed subset of plans currently being pursued, satisfying $I \subseteq \Pi$. |
| $\Pi$ | Plan library: a mapping from triggering events and context conditions to partially ordered sequences of actions and subgoals. |
| $\mathit{brf}$ | Belief revision function: $\mathit{brf} : B \times P \rightarrow B$, where $P$ is the percept stream. |
| $\mathit{del}$ | Desire deliberation function: $\mathit{del} : B \times D \rightarrow D$, generating or pruning desires given current beliefs. |
| $\mathit{filter}$ | Intention selection / filtering function: $\mathit{filter} : B \times D \times I \rightarrow I$, committing to or dropping intentions. |
| $\mathit{plan}$ | Plan selection function: $\mathit{plan} : I \times \Pi \rightarrow A$, selecting the next executable action from the action set $A$. |
The BDI Reasoning Cycle#
The canonical BDI reasoning cycle proceeds as follows at each time step $t$:
ALGORITHM 3.1: BDI-Reasoning-Cycle
────────────────────────────────────
Input: Agent state (B, D, I), percept stream P_stream, plan library PL
Output: Action a_t, updated agent state (B', D', I')
1. percept ← OBSERVE(P_stream)
2. B' ← BELIEF_REVISE(B, percept)
3. D' ← DELIBERATE(B', D) // generate/filter desires
4. I_candidates ← FILTER(B', D', I) // compatible intentions
5. IF RECONSIDER?(B', I) THEN
6. I' ← SELECT_INTENTIONS(B', D', I_candidates)
7. ELSE
8. I' ← I // commitment persists
9. plan ← SELECT_PLAN(B', I', PL)
10. a_t ← HEAD(plan) // next primitive action
11. EXECUTE(a_t)
12. RETURN (B', D', I'), a_t

The critical design decision is the reconsideration policy at step 5. An agent that reconsiders too frequently is overcautious (high deliberation cost, low throughput). An agent that reconsiders too rarely is overcommitted (brittle under environmental change). This trade-off is formally captured as:

$$\mathrm{RECONSIDER?}(B', I) \iff \Delta(B, B') > \theta \;\lor\; \mathrm{infeasible}(I, B')$$

where $\theta$ is a tunable belief-change threshold and $\mathrm{infeasible}(I, B')$ detects plan infeasibility.
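A minimal Python sketch of this reconsideration predicate, assuming beliefs are a flat key–value dict and an intention is represented by the list of precondition beliefs it depends on (all names here are illustrative, not part of the formal model):

```python
def belief_delta(old: dict, new: dict) -> float:
    """Fraction of belief keys whose values changed, appeared, or disappeared."""
    keys = set(old) | set(new)
    if not keys:
        return 0.0
    changed = sum(1 for k in keys if old.get(k) != new.get(k))
    return changed / len(keys)

def infeasible(intention: list, beliefs: dict) -> bool:
    """An intention is infeasible if any precondition belief it relies on is false."""
    return any(not beliefs.get(cond, False) for cond in intention)

def should_reconsider(old_beliefs: dict, new_beliefs: dict,
                      intention: list, theta: float = 0.3) -> bool:
    # Reconsider only on significant belief drift or plan infeasibility.
    return (belief_delta(old_beliefs, new_beliefs) > theta
            or infeasible(intention, new_beliefs))
```

Tuning $\theta$ trades deliberation cost against commitment: lower values approximate a cautious agent, higher values a bold one.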
3.1.2 Mapping BDI to LLM-Based Agentic Systems#
The BDI framework maps with high fidelity onto modern LLM agent stacks when each BDI component is given a concrete, typed implementation:
| BDI Component | LLM Agent Realization |
|---|---|
| Beliefs | Context window contents: system prompt policy, retrieved evidence (provenance-tagged), tool output history, environment observations, validated memory summaries. The belief base is the compiled prefill at inference time. |
| Desires | User-specified goals, system-level objectives (safety constraints, quality targets), and derived subgoals from task decomposition. Represented as structured goal objects with priority, deadline, and success criteria. |
| Intentions | The currently committed plan trace: a sequence of planned tool calls, retrieval operations, and generation steps with explicit ordering and dependency edges. Stored in working memory with rollback metadata. |
| Plan Library | A registry of known workflows, tool-chain templates, and decomposition strategies, exposed as typed MCP tool/resource surfaces or as few-shot plan exemplars in the prompt. |
| Belief Revision | Context engineering pipeline: retrieval re-ranking, memory refresh, observation integration, contradiction detection, and stale-context pruning—executed before each generation step. |
| Desire Deliberation | LLM-driven goal analysis: the model evaluates which user objectives are achievable given current beliefs, generates subgoals, and flags conflicts. |
| Intention Selection | Plan critique and commitment: the model (or a separate verifier agent) evaluates candidate plans against feasibility, cost, and risk, then commits to an intention stack with bounded depth. |
| Plan Selection | Tool routing and action selection: given the current intention, the model selects the next tool call or generation action, parameterized by typed schemas from the tool registry. |
Belief Revision as Context Engineering#
The most critical BDI function in LLM agents is belief revision. In classical BDI, $\mathit{brf}$ is a logical update. In LLM agents, it is context construction:

$$B' = \mathrm{COMPILE}\big(\text{policy},\ \mathrm{RETRIEVE}(q, \mathcal{K}),\ \mathrm{MEMORY}(\text{goals}),\ \text{observations}\big)$$

subject to the token budget constraint:

$$|B'|_{\text{tokens}} \;\leq\; W - R$$

where $W$ is the model's context window and $R$ is the token budget reserved for generation and reasoning.
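Under this constraint, belief revision becomes a packing problem. A toy sketch, approximating token counts by word counts (a real stack would use the model tokenizer; `compile_prefill` and the scored-candidate shape are hypothetical):

```python
def token_count(text: str) -> int:
    # Crude proxy: whitespace word count stands in for tokenizer output.
    return len(text.split())

def compile_prefill(policy: str, candidates: list, window: int, reserve: int) -> str:
    """Pack highest-scoring (score, text) items after the policy text,
    never exceeding window - reserve tokens."""
    budget = window - reserve - token_count(policy)
    parts, used = [policy], 0
    for _, item in sorted(candidates, key=lambda x: -x[0]):
        cost = token_count(item)
        if used + cost <= budget:   # admit only if the item fits
            parts.append(item)
            used += cost
    return "\n".join(parts)
```

Greedy packing by score is a simplification; production pipelines typically also compress or summarize items that narrowly miss the budget.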
Commitment and Reconsideration in LLM Agents#
LLM agents face a specific form of the commitment–reconsideration trade-off: each reconsideration step costs a full inference call (latency + cost), but insufficient reconsideration leads to hallucinated plan continuations when the environment has changed. The production-grade strategy is:
- Observation-triggered reconsideration: reconsider only when a tool call fails, a verification step detects an error, or retrieved evidence contradicts the current plan.
- Bounded intention depth: limit the intention stack to $k$ steps before requiring a verification checkpoint. This prevents runaway plan execution.
- Explicit rollback metadata: each intention carries a compensating action or rollback specification, enabling recovery without full replanning.
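These strategies can be combined in a small intention-stack sketch; `Intention`, its `rollback` field, and the depth bound are illustrative simplifications of the rollback metadata described above:

```python
from dataclasses import dataclass

@dataclass
class Intention:
    action: str
    rollback: str          # compensating action specification
    executed: bool = False

class IntentionStack:
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth
        self.items: list = []

    def commit(self, intention: Intention) -> bool:
        """Refuse commitment past the depth bound: the caller must run a
        verification checkpoint before adopting further intentions."""
        if len(self.items) >= self.max_depth:
            return False
        self.items.append(intention)
        return True

    def rollback_plan(self) -> list:
        """Compensating actions for executed steps, in reverse order."""
        return [i.rollback for i in reversed(self.items) if i.executed]
```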
3.1.3 Extended BDI: Obligations, Norms, and Trust#
Production agents operate under normative constraints (safety policies, authorization boundaries, organizational rules). The extended BDI model adds:

$$\mathcal{A}_{\mathrm{BDI^{+}}} = \langle B, D, I, \Pi, O, \mathcal{N} \rangle$$

where $O$ is a set of obligations (must-do constraints from policy or user approval gates) and $\mathcal{N}$ is a normative filter that blocks intention adoption if it violates safety, authorization, or ethical constraints. In typed protocol terms, $\mathcal{N}$ is enforced as a pre-execution validation gate in the agent loop—a mechanically enforced invariant, not a soft prompt instruction.
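A sketch of such a pre-execution validation gate, with a hypothetical prohibition list and approval-gated action set (both would come from policy configuration in a real system):

```python
# Illustrative policy sets; in production these are loaded from signed config.
PROHIBITED = {"delete_production_db", "exfiltrate_secrets"}
REQUIRES_APPROVAL = {"deploy", "send_email"}

def norm_filter(action: str, approvals: set) -> tuple:
    """Return (allowed, reason). Runs before execution, never as a prompt hint."""
    if action in PROHIBITED:
        return False, "blocked: prohibited by policy"
    if action in REQUIRES_APPROVAL and action not in approvals:
        return False, "blocked: approval gate not satisfied"
    return True, "allowed"
```

Note that prohibitions are absolute: no approval can override them, which is the gate's key invariant.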
3.1.4 Production Implications#
| Concern | BDI Mitigation |
|---|---|
| Hallucination control | Beliefs are grounded in provenance-tagged retrieval and verified tool outputs, not in model-generated assertions. The belief base is explicitly constructed, not implicitly assumed. |
| Fault tolerance | Intentions carry rollback metadata; plan failure triggers reconsideration rather than silent continuation. |
| Observability | Each BDI cycle emits a structured trace: beliefs consulted, desires evaluated, intentions adopted/dropped, plan selected, action executed, outcome observed. |
| Token efficiency | The compiled prefill budget enforces that beliefs are compressed and relevant, preventing context window saturation. |
| Idempotency | Intentions are logged with execution state; re-entry after crash replays from the last committed checkpoint, not from the beginning. |
3.2 Cognitive Architectures: SOAR, ACT-R, Global Workspace Theory — Mappings to Agentic AI#
3.2.1 SOAR (State, Operator, And Result)#
Formal Structure#
SOAR, developed by Laird, Newell, and Rosenbloom (1987), models cognition as a search through a problem space. Its core objects are:
| Component | Definition |
|---|---|
| State $S$ | The current representation of the problem, including goals, subgoals, and working context. |
| Operators $O$ | Transformations applicable to states, selected through preference rules. |
| Working Memory (WM) | A short-term, high-bandwidth store holding the current state, active goals, and operator applications. Analogous to the LLM context window. |
| Long-Term Memory (LTM) | Divided into procedural (production rules), semantic (facts), and episodic (event sequences) stores. |
| Chunking | When an impasse is resolved, the resolution path is compiled into a new production rule and stored in LTM, enabling one-step resolution in the future. |
| Impasse detection | When the system cannot select an operator, apply an operator, or determine the result, a subgoal is automatically created. |
The SOAR Decision Cycle#
ALGORITHM 3.2: SOAR-Decision-Cycle
───────────────────────────────────
Input: Working Memory WM, Long-Term Memory LTM
Output: Updated WM, executed actions
1. LOOP:
2. proposals ← MATCH_PRODUCTIONS(WM, LTM.procedural)
3. preferences ← EVALUATE_PREFERENCES(proposals)
4. IF UNIQUE_BEST(preferences) THEN
5. op ← SELECT(preferences)
6. WM' ← APPLY(op, WM)
7. IF IS_EXTERNAL(op) THEN EXECUTE_ACTION(op)
8. ELSE
9. impasse_type ← CLASSIFY_IMPASSE(preferences) // tie, conflict, no-change, rejection
10. subgoal ← CREATE_SUBGOAL(impasse_type, WM)
11. PUSH(subgoal, goal_stack)
12. resolution ← SOAR_DECISION_CYCLE(WM ∪ subgoal, LTM) // recursive
13. chunk ← COMPILE_CHUNK(subgoal, resolution)
14. LTM.procedural ← LTM.procedural ∪ {chunk}
15.  WM ← WM'

Mapping to LLM Agents#
| SOAR Concept | Agentic AI Realization |
|---|---|
| Working Memory | The active context window: compiled prefill + recent tool outputs + current plan state. |
| Long-Term Procedural Memory | Tool registry schemas, workflow templates, and fine-tuned behavioral patterns encoded in model weights or retrieved few-shot exemplars. |
| Long-Term Semantic Memory | Organizational knowledge base accessed via hybrid retrieval (semantic + exact + graph). |
| Long-Term Episodic Memory | Validated episode logs: past task traces with outcomes, stored in episodic memory layer with provenance and expiry. |
| Impasse → Subgoal | When the LLM cannot determine the next action (ambiguity, missing information, tool failure), the agent loop creates a subgoal: ask the user, retrieve more context, invoke a specialist sub-agent. |
| Chunking | Workflow compilation: successful multi-step resolutions are abstracted into reusable plan templates and stored in the plan library. In practice, this maps to automated pipeline construction from successful traces. |
| Operator Proposal | Tool/action candidates proposed by the model given the current context; ranked by a preference function (relevance, cost, latency, safety). |
| Operator Selection | Tool routing: the orchestrator selects the highest-ranked action, with tie-breaking by policy priority. |
Critical Insight: Impasse-Driven Subgoaling as Agent Escalation#
The impasse mechanism is the most valuable SOAR concept for production agents. Rather than allowing the LLM to hallucinate when uncertain, the agent architecture should detect impasses mechanically:
- No applicable tool: the model's proposed action does not match any registered tool schema → trigger retrieval subgoal or human escalation.
- Ambiguous tool selection: multiple tools are equally ranked → trigger a clarification subgoal or invoke a critic agent.
- Verification failure: the output of a tool call does not satisfy the postcondition → trigger a repair subgoal with bounded recursion.
This transforms the failure mode from silent hallucination to explicit escalation, which is the single most important reliability improvement in agentic systems.
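The three impasse cases above can be detected mechanically before any generation step. This sketch assumes a proposed action name, a tool registry dict, a ranked candidate list, and a postcondition verdict (all shapes are illustrative):

```python
def classify_impasse(proposed: str, registry: dict,
                     ranked: list, verified: bool):
    """Return an impasse label, or None when the agent may proceed.
    `ranked` is a list of (tool_name, score) pairs, best first."""
    if proposed not in registry:
        return "no_applicable_tool"       # -> retrieval subgoal / escalation
    if len(ranked) >= 2 and abs(ranked[0][1] - ranked[1][1]) < 1e-9:
        return "ambiguous_selection"      # -> clarification subgoal / critic
    if not verified:
        return "verification_failure"     # -> bounded repair subgoal
    return None                           # no impasse: execute
```

Each label maps to a distinct escalation path, mirroring SOAR's impasse taxonomy (tie, no-change, rejection).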
3.2.2 ACT-R (Adaptive Control of Thought — Rational)#
Formal Structure#
ACT-R (Anderson, 1993, 2007) models cognition as a modular system with parallel perceptual-motor modules and a serial procedural bottleneck:
Modules include declarative memory, procedural memory, goal, imaginal (working memory), visual, motor, and vocal modules. Each module communicates through a buffer (a single-chunk interface), and the procedural module fires productions that match buffer contents.
Subsymbolic Layer: Activation and Utility#
ACT-R's power lies in its subsymbolic equations. The activation of a declarative memory chunk $i$ is:

$$A_i = B_i + \sum_j W_j S_{ji} + \epsilon$$

where:
- $B_i = \ln \sum_k t_k^{-d}$ is the base-level activation (frequency and recency of access, with decay parameter $d$).
- $\sum_j W_j S_{ji}$ is the spreading activation from elements $j$ currently in the goal buffer, with associative strength $S_{ji}$.
- $\epsilon$ is stochastic noise.

A retrieval request succeeds if $A_i > \tau$ (retrieval threshold); the retrieval latency is:

$$T_i = F e^{-f A_i}$$

where $F$ and $f$ are scaling parameters. Higher activation yields faster retrieval.

Production utility determines which production fires when multiple match:

$$U_i(n) = U_i(n-1) + \alpha \big[ R_i(n) - U_i(n-1) \big]$$

updated by temporal-difference-like reward learning, where $R_i(n)$ is the reward received and $\alpha$ is the learning rate.
Mapping to LLM Agents#
| ACT-R Concept | Agentic AI Realization |
|---|---|
| Declarative Memory Activation | Retrieval scoring function: rank retrieved chunks by a composite of base-level activation (access frequency/recency), spreading activation (relevance to current goal), and noise (diversity). This directly informs hybrid retrieval ranking. |
| Base-Level Activation () | Memory item freshness and usage frequency scores in the episodic/semantic memory layer. Items with higher are prioritized in retrieval; items below threshold are candidates for eviction. |
| Spreading Activation | Query-aware re-ranking: the current goal and working context items spread activation to related memory chunks, boosting retrieval of contextually relevant items. This is the theoretical justification for query expansion and contextual re-ranking in RAG pipelines. |
| Retrieval Threshold | Minimum relevance cutoff in the retrieval pipeline: chunks scoring below threshold are excluded from the context window, preventing low-relevance noise. |
| Production Utility Learning | Tool/action selection learning: track the utility of each tool invocation pattern over time; increase utility for patterns that lead to successful task completion, decrease for failures. This maps to bandit-style tool routing optimization. |
| Single-Chunk Buffer Interface | Typed buffer abstraction: each module (retrieval, tool execution, memory, goal management) communicates through a single structured message at a time, enforcing serialization at the orchestration layer and preventing unbounded parallelism. |
Production Implication: ACT-R-Informed Retrieval Scoring#
The ACT-R activation equation provides a principled scoring function for retrieval in agentic systems:

$$\mathrm{score}(c_i) = \underbrace{\ln \sum_k t_k^{-d}}_{\text{recency/frequency}} + \underbrace{\sum_j W_j S_{ji}}_{\text{goal relevance}} + \underbrace{\beta \cdot \mathrm{authority}(c_i)}_{\text{source authority}} + \epsilon$$

This composite score integrates temporal dynamics, goal-directed attention, and source authority into a single ranking metric that outperforms naive cosine similarity for agentic retrieval.
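A runnable sketch of this composite score, using the standard ACT-R base-level form with decay $d$ and a simple term-overlap proxy for spreading activation (the authority term and default weights are illustrative):

```python
import math

def base_level(access_ages: list, d: float = 0.5) -> float:
    """Base-level activation: ln of summed power-law-decayed access recencies."""
    if not access_ages:
        return float("-inf")
    return math.log(sum(t ** (-d) for t in access_ages))

def spreading(goal_terms: set, chunk_terms: set, weight: float = 1.0) -> float:
    """Overlap between goal-buffer terms and chunk terms as spreading proxy."""
    if not goal_terms:
        return 0.0
    return weight * len(goal_terms & chunk_terms) / len(goal_terms)

def score(access_ages, goal_terms, chunk_terms, authority: float = 0.0) -> float:
    # Noise term omitted for determinism; add it for exploration/diversity.
    return base_level(access_ages) + spreading(goal_terms, chunk_terms) + authority
```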
3.2.3 Global Workspace Theory (GWT)#
Formal Structure#
Global Workspace Theory (Baars, 1988, 2005) models consciousness as a broadcast mechanism: many specialized unconscious processors compete to have their outputs broadcast to the global workspace, making those outputs available to all other processors simultaneously.
| Component | Definition |
|---|---|
| Global Workspace | A shared, capacity-limited broadcast medium. |
| Specialist processors | Domain-specific modules that operate in parallel on their own inputs. |
| Competition for access | Processors form coalitions and compete for workspace access based on relevance, urgency, and activation strength. |
| Global broadcast | The winning coalition's output is broadcast to all processors, enabling integration. |
| Attentional selection | A gating mechanism that determines which coalition wins workspace access. |
Mapping to Agentic AI: The Context Window as Global Workspace#
The LLM context window is a direct structural analog of the Global Workspace:
- Capacity-limited: the context window has a fixed token budget. Not all information can be present simultaneously.
- Broadcast medium: whatever is in the context window is available to the model's entire parameter space during inference—functionally, it is broadcast to all "processors" (attention heads, layers).
- Competition for access: retrieval results, memory summaries, tool outputs, and user inputs compete for inclusion in the context window. The context engineering pipeline is the attentional selection mechanism.
- Integration through broadcast: the model synthesizes information from all context sources simultaneously during forward pass, exactly as GWT predicts.
ALGORITHM 3.3: GWT-Inspired-Context-Assembly
─────────────────────────────────────────────
Input: Candidate context items C = {c_1, ..., c_N}, token budget W
Output: Assembled global workspace GW
1. FOR EACH c_i IN C:
2. score_i ← ACTIVATION(c_i) // relevance + urgency + authority + recency
3. coalitions ← FORM_COALITIONS(C, scores) // group related items
4. SORT coalitions BY aggregate_score DESCENDING
5. GW ← ∅; tokens_used ← 0
6. FOR EACH coalition IN coalitions:
7. cost ← TOKEN_COUNT(coalition)
8. IF tokens_used + cost ≤ W THEN
9. GW ← GW ∪ COMPRESS(coalition)
10. tokens_used ← tokens_used + TOKEN_COUNT(COMPRESS(coalition))
11. ELSE
12. BREAK // budget exhausted
13. RETURN GW

Production Implication: Context as Attentional Bottleneck#
GWT explains why context engineering is the most important capability in agentic AI: the context window is the agent's consciousness. If critical information does not make it into the context window, it does not exist for the agent's reasoning. Conversely, irrelevant information in the context window displaces relevant information and degrades reasoning quality (analogous to attentional capture by irrelevant stimuli).
This motivates the engineering principle: the context window must be treated as a scarce, managed resource with explicit admission control, not as an unbounded append-only log.
3.2.4 Comparative Synthesis#
| Property | SOAR | ACT-R | GWT | LLM Agent Implication |
|---|---|---|---|---|
| Memory architecture | WM + 3-type LTM | Modular buffers + declarative/procedural | Global workspace + specialists | Multi-layer memory with typed interfaces |
| Learning mechanism | Chunking (compile experience) | Utility learning + activation tuning | Coalition strengthening | Trace-to-workflow compilation + retrieval scoring adaptation |
| Failure handling | Impasse → subgoal | Retrieval failure → default | Competition failure → no broadcast | Mechanical impasse detection → escalation |
| Attention/Selection | Preference-based | Activation-based | Competition-based | Composite scoring for context admission |
| Bottleneck | Operator selection | Production firing | Workspace capacity | Context window token budget |
3.3 Reactive, Deliberative, and Hybrid Agent Architectures#
3.3.1 Reactive Architectures#
Definition and Principles#
A purely reactive agent maintains no internal world model and no explicit goals. It maps percepts directly to actions through a set of condition–action rules (behaviors):

$$a = f(p), \qquad f : P \rightarrow A$$

where $f$ is a fixed, stateless mapping function. The canonical reactive architecture is Brooks' Subsumption Architecture (detailed in §3.7), but the principles extend to any system that acts without deliberation.
Properties#
- Latency: minimal; no planning or search overhead.
- Robustness: high for well-defined environments; no world model to become inconsistent.
- Scalability of behavior: poor; complex behavior requires many hand-crafted rules with complex interaction patterns.
- Adaptability: low; cannot reason about novel situations not covered by existing rules.
LLM Agent Analog#
A "zero-shot" LLM call with a system prompt and no retrieval, no memory, no tool use, and no verification loop is the closest LLM analog to a reactive agent. It maps input directly to output through the model's learned (parameter-encoded behaviors). This is the weakest agent architecture and the one most prone to hallucination, because there is no mechanism for grounding, verification, or adaptation.
3.3.2 Deliberative Architectures#
Definition and Principles#
A deliberative agent maintains an explicit symbolic world model and uses search-based planning to select actions:

$$\pi = \mathrm{PLAN}(\mathcal{M}, s_0, G), \qquad a_t = \pi(t)$$

The agent constructs a model $\mathcal{M}$ of the world, defines a goal state $G$, and uses a planning algorithm (e.g., STRIPS, GraphPlan, HTN) to find a sequence of actions that transforms the current state $s_0$ into the goal state.
Properties#
- Latency: high; planning is computationally expensive, especially for large state spaces.
- Correctness: high for well-modeled domains; the plan is provably correct with respect to the model.
- Brittleness: high sensitivity to model inaccuracies; if the world model is wrong, the plan is wrong.
- Adaptability: replanning is possible but expensive.
LLM Agent Analog#
An LLM agent that performs explicit multi-step planning before acting—decomposing a user request into a structured task graph, selecting tools for each step, and executing sequentially with verification—is a deliberative agent. The "world model" is the agent's compiled context (beliefs), and the "planning algorithm" is the LLM's chain-of-thought reasoning constrained by the plan library.
3.3.3 Hybrid Architectures#
Definition and Layered Structure#
Hybrid architectures combine reactive and deliberative components in a layered configuration: lower layers provide fast, reactive control, while higher layers provide deliberation and coordination, mediated by an explicit control framework.
The canonical examples are:
- InteRRaP (Müller, 1996): three layers (behavior, plan, cooperation) with bottom-up activation and top-down commitment.
- TouringMachines (Ferguson, 1992): three layers (reactive, planning, modeling) with a control framework that mediates between layers.
- 3T Architecture (Bonasso et al., 1997): three tiers (skills, sequencing, deliberation) with each tier operating at a different temporal granularity.
The Layered Control Pattern#
ALGORITHM 3.4: Hybrid-Layered-Agent
────────────────────────────────────
Input: Percept p, current agent state S
Output: Action a
1. // Layer 0: Safety / Reactive (highest priority, lowest latency)
2. IF SAFETY_VIOLATION(p, S) THEN
3. RETURN SAFETY_ACTION(p) // immediate, no deliberation
4.
5. // Layer 1: Procedural / Skill Execution
6. IF ACTIVE_PLAN(S) AND NEXT_STEP_APPLICABLE(S, p) THEN
7. RETURN EXECUTE_NEXT_STEP(S) // continue committed plan
8.
9. // Layer 2: Deliberative / Planning
10. IF GOAL_UNACHIEVED(S) THEN
11. plan ← REPLAN(S.model, S.goal, p)
12. S.plan ← plan
13. RETURN EXECUTE_NEXT_STEP(S)
14.
15. // Layer 3: Cooperative / Meta-reasoning
16. IF COORDINATION_NEEDED(S) THEN
17. RETURN COORDINATE(S, peers)
18.
19. RETURN IDLE_ACTION()

Mapping to Production LLM Agent Stacks#
The hybrid layered architecture is the dominant pattern in production agentic systems:
| Layer | LLM Agent Realization | Temporal Granularity |
|---|---|---|
| Reactive / Safety | Input/output guardrails, content filters, authorization checks, rate limiters. No LLM inference required; executed as typed pre/post-conditions. | Microseconds to milliseconds |
| Procedural / Skill | Tool execution, API calls, file operations—following a committed plan step. Tool routing based on typed schemas. | Milliseconds to seconds |
| Deliberative / Planning | LLM-driven task decomposition, plan generation, retrieval-augmented reasoning. Full inference cycle. | Seconds to minutes |
| Cooperative / Meta | Multi-agent coordination, human escalation, cross-agent memory sharing, approval gates. | Minutes to hours |
The control framework mediates between layers through priority-based arbitration: safety always preempts planning; planning preempts idle exploration; ongoing skill execution preempts replanning unless a verification failure or impasse is detected.
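The arbitration policy reduces to an ordered guard chain; this sketch stubs each layer's check as a dict lookup (guard and action names are illustrative):

```python
def arbitrate(percept: dict, state: dict) -> str:
    """Priority-based arbitration across the four layers, highest first."""
    if percept.get("safety_violation"):                     # Layer 0: safety
        return "safety_action"
    if state.get("active_plan") and state.get("next_step_applicable"):
        return "execute_next_step"                          # Layer 1: skill
    if not state.get("goal_achieved", False):
        return "replan"                                     # Layer 2: planning
    if state.get("coordination_needed"):
        return "coordinate"                                 # Layer 3: meta
    return "idle"
```

The guard ordering encodes the preemption rules: safety always wins, and committed skill execution preempts replanning.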
3.3.4 Trade-Off Analysis#
| Architecture | Reactivity | Optimality | Adaptability | Implementation Complexity |
|---|---|---|---|---|
| Reactive | ★★★★★ | ★☆☆☆☆ | ★☆☆☆☆ | ★★☆☆☆ |
| Deliberative | ★☆☆☆☆ | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| Hybrid | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★★ |
The hybrid architecture is strictly preferred for production agentic systems because it provides safety guarantees (reactive layer), plan quality (deliberative layer), and coordination capacity (cooperative layer) simultaneously. The cost is implementation complexity, which is managed through typed layer interfaces and explicit control policies.
3.4 Hierarchical Task Networks (HTN) for Complex Plan Decomposition#
3.4.1 Formal Definition#
A Hierarchical Task Network planning domain is defined as:

$$\mathcal{D} = \langle T_p, T_c, M, O, s_0 \rangle$$

| Symbol | Definition |
|---|---|
| $T_p$ | Primitive tasks: directly executable actions (operators). |
| $T_c$ | Compound (abstract) tasks: tasks that must be decomposed further. |
| $M$ | Methods: decomposition rules $m = \langle t, \mathit{pre}, w \rangle$, where $t \in T_c$ is the task being decomposed, $\mathit{pre}$ are preconditions, and $w$ is an ordered or partially ordered set of subtasks. |
| $O$ | Operators: state-transition specifications for primitive tasks, $o = \langle \mathit{pre}, \mathit{add}, \mathit{del} \rangle$. |
| $s_0$ | Initial state: the starting world state. |
A task network is a tuple $w = (U, C)$ where $U$ is a set of task nodes and $C$ is a set of constraints (ordering, variable binding, state conditions).
HTN Planning as Recursive Decomposition#
HTN planning proceeds by recursively replacing compound tasks with their method decompositions until only primitive tasks remain:

$$w_0 \rightarrow w_1 \rightarrow \cdots \rightarrow w_n$$

where $w_0$ contains the initial abstract task and $w_n$ contains only primitive tasks whose execution sequence achieves the goal. The decomposition is valid if:
- Every compound task is replaced by a method whose preconditions are satisfied.
- The resulting primitive task sequence is executable in the initial state.
- All ordering and binding constraints are satisfied.
3.4.2 HTN Decomposition Algorithm#
ALGORITHM 3.5: HTN-Decompose
─────────────────────────────
Input: Task network w = (U, C), methods M, operators O, state s
Output: Executable plan π or FAILURE
1. IF U contains only primitive tasks THEN
2. π ← LINEARIZE(U, C) // respect ordering constraints
3. IF EXECUTABLE(π, s, O) THEN RETURN π
4. ELSE RETURN FAILURE
5. SELECT t ∈ U such that t ∈ T_c // choose a compound task
6. applicable ← {m ∈ M : m.task = t AND SATISFIES(m.pre, s)}
7. IF applicable = ∅ THEN RETURN FAILURE
8. FOR EACH m ∈ applicable (ordered by heuristic priority):
9. w' ← REPLACE(w, t, m.subtasks) // substitute decomposition
10. C' ← C ∪ m.constraints
11. result ← HTN_DECOMPOSE(w' = (U', C'), M, O, s)
12. IF result ≠ FAILURE THEN RETURN result
13. RETURN FAILURE // all methods failed → backtrack

3.4.3 Mapping HTN to LLM Agent Task Decomposition#
HTN is the most natural planning formalism for LLM-based agents because LLMs already perform task decomposition as a reasoning pattern (chain-of-thought, tree-of-thought). The mapping is:
| HTN Concept | LLM Agent Realization |
|---|---|
| Compound task | High-level user request or agent-generated subgoal (e.g., "Refactor the authentication module"). |
| Method | A decomposition template: a known workflow for breaking a compound task into subtasks. Stored in the plan library as typed method schemas with preconditions. |
| Primitive task | A single tool call, API invocation, or atomic generation step with a typed schema (MCP tool surface). |
| Preconditions | Belief checks: "Is the repository cloned? Is the file accessible? Does the user have authorization?" Evaluated against the agent's current belief base. |
| Ordering constraints | Dependency edges in the task graph: "Subtask B requires the output of Subtask A." Enforced by the orchestrator. |
| Decomposition | LLM-driven planning: given a compound task and the current context, the model generates a decomposition. The plan library provides method templates; the LLM selects and adapts. |
| Backtracking | If a decomposition path fails (tool error, verification failure), the agent backtracks to try an alternative method, bounded by a retry budget. |
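Algorithm 3.5's method backtracking can be exercised as a runnable miniature for totally ordered networks; the domain encoding (methods as precondition/subtask pairs, state as a dict) is an illustrative simplification:

```python
def htn_decompose(tasks: list, methods: dict, primitives: set, state: dict):
    """Return a primitive plan for `tasks`, or None if no decomposition works.
    `methods` maps a compound task to [(precondition_fn, subtask_list), ...]."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in primitives:
        tail = htn_decompose(rest, methods, primitives, state)
        return None if tail is None else [head] + tail
    for pre, subtasks in methods.get(head, []):     # try methods in priority order
        if pre(state):
            result = htn_decompose(list(subtasks) + rest, methods, primitives, state)
            if result is not None:                   # backtrack on failure
                return result
    return None
```

State effects of operators are omitted here; adding them turns the precondition checks into checks against the evolving state, as in the full algorithm.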
Typed Method Schema (Pseudo-Schema)#
METHOD DecomposeCodeRefactor:
task: "refactor_module"
preconditions:
- belief("repository_cloned") = true
- belief("module_exists", module_name) = true
subtasks:
1. analyze_module(module_name) → analysis [primitive, tool: code_analysis]
2. generate_plan(analysis) → refactor_plan [primitive, tool: llm_generate]
3. apply_changes(refactor_plan) → diff [primitive, tool: code_edit]
4. run_tests(module_name) → test_result [primitive, tool: test_runner]
5. verify_result(test_result, refactor_plan) [primitive, tool: llm_verify]
ordering: 1 < 2 < 3 < 4 < 5
rollback: revert_changes(diff)
  max_retries: 2

3.4.4 Ordered vs. Partially Ordered HTN#
- Totally Ordered STN (Simple Task Networks): subtasks have a linear order. Simpler to execute but less parallelizable. Used when task dependencies are strictly sequential.
- Partially Ordered STN: subtasks have partial ordering constraints. Enables parallel execution of independent subtasks. Requires a concurrent orchestrator with merge semantics.
For production multi-agent systems, partially ordered HTN is preferred because it enables parallel execution, so plan latency is governed by the critical path rather than by total work:

$$T_{\text{plan}} = \max_{p \,\in\, \text{paths}} \sum_{t \in p} \tau(t) \;\leq\; \sum_{t} \tau(t)$$

where the critical path is the longest dependency chain. Independent subtasks can be distributed to parallel agents with isolated workspaces.
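The critical-path bound can be computed directly from the task graph; the durations and dependency edges below are illustrative:

```python
def critical_path(duration: dict, deps: dict) -> float:
    """Longest dependency chain in a task DAG, via memoized recursion.
    `deps[t]` lists tasks that must finish before t starts."""
    memo = {}
    def finish(t):
        if t not in memo:
            memo[t] = duration[t] + max(
                (finish(d) for d in deps.get(t, [])), default=0.0)
        return memo[t]
    return max(finish(t) for t in duration)
```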
3.4.5 HTN and Verification#
HTN naturally integrates verification as a decomposition primitive. For every method that produces an artifact $a$, a verification subtask is added:

$$\mathrm{subtasks}(m) = \langle \mathrm{produce}(a),\ \mathrm{verify}(a) \rangle$$

If $\mathrm{verify}(a)$ fails, the orchestrator triggers a repair method or backtracks. This is the formal basis for the plan → act → verify → repair loop in agentic systems.
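The produce → verify → repair pattern, with a bounded repair budget before backtracking, can be sketched as follows (the callables and the exception-based backtrack signal are illustrative):

```python
def produce_and_verify(produce, verify, repair, max_repairs: int = 2):
    """Produce an artifact, verify it, and repair a bounded number of times;
    raise to signal the orchestrator that it must backtrack to another method."""
    artifact = produce()
    repairs = 0
    while not verify(artifact):
        if repairs == max_repairs:
            raise RuntimeError("verification failed: backtrack to another method")
        artifact = repair(artifact)   # repair subtask with bounded recursion
        repairs += 1
    return artifact
```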
3.4.6 Depth Bounding and Cost Control#
Unbounded HTN decomposition is a risk in LLM agents: the model may recursively decompose tasks into ever-finer subtasks, consuming tokens and latency without converging. The mitigation is a depth bound:

$$\mathrm{depth}(w) \leq d_{\max}$$

and a cost bound:

$$\sum_{t \in w} \mathrm{cost}(t) \leq C_{\max}$$

where $\mathrm{cost}(t)$ includes inference tokens, tool invocation cost, and latency. The orchestrator enforces these bounds mechanically, triggering escalation or summarization when limits are approached.
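A sketch of mechanical bound enforcement in the orchestrator; the escalation strings and limits are illustrative:

```python
class BudgetGuard:
    """Tracks cumulative cost and checks decomposition depth per step."""
    def __init__(self, max_depth: int, max_cost: float):
        self.max_depth = max_depth
        self.max_cost = max_cost
        self.cost = 0.0

    def check(self, depth: int, step_cost: float) -> str:
        self.cost += step_cost
        if depth > self.max_depth:
            return "escalate: depth bound exceeded"
        if self.cost > self.max_cost:
            return "escalate: cost bound exceeded"
        return "ok"
```

In practice `step_cost` would aggregate inference tokens, tool fees, and latency into one normalized unit.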
3.5 OODA (Observe–Orient–Decide–Act) Loop as Agent Execution Primitive#
3.5.1 Origin and Formal Structure#
The OODA loop was developed by military strategist John Boyd (1976, 1987) as a model of competitive decision-making. Its power lies not in the individual phases but in the tempo of the loop: the agent that cycles through OODA faster than its environment changes can maintain decision advantage.
| Phase | Definition |
|---|---|
| Observe | Gather raw data from the environment: percepts, tool outputs, retrieval results, user messages, system telemetry. |
| Orient | The critical phase. Synthesize observations with prior knowledge, mental models, cultural context, and previous experience to form a situational awareness picture. Orientation shapes what the agent notices and how it interprets information. |
| Decide | Select a course of action from the oriented understanding. This may be explicit (plan selection) or implicit (pattern-matched response). |
| Act | Execute the selected action and generate new environmental effects, which feed back into the next Observe phase. |
| Feedback | Two feedback loops: (1) Act → Observe (external feedback through environment change), and (2) Orient → Observe (internal feedback; orientation biases future observation). |
3.5.2 Boyd's Insight: Orientation as the Schwerpunkt#
Boyd's most important contribution is the recognition that Orient is the central phase. Orientation determines:
- What the agent attends to in observations (selective perception).
- How the agent interprets observations (framing, schema application).
- What options the agent considers (option generation is orientation-dependent).
- How quickly the agent decides (well-oriented agents decide implicitly, without deliberative search).
In agentic AI terms, Orientation is Context Engineering. The quality of the agent's orientation—its compiled context, retrieved evidence, memory summaries, and policy constraints—determines the quality of every downstream decision.
3.5.3 OODA Mapping to the Agentic Agent Loop#
ALGORITHM 3.6: OODA-Agent-Loop
───────────────────────────────
Input: Environment E, agent state A, context budget W
Output: Continuous action stream
1. LOOP (bounded by max_iterations or convergence):
2. // OBSERVE
3. raw ← COLLECT_OBSERVATIONS(E) // tool outputs, user input, telemetry
4. filtered ← FILTER_NOISE(raw, A.attention_policy)
5.
6. // ORIENT
7. queries ← DECOMPOSE_AND_EXPAND(filtered, A.goals)
8. evidence ← HYBRID_RETRIEVE(queries, retrieval_config)
9. memory ← RECALL(A.memory_layers, A.goals)
10. context ← COMPILE_PREFILL(A.policy, evidence, memory, filtered, budget=W)
11. situation ← LLM_ORIENT(context) // situational assessment
12.
13. // DECIDE
14. options ← GENERATE_OPTIONS(situation, A.tool_registry)
15. selected ← EVALUATE_AND_SELECT(options, A.preferences, A.constraints)
16.
17. // ACT
18. result ← EXECUTE(selected, A.tool_runtime)
19. A.history ← APPEND(A.history, (filtered, situation, selected, result))
20.
21. // FEEDBACK
22. IF VERIFY(result, selected.postcondition) = FAIL THEN
23. A.orientation_bias ← UPDATE_BIAS(A.orientation_bias, failure_signal)
24. CONTINUE // rapid re-OODA
25. ELSE
26. COMMIT(result)
27.     UPDATE_MEMORY(A.memory_layers, (situation, selected, result))
3.5.4 Tempo and Latency Engineering#
Boyd's competitive advantage through tempo translates directly to latency engineering in agentic systems:
For an agent to maintain decision advantage over environmental change, its cycle time must be shorter than the interval between significant environmental changes:

τ_OODA < 1/λ_env

where λ_env is the rate of environmentally significant state changes.
| Phase | Dominant Latency Contributors | Optimization Strategies |
|---|---|---|
| Observe | I/O, polling, event subscription | Event-driven observation (webhooks, streams) rather than polling; parallel observation channels. |
| Orient | Retrieval latency, LLM inference | Cache retrieval results; use tiered retrieval (exact match first, semantic only if needed); pre-compute orientation summaries. |
| Decide | LLM inference | Implicit orientation (pattern-matched decisions bypass deliberative inference); pre-computed decision tables for routine cases. |
| Act | Tool invocation, API latency | Async tool execution; parallelized independent actions; pre-staged environments. |
3.5.5 OODA vs. BDI: Complementary, Not Competing#
OODA and BDI are orthogonal frameworks that compose well:
- OODA provides the execution tempo: the continuous loop structure with feedback.
- BDI provides the commitment mechanism: intentions persist across OODA cycles, providing behavioral stability.
- Combined: the agent runs an OODA loop, where Orient corresponds to belief revision, Decide corresponds to intention selection/filtering, and Act corresponds to plan execution. Reconsideration (§3.1) is triggered by Orient detecting significant change.
3.6 Blackboard Architectures for Multi-Source Knowledge Integration#
3.6.1 Formal Structure#
The Blackboard Architecture (Erman et al., 1980; Nii, 1986) was originally developed for the HEARSAY-II speech understanding system. Its core insight is that complex problems requiring heterogeneous knowledge sources are best solved by posting partial solutions to a shared data structure (the blackboard) where independent knowledge sources can contribute incrementally.
| Component | Definition |
|---|---|
| BB (Blackboard) | A structured, shared workspace organized into hierarchical levels (e.g., raw data → features → hypotheses → solutions). Knowledge sources read from and write to the blackboard. |
| KS = {KS₁, …, KSₙ} (Knowledge Sources) | Independent, specialized modules, each with a trigger condition (pattern that activates the KS when matched on the blackboard) and a contribution function (transformation applied to blackboard contents). Knowledge sources do not communicate directly; all communication is mediated through the blackboard. |
| Control | A scheduling mechanism that determines which knowledge source to activate when multiple are triggered. May be priority-based, focus-based, or opportunistic. |
3.6.2 Blackboard Operation Cycle#
ALGORITHM 3.7: Blackboard-Cycle
────────────────────────────────
Input: Blackboard BB, Knowledge Sources KS[], Control strategy
Output: Updated Blackboard BB'
1. LOOP until SOLUTION_FOUND(BB) or MAX_CYCLES:
2. triggered ← {ks ∈ KS : TRIGGER(ks, BB) = true}
3. IF triggered = ∅ THEN RETURN BB // quiescent state
4. ks* ← CONTROL_SELECT(triggered, BB) // select highest-priority KS
5. contribution ← ks*.EXECUTE(BB)
6. BB ← APPLY(BB, contribution)
7.    NOTIFY(KS, BB.changes)                 // re-evaluate triggers
3.6.3 Mapping to Agentic AI: Multi-Source Retrieval and Reasoning#
The blackboard architecture maps elegantly to multi-source knowledge integration in agentic systems:
| Blackboard Concept | Agentic AI Realization |
|---|---|
| Blackboard | A structured shared workspace (e.g., a JSON document, a knowledge graph, or a structured working memory) where partial results from different sources are posted. |
| Knowledge Source: Semantic Search | A specialist that takes a query from the blackboard, executes semantic search against a vector store, and posts provenance-tagged results back. |
| Knowledge Source: Exact Match | A specialist that searches structured databases, code indices, or documentation by exact keyword/ID match. |
| Knowledge Source: Graph Traversal | A specialist that follows relationship edges in a knowledge graph to find related entities and posts lineage-enriched context. |
| Knowledge Source: Code Analysis | A specialist that runs static analysis, AST parsing, or dependency resolution and posts code-derived enrichment. |
| Knowledge Source: Memory Recall | A specialist that queries episodic and semantic memory layers, applying activation-based scoring (§3.2.2). |
| Knowledge Source: Live Inspection | A specialist that queries runtime state, logs, metrics, or browser state and posts current observations. |
| Knowledge Source: LLM Synthesis | A specialist that takes the accumulated blackboard contents and generates a synthesized answer, critique, or plan revision. |
| Control | The context engineering pipeline's ranking and admission control: which knowledge source to invoke next, which results to admit to the context window. |
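The cycle in Algorithm 3.7 can be sketched compactly. The two knowledge sources below (a retrieval KS and a synthesis KS) and the blackboard's field names are illustrative assumptions, standing in for the specialists in the table above.

```python
# Minimal sketch of the blackboard cycle (Algorithm 3.7). The KS entries and
# blackboard fields are illustrative, not a fixed API.
def run_blackboard(bb, knowledge_sources, max_cycles=10):
    """Fire the highest-priority triggered KS each cycle until quiescence."""
    for _ in range(max_cycles):
        triggered = [ks for ks in knowledge_sources if ks["trigger"](bb)]
        if not triggered:
            return bb                        # quiescent state: no KS fires
        ks = max(triggered, key=lambda k: k["priority"])
        ks["execute"](bb)                    # contribution applied in place
    return bb

bb = {"raw": "how do I rotate logs?", "evidence": None, "answer": None}
knowledge_sources = [
    {   # retrieval KS: fires once raw input exists and evidence is missing
        "priority": 2,
        "trigger": lambda bb: bb["raw"] and bb["evidence"] is None,
        "execute": lambda bb: bb.update(evidence=["logrotate docs"]),
    },
    {   # synthesis KS: fires once evidence has been posted
        "priority": 1,
        "trigger": lambda bb: bb["evidence"] and bb["answer"] is None,
        "execute": lambda bb: bb.update(answer=f"Based on {bb['evidence'][0]}"),
    },
]
result = run_blackboard(bb, knowledge_sources)
```

Note that the control strategy here is a plain priority maximum; a production control component would also enforce invocation budgets and token-budget admission as described in §3.6.5.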
Hierarchical Blackboard Levels for Agentic Retrieval#
Level 4: SYNTHESIZED RESPONSE [Final answer, verified]
Level 3: HYPOTHESES/PLANS [Candidate answers, plan options]
Level 2: EVIDENCE CHUNKS [Ranked, provenance-tagged retrieval results]
Level 1: PROCESSED OBSERVATIONS [Parsed queries, expanded subqueries, metadata]
Level 0: RAW INPUT              [User query, tool outputs, environment signals]
Each knowledge source reads from its input level(s) and writes to its output level. This hierarchy ensures that the synthesis agent (Level 4) operates only on processed, ranked, provenance-tagged evidence, not on raw data.
3.6.4 Production Advantages of the Blackboard Pattern#
- Modularity: knowledge sources are independently deployable, testable, and replaceable. Adding a new retrieval source requires only registering a new KS with the blackboard, not modifying the orchestrator.
- Heterogeneity: different KS modules can use different technologies (vector search, SQL, graph queries, API calls) without coupling.
- Incremental refinement: partial solutions are progressively enriched, which aligns with the agent's iterative reasoning pattern.
- Observability: the blackboard is a complete, inspectable record of all intermediate contributions, enabling debugging and audit.
3.6.5 Risks and Mitigations#
| Risk | Mitigation |
|---|---|
| Blackboard contention | Typed write schemas; each KS writes to a designated level with a structured format. Concurrent writes are merge-safe by schema design. |
| Runaway KS activation | Bounded cycle count; each KS has an invocation budget per task. Control enforces monotonic progress (no KS can repeatedly write the same content). |
| Stale data on blackboard | TTL (time-to-live) on blackboard entries; entries older than threshold are evicted or re-validated before use. |
| Token budget overflow | The control component enforces a token budget: only the highest-ranked blackboard entries are admitted to the LLM's context window. |
3.7 Subsumption Architectures for Priority-Based Behavior Arbitration#
3.7.1 Formal Definition#
The Subsumption Architecture (Brooks, 1986) organizes agent behavior as a stack of behavior layers, where each layer implements a complete, independent behavior mapping from percepts to actions. Higher layers subsume (suppress or override) lower layers:
| Component | Definition |
|---|---|
| Bᵢ | Behavior layer: a complete percept-to-action mapping Bᵢ : P → A. Higher index = higher competence level. |
| S(j, i) | Layer j suppresses the input to layer i (replaces i's percepts with j's signals). |
| I(j, i) | Layer j inhibits the output of layer i (blocks i's actions and substitutes j's). |
The key property: lower layers are complete, operational systems that function correctly even if all higher layers are removed. Higher layers add competence without modifying lower layers.
3.7.2 Canonical Layer Stack#
Layer 3: EXPLORE — seek novel states, learn [lowest priority]
Layer 2: BUILD_MAP — construct world model
Layer 1: WANDER — move through environment
Layer 0: AVOID_OBSTACLE — emergency collision avoidance  [highest priority]
Layer 0 operates independently and correctly. Layer 1 adds wandering behavior, but Layer 0 can inhibit Layer 1's motor output when an obstacle is detected. Layer 2 adds map building but does not affect lower layers' safety guarantees.
3.7.3 Mapping to LLM Agent Safety and Priority Architecture#
The subsumption architecture provides the theoretical foundation for layered safety and priority arbitration in agentic AI systems:
| Subsumption Layer | LLM Agent Realization | Priority |
|---|---|---|
| Layer 0: Safety Stop | Hard-coded guardrails: content filters, PII detection, authorization enforcement, rate limiting. Operates without LLM inference. Can inhibit all higher-layer outputs. | Highest (absolute) |
| Layer 1: Policy Enforcement | System prompt constraints, organizational policy rules, normative filters (§3.1.3). Suppresses inputs that violate policy. | High |
| Layer 2: Plan Execution | Committed plan step execution: tool calls, API invocations following the current intention. | Medium |
| Layer 3: Deliberative Planning | LLM-driven reasoning, plan generation, goal decomposition. Can be inhibited by safety or policy layers. | Lower |
| Layer 4: Exploratory Reasoning | Speculative reasoning, creative generation, exploratory retrieval. Lowest priority; suppressed when any higher layer is active. | Lowest |
Arbitration Semantics#
The output action at time t is determined by:

a(t) = B_k(p(t)), where k = min{ i : layer i is active at time t }

This is a strict priority scheme: safety always wins. It is the formal justification for implementing safety guardrails as pre-inference, mechanically enforced gates, not as prompt instructions that the LLM might ignore.
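The arbitration rule can be sketched as a first-match scan over layers ordered by priority. The two layer functions below are illustrative stand-ins for the realizations in the table above; the percept fields are assumptions.

```python
# Sketch of strict-priority arbitration: layers are listed highest-priority
# first, and the first active layer's action wins. Layer logic is illustrative.
def arbitrate(layers, percept):
    """Return the action of the highest-priority active layer."""
    for layer in layers:                 # ordered: Layer 0 (safety) first
        action = layer(percept)
        if action is not None:           # layer is active → it wins
            return action
    return "noop"                        # no layer active

def safety_stop(p):      # Layer 0: mechanical guardrail, no LLM inference
    return "block" if p.get("pii_detected") else None

def plan_execution(p):   # Layer 2: follow the committed plan step, if any
    return p.get("next_plan_step")

layers = [safety_stop, plan_execution]
```

Because `safety_stop` is checked first and involves no model inference, no output from a lower-priority layer can reach the environment while the guardrail is active.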
3.7.4 Production Implications#
- Safety invariants are independent of model behavior: Layer 0 operates without LLM inference and cannot be circumvented by prompt injection, jailbreaking, or model error.
- Graceful degradation: if the deliberative layer (Layer 3) fails, the agent falls back to plan execution (Layer 2). If plan execution fails, it falls back to policy enforcement (Layer 1). The system always has a safe, operational mode.
- Incremental capability addition: new competence layers can be added without modifying safety or policy layers, reducing regression risk.
- Testability: each layer can be tested independently, including testing that safety layers correctly inhibit lower layers.
3.8 Actor Model and Communicating Sequential Processes (CSP) for Agent Concurrency#
3.8.1 The Actor Model#
Formal Definition#
The Actor Model (Hewitt et al., 1973; Agha, 1986) defines the fundamental unit of concurrent computation as an actor—an entity that:
- Has a private state (not shared with any other actor).
- Processes messages from an unbounded mailbox, one at a time.
- In response to a message, can: (a) send messages to other actors, (b) create new actors, (c) designate a replacement behavior for the next message.
Key Properties#
- Location transparency: actors are addressed by identity, not location; the runtime handles routing.
- No shared state: all communication is via asynchronous message passing; no locks, no shared memory.
- Mailbox semantics: messages are queued; processing is serial within an actor, parallel across actors.
- Supervision hierarchy: actors are organized in trees; parent actors supervise children and handle their failures (let-it-crash philosophy, formalized in Erlang/OTP and Akka).
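The three defining properties — private state, an unbounded mailbox, serial processing — can be sketched with the standard library alone. The `counter` behavior and message shape are illustrative assumptions; supervision is omitted for brevity.

```python
# Minimal actor sketch: private state, serial processing of an unbounded
# mailbox, asynchronous fire-and-forget sends. Standard library only.
import queue
import threading

class Actor:
    def __init__(self, behavior):
        self._mailbox = queue.Queue()      # unbounded mailbox
        self.state = {}                    # touched only by the actor's thread
        self._behavior = behavior
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):                   # asynchronous: never blocks sender
        self._mailbox.put(msg)

    def stop(self):                        # poison pill, then wait for drain
        self._mailbox.put(None)
        self._thread.join(timeout=5)

    def _run(self):
        while True:
            msg = self._mailbox.get()      # serial: one message at a time
            if msg is None:
                return
            self._behavior(self.state, msg)

def counter(state, msg):                   # illustrative behavior
    state[msg["from"]] = state.get(msg["from"], 0) + 1

a = Actor(counter)
for _ in range(3):
    a.send({"from": "planner"})
a.stop()
```

Because only the actor's own thread touches `state`, no locks are needed: isolation is by construction, which is exactly the property the supervision and recovery machinery of Erlang/OTP-style systems builds on.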
3.8.2 Communicating Sequential Processes (CSP)#
Formal Definition#
CSP (Hoare, 1978, 1985) models concurrent systems as processes that communicate through synchronous channels:
| Construct | Meaning |
|---|---|
| a → P | Prefix: perform event a, then behave as P. |
| P ∥ Q | Parallel composition: P and Q execute concurrently, synchronizing on shared events. |
| P □ Q | External choice: the environment chooses between P and Q. |
| P ⊓ Q | Internal (nondeterministic) choice: the process chooses. |
| STOP | Deadlock: no further events. |
| SKIP | Successful termination. |
Key Properties#
- Synchronous communication: send and receive on a channel happen simultaneously; the sender blocks until the receiver is ready (rendezvous semantics).
- Compositionality: complex processes are built by composing simpler processes through parallel composition, choice, and sequential composition.
- Formal verification: CSP supports model checking for deadlock freedom, livelock freedom, and refinement checking.
3.8.3 Actor vs. CSP: Trade-Off Analysis for Agent Concurrency#
| Dimension | Actor Model | CSP |
|---|---|---|
| Communication | Asynchronous, unbounded mailbox | Synchronous, channel-based |
| Coupling | Loose; fire-and-forget messages | Tight; rendezvous synchronization |
| Ordering | No guaranteed message ordering (in pure model) | Deterministic ordering through channel discipline |
| Failure handling | Supervision trees (let-it-crash) | Process algebra (refinement, no built-in failure model) |
| Deadlock risk | No deadlock (no blocking) but possible mailbox overflow | Possible deadlock (detectable and verifiable) |
| Suitability | Distributed, loosely coupled multi-agent systems | Tightly coordinated, pipeline-style agent workflows |
3.8.4 Mapping to Multi-Agent Orchestration#
Actor-Based Multi-Agent Architecture#
ALGORITHM 3.8: Actor-Based-Agent-Orchestration
───────────────────────────────────────────────
Actors:
- OrchestratorActor: receives user requests, spawns task actors
- PlannerActor: decomposes tasks into subtasks
- ExecutorActor[i]: executes primitive tasks (tool calls)
- VerifierActor: validates execution results
- MemoryActor: manages memory read/write with validation
- SupervisorActor: monitors child actors, handles failures
Message Flow:
User → OrchestratorActor: {type: "request", payload: task}
OrchestratorActor → PlannerActor: {type: "plan", payload: task}
PlannerActor → OrchestratorActor: {type: "plan_result", payload: subtasks[]}
OrchestratorActor → ExecutorActor[i]: {type: "execute", payload: subtask_i}
ExecutorActor[i] → VerifierActor: {type: "verify", payload: result_i}
VerifierActor → OrchestratorActor: {type: "verified" | "failed", payload: ...}
Failure Handling:
IF ExecutorActor[i] crashes THEN
SupervisorActor receives EXIT signal
SupervisorActor restarts ExecutorActor[i] with last checkpoint
SupervisorActor replays failed message
IF restart_count > max_retries THEN
SupervisorActor escalates to OrchestratorActor
      OrchestratorActor triggers alternative decomposition or human escalation
CSP-Based Pipeline Composition#
For sequential agent workflows (plan → retrieve → act → verify → commit), CSP provides a cleaner model:
Pipeline = Plan →(c₁) Retrieve →(c₂) Act →(c₃) Verify →(c₄) Commit

Each stage is a CSP process; the channels c₁, …, c₄ enforce synchronization points. The pipeline is deadlock-free by construction if every process eventually produces output on its output channel for every input on its input channel (no infinite loops without output).
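A sketch of this pipeline using bounded queues as channels is below. Note the hedge: `queue.Queue(maxsize=1)` only approximates CSP rendezvous (a sender may run one item ahead of the receiver); the stage logic is illustrative, each stage simply recording its name on the passing item.

```python
# Sketch of plan → retrieve → act → verify → commit as CSP-style processes
# joined by channels. maxsize=1 approximates (not exactly) rendezvous.
import queue
import threading

def stage(name, inbox, outbox):
    """A CSP-style process: read from inbox, transform, write to outbox."""
    while True:
        item = inbox.get()
        if item is None:                 # propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(item + [name])        # record that this stage ran

names = ["plan", "retrieve", "act", "verify", "commit"]
channels = [queue.Queue(maxsize=1) for _ in range(len(names) + 1)]
threads = [
    threading.Thread(target=stage, args=(n, channels[i], channels[i + 1]),
                     daemon=True)
    for i, n in enumerate(names)
]
for t in threads:
    t.start()

channels[0].put([])                      # feed one task through the pipeline
result = channels[-1].get()              # blocks until the task emerges
channels[0].put(None)                    # shut the pipeline down
```

The bounded channels also give the backpressure property noted in §3.8.5: a slow downstream stage blocks its upstream sender rather than letting work pile up.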
3.8.5 Production Implications#
| Concern | Actor/CSP Solution |
|---|---|
| Isolation | Each agent actor has private state; no shared mutable state. Memory access is mediated through MemoryActor with validation. |
| Fault tolerance | Supervision trees with bounded restart policies. Crash recovery is the default, not the exception. |
| Backpressure | Bounded mailboxes (actor) or synchronous channels (CSP) provide natural backpressure when downstream agents are overloaded. |
| Observability | Every message is traceable; structured logging at message boundaries provides complete execution traces. |
| Concurrency control | Work units are claimed by actors with explicit task leases; concurrent modification is prevented by design, not by locks. |
| Scalability | Actors can be distributed across nodes; the mailbox abstraction is location-transparent. |
3.9 Stigmergic Coordination: Environment-Mediated Multi-Agent Communication#
3.9.1 Formal Definition#
Stigmergy (Grassé, 1959) is a coordination mechanism in which agents communicate indirectly through modifications to a shared environment rather than through direct message passing.
| Component | Definition |
|---|---|
| E | Shared environment: a data structure (artifact repository, shared workspace, knowledge base, task board) that all agents can read from and write to. |
| A = {a₁, …, aₙ} | Agents: autonomous entities that operate independently, without direct awareness of other agents. |
| mark(aᵢ, E) | Marking: agent aᵢ modifies the environment (writes an artifact, updates a status, leaves a "pheromone" signal). |
| sense(aᵢ, E) | Sensing: agent aᵢ perceives the current state of the environment, including marks left by other agents. |
| λ | Decay: environmental marks decay over time at rate λ, preventing stale signals from persisting indefinitely. |
Quantitative Stigmergy#
The strength of a stigmergic signal at location x at time t is:

s(x, t) = Σᵢ wᵢ · e^(−λ(t − tᵢ))

where wᵢ is the weight of mark i deposited at time tᵢ, and λ is the decay rate. Agents preferentially attend to locations with high s(x, t), creating positive feedback loops that amplify successful patterns.
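The decayed-sum formula is a one-liner; the mark weights, times, and decay rate below are illustrative numbers.

```python
# Sketch of the decayed signal strength: sum of w_i * exp(-lambda * (t - t_i))
# over all marks at a location. Mark values and decay rate are illustrative.
import math

def signal_strength(marks, t, decay_rate):
    """marks: iterable of (weight, deposit_time) pairs."""
    return sum(w * math.exp(-decay_rate * (t - t_i)) for w, t_i in marks)

marks = [(1.0, 0.0), (2.0, 5.0)]   # (weight, deposit_time)
s = signal_strength(marks, t=5.0, decay_rate=0.1)
```

An older mark contributes exponentially less: here the mark deposited at t=0 has decayed to e^(−0.5) of its weight by t=5, while the fresh mark contributes its full weight.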
3.9.2 Mapping to Multi-Agent Agentic Systems#
| Stigmergic Concept | Agentic AI Realization |
|---|---|
| Shared environment | Shared artifact store (Git repository, shared workspace, task board, blackboard). |
| Marks | Committed code, documentation updates, test results, status annotations, TODO comments, task completions, error reports. |
| Sensing | Agents observe the shared artifact store before acting: reading current code state, checking test results, reviewing task board status. |
| Decay | Artifact staleness scoring: old TODOs lose priority; stale test results trigger re-execution; completed tasks are archived. |
| Positive feedback | Successful patterns (e.g., well-structured code modules) attract more agent attention (more tests, more documentation, more refinement). |
| Negative feedback | Failed patterns (e.g., failing tests, unresolved errors) trigger corrective agent attention. |
Stigmergic Task Board#
ALGORITHM 3.9: Stigmergic-Task-Board-Coordination
──────────────────────────────────────────────────
Shared State: TaskBoard TB (visible to all agents)
Agent_i behavior loop:
1. tasks ← SENSE(TB, status="available")
2. task* ← SELECT(tasks, A_i.competency, priority_score)
3. success ← CLAIM(TB, task*, A_i.id, lease_duration) // atomic CAS
4. IF NOT success THEN GOTO 1 // another agent claimed it
5. result ← EXECUTE(task*, A_i.tools)
6. MARK(TB, task*, result, status="completed" | "failed")
7. IF result.status = "failed" THEN
8. MARK(TB, task*, error_signal, status="needs_repair")
9. // Decay will eventually re-surface this for another agent
10.  GOTO 1
3.9.3 Advantages for Large-Scale Multi-Agent Systems#
- Decoupling: agents need no knowledge of each other; coordination emerges from environment state.
- Scalability: adding agents does not require updating routing tables or communication topologies.
- Robustness: agent failure does not break coordination; unclaimed tasks remain on the board.
- Emergent coordination: complex coordinated behavior arises from simple individual rules, without centralized planning.
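The atomic CLAIM step of Algorithm 3.9 is the keystone of these properties: it is what makes duplicate work impossible and agent failure harmless. The sketch below uses a lock to get compare-and-swap semantics and a lease so that work claimed by a crashed agent eventually re-surfaces; the field names are illustrative.

```python
# Sketch of lease-based atomic claiming on a stigmergic task board.
# Entry fields and the lease mechanism shown here are illustrative.
import threading
import time

class TaskBoard:
    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._tasks = {t: {"status": "available", "owner": None, "lease": 0.0}
                       for t in tasks}

    def sense(self):
        """Tasks that are available, or claimed but with an expired lease."""
        now = time.monotonic()
        return [t for t, e in self._tasks.items()
                if e["status"] == "available"
                or (e["status"] == "claimed" and e["lease"] < now)]

    def claim(self, task, agent_id, lease_s=30.0):
        with self._lock:                  # atomic check-and-set
            entry = self._tasks[task]
            now = time.monotonic()
            if entry["status"] == "claimed" and entry["lease"] >= now:
                return False              # another agent holds a live lease
            entry.update(status="claimed", owner=agent_id,
                         lease=now + lease_s)
            return True

    def mark(self, task, status):
        with self._lock:
            self._tasks[task]["status"] = status

board = TaskBoard(["t1"])
first = board.claim("t1", "agent_a")      # succeeds
second = board.claim("t1", "agent_b")     # loses the race, retries via sense()
```

An agent whose claim fails simply loops back to sensing (step 4 → step 1 in Algorithm 3.9); no direct agent-to-agent communication is ever needed.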
3.9.4 Risks and Mitigations#
| Risk | Mitigation |
|---|---|
| Livelock (agents repeatedly pick up and drop tasks) | Lease-based claiming with exponential backoff; monotonic progress requirements. |
| Stale signals | Decay function with configurable rate λ; periodic garbage collection of resolved signals. |
| Duplicate work | Atomic claim operations (compare-and-swap); work unit idempotency. |
| Convergence | Termination detection: a cleanup agent monitors the task board and declares completion when all tasks are resolved and no new marks are being generated. |
3.10 Contract-Net Protocol and Auction-Based Task Allocation#
3.10.1 Contract-Net Protocol (CNP)#
Formal Definition#
The Contract-Net Protocol (Smith, 1980) is a negotiation-based task allocation mechanism for multi-agent systems:
| Phase | Description |
|---|---|
| Task Announcement | The manager broadcasts a task specification to all potential contractors, including task description, deadline, constraints, and evaluation criteria. |
| Bidding | Each contractor evaluates the task against its capabilities, current load, and cost model, and submits a bid (or declines). |
| Awarding | The manager evaluates bids using a scoring function and awards the contract to the best bidder. |
| Execution & Reporting | The winning contractor executes the task and reports results to the manager. |
Bid Evaluation Function#
The manager's bid evaluation function is:

score(b) = w₁·capability(b) − w₂·cost(b) − w₃·load(b)

where the wᵢ are weights reflecting the manager's priorities.
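A sketch of this scoring applied to two bids follows; the weight values and bid fields are illustrative assumptions, matching the capability, cost, and load factors from the bidding phase above.

```python
# Sketch of weighted bid evaluation: higher capability is better, higher
# cost and load are worse. Weights and bid fields are illustrative.
WEIGHTS = {"capability": 0.5, "cost": 0.3, "load": 0.2}

def score(bid):
    return (WEIGHTS["capability"] * bid["capability"]
            - WEIGHTS["cost"] * bid["cost"]
            - WEIGHTS["load"] * bid["load"])

bids = [
    {"agent": "c1", "capability": 0.9, "cost": 0.2, "load": 0.5},
    {"agent": "c2", "capability": 0.7, "cost": 0.1, "load": 0.1},
]
winner = max(bids, key=score)   # c2: lower load and cost outweigh capability
```

Note how the weighting lets a slightly less capable but lightly loaded contractor win, which is exactly the load-balancing behavior noted in §3.10.4.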
3.10.2 CNP Protocol Flow for LLM Agents#
ALGORITHM 3.10: Contract-Net-Agent-Allocation
──────────────────────────────────────────────
Participants: Manager M, Contractor agents C[]
1. // ANNOUNCE
2. task_spec ← M.decompose_and_specify(user_request)
3. FOR EACH c IN C:
4. SEND(c, {type: "cfp", task: task_spec, deadline: T_d}) // call for proposals
5.
6. // BID
7. bids ← ∅
8. WAIT_UNTIL(deadline_bid OR all_responded):
9. FOR EACH response IN incoming:
10. IF response.type = "propose" THEN
11. bids ← bids ∪ {response}
12. // "refuse" responses are discarded
13.
14. // AWARD
15. IF bids = ∅ THEN
16. ESCALATE(task_spec, "no_capable_contractor")
17. ELSE
18. ranked ← SORT(bids, BY score_function, DESCENDING)
19. winner ← HEAD(ranked)
20. SEND(winner.agent, {type: "accept"})
21. FOR EACH loser IN TAIL(ranked):
22. SEND(loser.agent, {type: "reject"})
23.
24. // EXECUTE & REPORT
25. result ← WAIT(winner.agent, timeout=T_exec)
26. IF result.status = "success" THEN
27. RETURN result.payload
28. ELSE
29. // Re-announce to remaining contractors or escalate
30.   RETRY with EXCLUDE(winner.agent)
3.10.3 Auction-Based Task Allocation#
For competitive multi-agent environments or resource-constrained settings, auction mechanisms generalize CNP:
| Auction Type | Mechanism | Agent Application |
|---|---|---|
| First-price sealed-bid | Each agent submits one bid; highest bidder wins and pays its bid. | Simple task allocation where agents bid based on estimated capability. |
| Vickrey (second-price) | Highest bidder wins but pays the second-highest bid. Incentive-compatible (truthful bidding is dominant strategy). | Preferred when truthful capability reporting is important for system-level optimization. |
| Combinatorial | Agents bid on bundles of tasks; winner determination is an optimization problem. | Multi-task allocation where tasks have synergies (e.g., related code modules). |
| Dutch (descending) | Price starts high and decreases; first agent to accept wins. | Fast allocation under time pressure; agents with lowest cost thresholds win. |
Formal Auction for Task Allocation#
For tasks T = {t₁, …, tₘ} and agents A = {a₁, …, aₙ}, the allocation problem is:

maximize Σᵢ Σⱼ vᵢⱼ · xᵢⱼ

subject to:

Σᵢ xᵢⱼ = 1 for every task tⱼ (each task assigned exactly once)
Σⱼ xᵢⱼ ≤ κᵢ for every agent aᵢ
xᵢⱼ ∈ {0, 1}

where vᵢⱼ is the value (expected quality) of agent aᵢ performing task tⱼ, and κᵢ is agent aᵢ's maximum concurrent task capacity.
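For a tiny instance the optimum can be found by brute-force enumeration, which makes the constraint structure concrete; the value matrix and capacities below are illustrative, and a real system would use an ILP solver or an auction mechanism instead.

```python
# Brute-force sketch of the capacity-constrained allocation problem.
# Enumerate all task→agent assignments; keep the best feasible one.
from itertools import product

def allocate(values, capacities):
    """values[i][j] = value of agent i on task j; capacities[i] = kappa_i."""
    n_agents, n_tasks = len(values), len(values[0])
    best, best_value = None, float("-inf")
    for assignment in product(range(n_agents), repeat=n_tasks):
        loads = [assignment.count(i) for i in range(n_agents)]
        if any(l > c for l, c in zip(loads, capacities)):
            continue                     # violates an agent's capacity
        total = sum(values[a][j] for j, a in enumerate(assignment))
        if total > best_value:
            best, best_value = assignment, total
    return best, best_value

values = [[3, 1],    # agent 0's value on tasks 0, 1
          [2, 4]]    # agent 1's value on tasks 0, 1
assignment, total = allocate(values, capacities=[1, 1])
```

With capacity 1 each, the greedy choice of agent 0 for both tasks is infeasible; the constraint forces the split assignment, which is the whole point of the κᵢ bound.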
3.10.4 Production Considerations#
| Concern | Mitigation |
|---|---|
| Communication overhead | Use targeted announcement (only agents with matching capability schemas receive CFPs), not global broadcast. |
| Strategic misrepresentation | Vickrey auction incentivizes truthful bidding; reputation systems penalize agents that consistently over-promise. |
| Latency | Bounded bid windows; timeout-based fallback to direct assignment if insufficient bids received. |
| Failure after award | Contracts include SLAs; failure triggers re-announcement with the failed agent excluded. Compensation actions are specified in the contract. |
| Load balancing | Bids include current load as a factor; overloaded agents bid lower or decline. |
3.11 Formal Petri Net and State Machine Representations of Agent Lifecycles#
3.11.1 Finite State Machines for Agent Lifecycle#
Formal Definition#
An agent's lifecycle can be modeled as a Deterministic Finite Automaton (DFA):

M = (Q, Σ, δ, q₀, F)
| Symbol | Definition |
|---|---|
| Q | Finite set of agent states: Q = {Idle, Planning, Retrieving, Executing, Verifying, Repairing, Committing, Completed, Failed}. |
| Σ | Input alphabet: events that trigger state transitions (e.g., task_received, plan_ready, evidence_ready, tool_result, verify_pass, verify_fail). |
| δ | Transition function: δ : Q × Σ → Q. |
| q₀ | Initial state: q₀ = Idle. |
| F | Accepting states: F = {Completed, Failed}. |
Agent Lifecycle State Machine#
task_received plan_ready evidence_ready
Idle ───────────────→ Planning ──────────→ Retrieving ──────────→ Executing
↑ │
│ │
│ tool_result │
│ ┌──────────────────────────────────┘
│ ↓
│ Verifying
│ │ │
│ verify_pass verify_fail
│ │ │
│ ↓ ↓
│ Committing Repairing
│ │ │
│ done │ repair_success → Verifying
│ │ │ max_retries
│ ↓ │ ↓
│ Completed Failed
│ │
└── retry ───────┘  (if retry_budget > 0)
Transition Function (Excerpt)#
| Current State | Event | Next State | Guard Condition |
|---|---|---|---|
| Idle | task_received | Planning | authorization_check = pass |
| Planning | plan_ready | Retrieving | plan.subtask_count ≤ max_depth |
| Retrieving | evidence_ready | Executing | evidence.relevance ≥ threshold |
| Executing | tool_result | Verifying | — |
| Verifying | verify_pass | Committing | — |
| Verifying | verify_fail | Repairing | repair_attempts < max_retries |
| Verifying | verify_fail | Failed | repair_attempts ≥ max_retries |
| Repairing | repair_success | Verifying | — |
| Committing | done | Completed | — |
| Failed | retry | Planning | retry_budget > 0 |
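The transition table above maps directly onto a dictionary-driven state machine in which illegal transitions and failed guards raise rather than silently passing. The guard context fields below are illustrative, mirroring the guard conditions in the excerpt.

```python
# Sketch of the lifecycle DFA with guard conditions. Transition entries
# mirror the excerpt above; the ctx fields are illustrative.
TRANSITIONS = {
    ("Idle", "task_received"):        ("Planning",   lambda c: c["authorized"]),
    ("Planning", "plan_ready"):       ("Retrieving", lambda c: c["subtasks"] <= c["max_depth"]),
    ("Retrieving", "evidence_ready"): ("Executing",  lambda c: c["relevance"] >= 0.5),
    ("Executing", "tool_result"):     ("Verifying",  lambda c: True),
    ("Verifying", "verify_pass"):     ("Committing", lambda c: True),
    ("Verifying", "verify_fail"):     ("Repairing",  lambda c: c["repairs"] < c["max_retries"]),
    ("Repairing", "repair_success"):  ("Verifying",  lambda c: True),
    ("Committing", "done"):           ("Completed",  lambda c: True),
}

def step(state, event, ctx):
    """Apply one transition; illegal transitions raise instead of passing."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"illegal transition: {key}")
    next_state, guard = TRANSITIONS[key]
    if not guard(ctx):
        raise ValueError(f"guard failed for {key}")
    return next_state

ctx = {"authorized": True, "subtasks": 3, "max_depth": 5,
       "relevance": 0.9, "repairs": 0, "max_retries": 2}
state = "Idle"
for ev in ["task_received", "plan_ready", "evidence_ready",
           "tool_result", "verify_pass", "done"]:
    state = step(state, ev, ctx)
```

Because the table is data rather than control flow, it can be logged, checkpointed, and checked for coverage — the production benefits enumerated in §3.11.4.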
3.11.2 Petri Nets for Concurrent Agent Modeling#
Formal Definition#
A Petri Net is a tuple:

N = (P, T, F, W, M₀)
| Symbol | Definition |
|---|---|
| P | Finite set of places (represent conditions or states). |
| T | Finite set of transitions (represent events or actions). |
| F ⊆ (P × T) ∪ (T × P) | Flow relation: directed arcs between places and transitions. |
| W : F → ℕ₊ | Weight function: arc weights (default 1). |
| M₀ : P → ℕ | Initial marking: number of tokens in each place at start. |
A transition t is enabled at marking M if:

∀p ∈ •t : M(p) ≥ W(p, t)

where •t denotes the input places of t. Firing t produces a new marking M′:

M′(p) = M(p) − W(p, t) + W(t, p)
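The enabling and firing rules translate directly into code. The sketch below fires the `t_execute` transition from the Agent Execution Petri Net, showing a resource place (the token budget) being consumed; the arc-weight maps are an illustrative encoding.

```python
# Sketch of Petri net enabling and firing. pre[t]/post[t] map places to
# arc weights W(p,t) / W(t,p); the t_execute encoding is illustrative.
def enabled(marking, pre, t):
    """t is enabled iff every input place holds at least W(p, t) tokens."""
    return all(marking.get(p, 0) >= w for p, w in pre[t].items())

def fire(marking, pre, post, t):
    if not enabled(marking, pre, t):
        raise ValueError(f"{t} is not enabled")
    m = dict(marking)
    for p, w in pre[t].items():    # consume tokens from input places
        m[p] -= w
    for p, w in post[t].items():   # produce tokens in output places
        m[p] = m.get(p, 0) + w
    return m

# t_execute consumes one unit of the token budget (a resource place).
pre  = {"t_execute": {"p_evidence_ready": 1, "p_token_budget": 1}}
post = {"t_execute": {"p_executing": 1}}
m0 = {"p_evidence_ready": 1, "p_token_budget": 3}
m1 = fire(m0, pre, post, "t_execute")
```

Once `p_token_budget` reaches zero, `t_execute` is no longer enabled: the resource bound is enforced by the net's structure, not by a separate check.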
Why Petri Nets for Agentic Systems#
Petri nets are strictly more expressive than finite state machines for modeling agentic systems because they naturally represent:
- Concurrency: multiple tokens can be in different places simultaneously, modeling parallel agent execution.
- Synchronization: transitions with multiple input places model synchronization barriers (e.g., "wait for all sub-agent results before synthesis").
- Resource constraints: token counts model resource availability (e.g., available tool invocation budget, concurrent request capacity).
- Conflict/Choice: when multiple transitions are enabled by the same tokens, this models nondeterministic choice (e.g., tool selection).
Agent Execution Petri Net#
Places:
p_idle, p_planning, p_plan_ready, p_retrieving, p_evidence_ready,
p_executing, p_tool_result, p_verifying, p_verified, p_failed,
p_repairing, p_committing, p_completed,
p_token_budget (resource place, initial tokens = max_budget),
p_retry_budget (resource place, initial tokens = max_retries)
Transitions:
t_receive_task: {p_idle} → {p_planning}
t_plan: {p_planning} → {p_plan_ready}
t_retrieve: {p_plan_ready} → {p_retrieving}
t_evidence: {p_retrieving} → {p_evidence_ready}
t_execute: {p_evidence_ready, p_token_budget} → {p_executing}
// consumes a token budget unit
t_result: {p_executing} → {p_tool_result}
t_verify: {p_tool_result} → {p_verifying}
t_pass: {p_verifying} → {p_verified}
t_fail: {p_verifying, p_retry_budget} → {p_repairing}
// consumes a retry token
t_repair: {p_repairing} → {p_tool_result} // loops back to verify
t_exhaust: {p_verifying} → {p_failed}
// fires when p_retry_budget is empty (no retry tokens)
t_commit: {p_verified} → {p_committing}
t_done:     {p_committing} → {p_completed}
Formal Analysis Properties#
Petri nets enable formal verification of agent lifecycle properties:
| Property | Definition | Agent Interpretation |
|---|---|---|
| Reachability | Can a given marking M be reached from the initial marking M₀? | Can the agent reach the Completed state from Idle for a given task? |
| Boundedness | Is the number of tokens in every place bounded? | Does the agent stay within resource limits (token budget, retry budget, concurrent tasks)? |
| Liveness | Can every transition eventually fire from every reachable marking? | Is every agent capability eventually usable? Are there dead-end states? |
| Deadlock freedom | Is there always at least one enabled transition? | Can the agent get stuck in a state with no possible progress? |
| Fairness | Does every continuously enabled transition eventually fire? | Are all agent paths eventually explored (no starvation)? |
3.11.3 Colored Petri Nets for Multi-Agent Modeling#
Colored Petri Nets (CPN) extend standard Petri nets with typed tokens, enabling modeling of multiple agents, task types, and data flows in a single net:
CPN = (P, T, F, Σ, C, G, E, M₀)

where Σ is a set of color sets (types), C : P → Σ assigns a type to each place, G assigns guard conditions to transitions, and E assigns typed arc expressions to arcs.
In multi-agent systems, token colors can encode:
- Agent identity: which agent is in which state.
- Task parameters: task type, priority, deadline.
- Resource type: GPU budget, API rate limit, token budget.
This enables modeling and verification of complex multi-agent orchestration with heterogeneous agents and resources.
3.11.4 Production Application: Agent Lifecycle as Verifiable State Machine#
Every production agent orchestrator should implement its agent lifecycle as an explicit, verifiable state machine (or Petri net), not as implicit control flow in code. Benefits:
- Invariant enforcement: the state machine definition mechanically prevents illegal transitions (e.g., executing without planning, committing without verifying).
- Observability: current state is always inspectable; state transitions are logged with timestamps and payloads.
- Recovery: after crash, the agent resumes from its last committed state, not from the beginning.
- Testing: state machine coverage analysis ensures all states and transitions are exercised in testing.
- Formal verification: for critical applications, model checking can verify deadlock freedom, boundedness, and liveness.
3.12 Category-Theoretic Composition of Agent Pipelines#
3.12.1 Motivation: Compositionality as Engineering Discipline#
Agent systems are assembled from components: retrievers, planners, executors, verifiers, memory managers, synthesizers. The central question is: when can components be composed, and does the composite inherit the properties of its parts?
Category theory provides the most general framework for answering this question. It abstracts away implementation details and focuses on composition structure: what composes with what, and what properties are preserved.
3.12.2 Category-Theoretic Foundations#
Categories#
A category 𝒞 consists of:
- Objects: Ob(𝒞) — e.g., types, state spaces, data schemas.
- Morphisms: for each pair of objects A, B, a set Hom(A, B) of arrows f : A → B from A to B — e.g., typed functions, agent transformations, pipeline stages.
- Composition: for morphisms f : A → B and g : B → C, a composite g ∘ f : A → C.
- Identity: for each object A, an identity morphism id_A : A → A.
- Associativity: h ∘ (g ∘ f) = (h ∘ g) ∘ f.
- Identity law: f ∘ id_A = f = id_B ∘ f.
The Category of Agent Components#
Define a category of agent components with:
- Objects: typed data schemas (e.g., Query, Context, Plan, Result, VerifiedResult, Report).
- Morphisms: agent components as typed transformations:
retrieve: Query → Context
plan: Context → Plan
execute: Plan → Result
verify: Result → VerifiedResult
synthesize: VerifiedResult → Report
The pipeline is the composite morphism:
synthesize ∘ verify ∘ execute ∘ plan ∘ retrieve : Query → Report
Associativity guarantees that we can group and refactor pipeline stages without changing semantics:
(synthesize ∘ verify) ∘ (execute ∘ plan ∘ retrieve) = synthesize ∘ (verify ∘ execute) ∘ (plan ∘ retrieve)
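A minimal sketch of components as typed morphisms, with composition validated at construction time; the stage names and schema labels are illustrative assumptions:

```python
# Pipeline stages as typed morphisms: composition is rejected at
# construction time if the codomain/domain schemas do not match.

class Morphism:
    def __init__(self, fn, dom, cod):
        self.fn, self.dom, self.cod = fn, dom, cod

    def __call__(self, x):
        return self.fn(x)

    def then(self, other):
        """Compose self ; other — valid only if schemas line up."""
        if self.cod != other.dom:
            raise TypeError(f"cannot compose: {self.cod} != {other.dom}")
        return Morphism(lambda x: other.fn(self.fn(x)), self.dom, other.cod)

retrieve = Morphism(lambda q: f"docs({q})",   "Query",   "Context")
plan     = Morphism(lambda c: f"plan({c})",   "Context", "Plan")
execute  = Morphism(lambda p: f"result({p})", "Plan",    "Result")

# Associativity: re-grouping stages does not change the composite.
p1 = retrieve.then(plan).then(execute)
p2 = retrieve.then(plan.then(execute))
print(p1("q") == p2("q"))  # True
```

Mismatched stages (e.g., composing `execute` directly after `retrieve`) fail at pipeline construction, not at runtime in production.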
3.12.3 Functors: Structure-Preserving Mappings Between Agent Categories#
A functor F: 𝒞 → 𝒟 maps objects and morphisms from one category to another while preserving composition and identity:
F(g ∘ f) = F(g) ∘ F(f) and F(id_A) = id_F(A).
Application: Testing Functor#
Define a testing functor T that maps production components to their test doubles:
- T(retrieve) = mock_retrieve (returns canned results)
- T(execute) = mock_execute (simulates tool calls)
The functor property guarantees that if the production pipeline composes correctly, the test pipeline composes correctly with the same structure. This is the formal basis for compositional testing of agent pipelines.
Application: Observability Functor#
Define an observability functor O that wraps each morphism with tracing: O(f) behaves exactly as f but additionally logs inputs, outputs, and latency. The functor preserves composition, O(g ∘ f) = O(g) ∘ O(f): tracing the composite pipeline is equivalent to composing the individually traced stages.
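A minimal sketch of such a wrapper, assuming an illustrative in-memory trace sink (a real system would emit spans to a tracing backend):

```python
import functools
import time

# Observability as a functor-style wrapper: each stage is mapped to a
# traced version of itself; composing traced stages traces the composite.

TRACE = []  # illustrative in-memory trace sink

def observe(name, fn):
    @functools.wraps(fn)
    def traced(x):
        start = time.perf_counter()
        out = fn(x)
        TRACE.append({"stage": name, "input": x, "output": out,
                      "latency_s": time.perf_counter() - start})
        return out
    return traced

plan    = observe("plan",    lambda q: f"plan({q})")
execute = observe("execute", lambda p: f"result({p})")

out = execute(plan("query"))
print(out)                           # result(plan(query))
print([t["stage"] for t in TRACE])   # ['plan', 'execute']
```

Because the wrapper returns `fn`'s result unchanged, inserting or removing tracing never alters pipeline semantics — the identity-law intuition from the table below.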
3.12.4 Monoidal Categories: Parallel Composition#
A monoidal category adds a tensor product ⊗ that models parallel composition:
- f: A → B and g: C → D compose in parallel as f ⊗ g: A ⊗ C → B ⊗ D.
- I is the unit object (the "empty" type; doing nothing in parallel changes nothing: A ⊗ I ≅ A).
Application: Parallel Agent Execution#
Two independent agent tasks f: A → B and g: C → D can be executed in parallel:
f ⊗ g: A ⊗ C → B ⊗ D
The monoidal structure guarantees that:
- Independence: f and g do not interact (no shared state, isolated workspaces).
- Determinism: the result of f ⊗ g is the pair of results (f(a), g(c)), regardless of execution order.
- Composability: (f′ ⊗ g′) ∘ (f ⊗ g) = (f′ ∘ f) ⊗ (g′ ∘ g) is well-defined when types align (the interchange law).
This provides the formal justification for the parallel agent execution model described in the system prompt: "Parallel agents shall operate on independently claimable work units."
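A minimal sketch of the tensor product as concurrent execution of two independent stages; the stage functions are illustrative stand-ins for real agent tasks:

```python
from concurrent.futures import ThreadPoolExecutor

# Tensor product as parallel execution: two independent stages run
# concurrently on isolated inputs; the composite returns the pair of
# results regardless of which stage finishes first.

def tensor(f, g):
    def parallel(pair):
        a, c = pair
        with ThreadPoolExecutor(max_workers=2) as pool:
            fa = pool.submit(f, a)
            gc = pool.submit(g, c)
            return (fa.result(), gc.result())
    return parallel

summarize = lambda text: f"summary({text})"
classify  = lambda text: f"label({text})"

both = tensor(summarize, classify)
print(both(("doc-A", "doc-B")))  # ('summary(doc-A)', 'label(doc-B)')
```

Independence is what makes this safe: the two stages share no mutable state, so the result is the deterministic pair of their individual results.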
3.12.5 Natural Transformations: Component Substitution#
A natural transformation η: F ⇒ G between functors F, G: 𝒞 → 𝒟 is a family of morphisms η_A: F(A) → G(A), one for each object A, such that for every f: A → B:
η_B ∘ F(f) = G(f) ∘ η_A
This naturality square commutes: transforming and then mapping along G equals mapping along F and then transforming.
Application: Retriever Substitution#
If F and G are two retrieval strategies (e.g., semantic search and hybrid search), a natural transformation η: F ⇒ G provides a systematic migration path: for every query type, there exists a transformation from F's result to G's result that commutes with all downstream pipeline stages. This guarantees that replacing a retriever does not break the pipeline's composition structure.
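A minimal sketch of a naturality-style check when swapping retrievers. Everything here is an assumption for illustration: an old retriever returning plain document lists, a new one returning scored pairs, a conversion that preserves ranking, and a downstream top-1 stage in each format:

```python
# Naturality-style check for retriever substitution: converting the old
# retriever's result, then applying the downstream stage, must agree with
# applying the downstream stage and then converting.

def old_retrieve(query):          # old format: ranked list of doc ids
    return [f"{query}-d1", f"{query}-d2"]

def eta(old_result):              # conversion: preserve ranking as scores
    n = len(old_result)
    return [(doc, (n - i) / n) for i, doc in enumerate(old_result)]

def top1_old(result):             # downstream stage, old format
    return result[0]

def top1_new(result):             # same stage, new (doc, score) format
    return max(result, key=lambda pair: pair[1])[0]

# The square commutes on a sample query: same top document either way.
q = "query"
assert top1_new(eta(old_retrieve(q))) == top1_old(old_retrieve(q))
print("naturality check passed for", q)
```

In practice this kind of check, run over a representative query set, is the mechanical evidence that the substitution commutes with downstream stages.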
3.12.6 Monads: Effectful Agent Computation#
A monad on a category 𝒞 consists of:
- An endofunctor T: 𝒞 → 𝒞.
- A unit (return) η: Id ⇒ T.
- A multiplication (join/flatten) μ: T² ⇒ T.
satisfying the associativity law μ ∘ Tμ = μ ∘ μT and the unit laws μ ∘ Tη = μ ∘ ηT = id.
Application: Error Handling as Monad#
Define T(A) = Result[A] = A + Error — either a value of type A or a typed error. Every agent component returns Result[B] instead of B, explicitly propagating errors through the pipeline:
f: A → Result[B], g: B → Result[C]
Monadic composition (Kleisli composition) automatically handles error short-circuiting:
(f >=> g): A → Result[C] — if f returns an Error, g is never invoked and the Error propagates unchanged.
This provides the formal basis for typed error propagation in agent pipelines, ensuring that failures are handled uniformly and cannot be silently dropped.
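A minimal sketch of the Result monad with Kleisli composition; the `Ok`/`Err` types and the example stages are assumptions for illustration:

```python
from dataclasses import dataclass

# Error handling as a Result monad: each stage returns Ok(value) or
# Err(message); Kleisli composition short-circuits on the first Err.

@dataclass
class Ok:
    value: object

@dataclass
class Err:
    message: str

def kleisli(*stages):
    """Compose stages of type A -> Result[B]; stop at the first Err."""
    def pipeline(x):
        result = Ok(x)
        for stage in stages:
            if isinstance(result, Err):
                return result          # short-circuit: skip later stages
            result = stage(result.value)
        return result
    return pipeline

parse = lambda s: Ok(int(s)) if s.isdigit() else Err(f"not a number: {s}")
halve = lambda n: Ok(n // 2) if n % 2 == 0 else Err(f"odd input: {n}")

run = kleisli(parse, halve)
print(run("42"))   # Ok(value=21)
print(run("7"))    # Err(message='odd input: 7')
print(run("abc"))  # Err(message='not a number: abc')
```

Because every failure is a value of the same `Err` type, the pipeline framework handles all failures uniformly; no exception can escape a stage unnoticed.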
Application: Retrieval Context as Monad#
Define T(A) = Contextualized[A] = A × Provenance × Relevance. Every retrieval result carries provenance and relevance metadata. Monadic composition ensures that provenance is preserved through all pipeline stages:
f: Query → Contextualized[Documents], g: Documents → Contextualized[RankedDocuments]
μ flattens nested contextualization (e.g., retrieval of a retrieval result, Contextualized[Contextualized[A]] → Contextualized[A]) into a single provenance chain.
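A minimal sketch of the idea, assuming an illustrative `Contextualized` type whose bind concatenates provenance chains (playing the role of join):

```python
from dataclasses import dataclass, field

# Retrieval context as a monad: values carry a provenance chain; bind
# concatenates the chains, so nested contextualization flattens into one.

@dataclass
class Contextualized:
    value: object
    provenance: list = field(default_factory=list)

def unit(x):
    """Return: wrap a bare value with empty provenance."""
    return Contextualized(x, [])

def bind(ctx, fn):
    """fn: value -> Contextualized; concatenating provenance is the join."""
    inner = fn(ctx.value)
    return Contextualized(inner.value, ctx.provenance + inner.provenance)

retrieve = lambda q: Contextualized(f"docs({q})", ["index:v1"])
rerank   = lambda d: Contextualized(f"top({d})",  ["reranker:v2"])

out = bind(bind(unit("query"), retrieve), rerank)
print(out.value)       # top(docs(query))
print(out.provenance)  # ['index:v1', 'reranker:v2']
```

No stage can drop the provenance of an earlier stage: the chain grows monotonically through the pipeline, which is what makes end-to-end attribution auditable.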
3.12.7 String Diagrams: Visual Pipeline Composition#
String diagrams provide a graphical syntax for monoidal categories that makes pipeline composition visually explicit:

```
Sequential: A ─── f ─── B ─── g ─── C

Parallel:   A ─── f ─── B
            C ─── g ─── D

Feedback:   A ─── f ──┬── B
                      │
                      └── (feedback to A via h)
```

String diagrams are sound and complete for monoidal categories: every valid diagram corresponds to a well-typed pipeline, and every well-typed pipeline has a diagram. This provides a formal specification language for agent pipeline architecture.
3.12.8 Composition Properties and Invariant Preservation#
The category-theoretic framework guarantees the following properties when agent pipelines are composed through categorical constructs:
| Property | Categorical Guarantee | Agent Implication |
|---|---|---|
| Type safety | Morphism composition requires matching domain/codomain. | Pipeline stages must have compatible input/output schemas. Detected at design time. |
| Associativity | h ∘ (g ∘ f) = (h ∘ g) ∘ f. | Pipeline refactoring (re-grouping stages) does not change semantics. |
| Identity | f ∘ id = f = id ∘ f. | Inserting a no-op stage does not change behavior; useful for observability wrappers. |
| Functorial testing | T(g ∘ f) = T(g) ∘ T(f). | Testing the composed pipeline is equivalent to composing tested components. |
| Parallel independence | f ⊗ g guarantees no interaction. | Formally verifies that parallel agents do not share mutable state. |
| Error propagation | Monadic composition short-circuits on error. | Failures propagate uniformly; no silent error swallowing. |
3.12.9 Practical Typed Interface Design from Category Theory#
Category theory motivates the following production design principles:
- Every agent component is a typed morphism: input schema → output schema, with explicit error types.
- Pipelines are compositions: validated at construction time by type checking, not at runtime by exception handling.
- Parallel execution requires monoidal independence: components in a parallel group must have no shared mutable state. Verify mechanically.
- Component substitution requires natural transformation: replacing a retriever, planner, or verifier requires demonstrating that the new component satisfies the same naturality conditions (same interface contract, commuting with downstream stages).
- Error handling is monadic: every component wraps its output in a Result monad; the pipeline framework provides Kleisli composition for automatic error propagation.
- Observability is functorial: tracing, logging, and metrics are applied as functors that preserve pipeline structure.
Chapter Summary and Architectural Selection Guide#
Synthesis: Architecture Selection Matrix#
| Architecture | Primary Strength | Best Application in Agentic AI | Composition Pattern |
|---|---|---|---|
| BDI (§3.1) | Commitment and practical reasoning | Single-agent goal pursuit with plan stability | Core agent loop structure |
| SOAR (§3.2.1) | Impasse-driven learning | Escalation and workflow compilation | Memory and learning subsystem |
| ACT-R (§3.2.2) | Activation-based retrieval | Retrieval scoring and tool utility learning | Retrieval ranking function |
| GWT (§3.2.3) | Attention and broadcast | Context window engineering | Context admission control |
| Reactive (§3.3.1) | Minimal latency | Safety guardrails, input filters | Subsumption layer 0 |
| Deliberative (§3.3.2) | Plan optimality | Complex multi-step task planning | HTN decomposition |
| Hybrid (§3.3.3) | Balanced capability | Production agent stacks | Layered architecture |
| HTN (§3.4) | Hierarchical decomposition | Task breakdown and workflow templates | Plan library |
| OODA (§3.5) | Execution tempo | Real-time and interactive agent loops | Outer execution loop |
| Blackboard (§3.6) | Multi-source integration | Heterogeneous retrieval aggregation | Retrieval orchestration |
| Subsumption (§3.7) | Priority arbitration | Safety and behavior layering | Control hierarchy |
| Actor/CSP (§3.8) | Concurrent execution | Multi-agent orchestration | Communication substrate |
| Stigmergy (§3.9) | Decoupled coordination | Large-scale multi-agent collaboration | Coordination protocol |
| Contract-Net (§3.10) | Negotiated allocation | Dynamic task distribution | Task allocation protocol |
| Petri Nets/FSM (§3.11) | Formal verification | Agent lifecycle management | State management |
| Category Theory (§3.12) | Compositional guarantees | Pipeline design and refactoring | Composition framework |
The Production Agentic Stack as Architectural Composition#
A production-grade agentic system does not use one architecture; it composes multiple architectures at different layers, as summarized in the selection matrix above.
Each architecture provides a specific structural guarantee. Together, they yield a system that is formally grounded, concurrency-safe, verifiable, observable, and compositionally maintainable.
The discipline of selecting, combining, and formally justifying these architectural choices—rather than improvising control flow around prompt templates—is what separates engineering from experimentation in the construction of agentic AI systems.
End of Chapter 3.