Preamble#
Agent architectures are not implementation accidents; they are structural commitments that determine what an agent can represent, how it reasons, when it acts, and how it fails. Every production agentic system—whether built on large language models (LLMs), classical planners, or hybrid stacks—implicitly or explicitly instantiates one or more of the formal architectures catalogued in this chapter. The difference between a brittle demonstration and a reliable, scalable agentic application lies precisely in whether the engineering team has made these architectural commitments explicit, typed, bounded, and verifiable.
This chapter provides an exhaustive formal treatment of twelve foundational agent architectures, each analyzed along five axes:
- Formal definition — mathematical objects, state spaces, transition semantics.
- Structural decomposition — components, interfaces, data flow.
- Mapping to LLM-based agentic systems — how each architecture instantiates concretely in modern context-engineered, tool-augmented, memory-layered agent stacks.
- Production implications — reliability, observability, fault tolerance, token efficiency, latency.
- Composition properties — how the architecture interacts with other architectures in layered or multi-agent configurations.
The reader should emerge with the capacity to select, combine, and formally justify architectural decisions for any agentic system at enterprise scale.
3.1 BDI (Belief–Desire–Intention) Architecture Applied to LLM Agents#
3.1.1 Philosophical and Formal Foundations#
The Belief–Desire–Intention architecture originates in Michael Bratman's theory of practical reasoning (1987) and was formalized computationally by Rao and Georgeff (1991, 1995). Its central insight is that rational agency requires not only beliefs about the world and desires regarding future states, but a distinct commitment mechanism—intentions—that constrains future deliberation, filters out irrelevant options, and stabilizes behavior over time.
Formal BDI Tuple#
A BDI agent is defined as a tuple:

$$\mathcal{A}_{\mathrm{BDI}} = \langle B, D, I, \Pi, \mathit{brf}, \mathit{del}, \mathit{filter}, \mathit{plan} \rangle$$

where:

| Symbol | Definition |
|---|---|
| $B$ | Belief base: a set of propositions representing the agent's current model of the world, including uncertainty. |
| $D$ | Desire set: a set of goal states the agent finds desirable, not necessarily mutually consistent. |
| $I$ | Intention stack: an ordered, committed subset of plans currently being pursued, satisfying $I \subseteq \Pi$. |
| $\Pi$ | Plan library: a mapping from triggering events and context conditions to partially ordered sequences of actions and subgoals. |
| $\mathit{brf}$ | Belief revision function: $\mathit{brf} : B \times P \rightarrow B$, where $P$ is the percept stream. |
| $\mathit{del}$ | Desire deliberation function: $\mathit{del} : B \times D \rightarrow D$, generating or pruning desires given current beliefs. |
| $\mathit{filter}$ | Intention selection / filtering function: $\mathit{filter} : B \times D \times I \rightarrow I$, committing to or dropping intentions. |
| $\mathit{plan}$ | Plan selection function: $\mathit{plan} : I \times \Pi \rightarrow A$, selecting the next executable action from the action set $A$. |
The BDI Reasoning Cycle#
The canonical BDI reasoning cycle proceeds as follows at each time step $t$:
ALGORITHM 3.1: BDI-Reasoning-Cycle
────────────────────────────────────
Input: Agent state (B, D, I), percept stream P_stream, plan library PL
Output: Action a_t, updated agent state (B', D', I')
1. percept ← OBSERVE(P_stream)
2. B' ← BELIEF_REVISE(B, percept)
3. D' ← DELIBERATE(B', D) // generate/filter desires
4. I_candidates ← FILTER(B', D', I) // compatible intentions
5. IF RECONSIDER?(B', I) THEN
6. I' ← SELECT_INTENTIONS(B', D', I_candidates)
7. ELSE
8. I' ← I // commitment persists
9. plan ← SELECT_PLAN(B', I', PL)
10. a_t ← HEAD(plan) // next primitive action
11. EXECUTE(a_t)
12. RETURN (B', D', I'), a_t

The critical design decision is the reconsideration policy at step 5. An agent that reconsiders too frequently is overcautious (high deliberation cost, low throughput). An agent that reconsiders too rarely is overcommitted (brittle under environmental change). This trade-off is formally captured as:

$$\mathrm{RECONSIDER?}(B', I) \iff \Delta(B, B') > \theta \;\lor\; \mathrm{infeasible}(I, B')$$

where $\theta$ is a tunable belief-change threshold and $\mathrm{infeasible}(I, B')$ detects plan infeasibility.
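A minimal Python sketch of this reconsideration predicate, assuming beliefs are a flat key–value dict and an intention is represented by the list of precondition beliefs it depends on (all names here are illustrative, not part of the formal model):

```python
def belief_delta(old: dict, new: dict) -> float:
    """Fraction of belief keys whose values changed, appeared, or disappeared."""
    keys = set(old) | set(new)
    if not keys:
        return 0.0
    changed = sum(1 for k in keys if old.get(k) != new.get(k))
    return changed / len(keys)

def infeasible(intention: list, beliefs: dict) -> bool:
    """An intention is infeasible if any precondition belief it relies on is false."""
    return any(not beliefs.get(cond, False) for cond in intention)

def should_reconsider(old_beliefs: dict, new_beliefs: dict,
                      intention: list, theta: float = 0.3) -> bool:
    # Reconsider only on significant belief drift or plan infeasibility.
    return (belief_delta(old_beliefs, new_beliefs) > theta
            or infeasible(intention, new_beliefs))
```

Tuning $\theta$ trades deliberation cost against commitment: lower values approximate a cautious agent, higher values a bold one.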
3.1.2 Mapping BDI to LLM-Based Agentic Systems#
The BDI framework maps with high fidelity onto modern LLM agent stacks when each BDI component is given a concrete, typed implementation:
| BDI Component | LLM Agent Realization |
|---|---|
| Beliefs | Context window contents: system prompt policy, retrieved evidence (provenance-tagged), tool output history, environment observations, validated memory summaries. The belief base is the compiled prefill at inference time. |
| Desires | User-specified goals, system-level objectives (safety constraints, quality targets), and derived subgoals from task decomposition. Represented as structured goal objects with priority, deadline, and success criteria. |
| Intentions | The currently committed plan trace: a sequence of planned tool calls, retrieval operations, and generation steps with explicit ordering and dependency edges. Stored in working memory with rollback metadata. |
| Plan Library | A registry of known workflows, tool-chain templates, and decomposition strategies, exposed as typed MCP tool/resource surfaces or as few-shot plan exemplars in the prompt. |
| Belief Revision | Context engineering pipeline: retrieval re-ranking, memory refresh, observation integration, contradiction detection, and stale-context pruning—executed before each generation step. |
| Desire Deliberation | LLM-driven goal analysis: the model evaluates which user objectives are achievable given current beliefs, generates subgoals, and flags conflicts. |
| Intention Selection | Plan critique and commitment: the model (or a separate verifier agent) evaluates candidate plans against feasibility, cost, and risk, then commits to an intention stack with bounded depth. |
| Plan Selection | Tool routing and action selection: given the current intention, the model selects the next tool call or generation action, parameterized by typed schemas from the tool registry. |
Belief Revision as Context Engineering#
The most critical BDI function in LLM agents is belief revision. In classical BDI, $\mathit{brf}$ is a logical update. In LLM agents, it is context construction:

$$B' = \mathrm{COMPILE}\big(\text{policy},\ \mathrm{RETRIEVE}(q, \mathcal{K}),\ \mathrm{MEMORY}(\text{goals}),\ \text{observations}\big)$$

subject to the token budget constraint:

$$|B'|_{\text{tokens}} \;\leq\; W - R$$

where $W$ is the model's context window and $R$ is the token budget reserved for generation and reasoning.
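Under this constraint, belief revision becomes a packing problem. A toy sketch, approximating token counts by word counts (a real stack would use the model tokenizer; `compile_prefill` and the scored-candidate shape are hypothetical):

```python
def token_count(text: str) -> int:
    # Crude proxy: whitespace word count stands in for tokenizer output.
    return len(text.split())

def compile_prefill(policy: str, candidates: list, window: int, reserve: int) -> str:
    """Pack highest-scoring (score, text) items after the policy text,
    never exceeding window - reserve tokens."""
    budget = window - reserve - token_count(policy)
    parts, used = [policy], 0
    for _, item in sorted(candidates, key=lambda x: -x[0]):
        cost = token_count(item)
        if used + cost <= budget:   # admit only if the item fits
            parts.append(item)
            used += cost
    return "\n".join(parts)
```

Greedy packing by score is a simplification; production pipelines typically also compress or summarize items that narrowly miss the budget.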
Commitment and Reconsideration in LLM Agents#
LLM agents face a specific form of the commitment–reconsideration trade-off: each reconsideration step costs a full inference call (latency + cost), but insufficient reconsideration leads to hallucinated plan continuations when the environment has changed. The production-grade strategy is:
- Observation-triggered reconsideration: reconsider only when a tool call fails, a verification step detects an error, or retrieved evidence contradicts the current plan.
- Bounded intention depth: limit the intention stack to $k$ steps before requiring a verification checkpoint. This prevents runaway plan execution.
- Explicit rollback metadata: each intention carries a compensating action or rollback specification, enabling recovery without full replanning.
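These strategies can be combined in a small intention-stack sketch; `Intention`, its `rollback` field, and the depth bound are illustrative simplifications of the rollback metadata described above:

```python
from dataclasses import dataclass

@dataclass
class Intention:
    action: str
    rollback: str          # compensating action specification
    executed: bool = False

class IntentionStack:
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth
        self.items: list = []

    def commit(self, intention: Intention) -> bool:
        """Refuse commitment past the depth bound: the caller must run a
        verification checkpoint before adopting further intentions."""
        if len(self.items) >= self.max_depth:
            return False
        self.items.append(intention)
        return True

    def rollback_plan(self) -> list:
        """Compensating actions for executed steps, in reverse order."""
        return [i.rollback for i in reversed(self.items) if i.executed]
```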
3.1.3 Extended BDI: Obligations, Norms, and Trust#
Production agents operate under normative constraints (safety policies, authorization boundaries, organizational rules). The extended BDI model adds:

$$\mathcal{A}_{\mathrm{BDI^{+}}} = \langle B, D, I, \Pi, O, \mathcal{N} \rangle$$

where $O$ is a set of obligations (must-do constraints from policy or user approval gates) and $\mathcal{N}$ is a normative filter that blocks intention adoption if it violates safety, authorization, or ethical constraints. In typed protocol terms, $\mathcal{N}$ is enforced as a pre-execution validation gate in the agent loop—a mechanically enforced invariant, not a soft prompt instruction.
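A sketch of such a pre-execution validation gate, with a hypothetical prohibition list and approval-gated action set (both would come from policy configuration in a real system):

```python
# Illustrative policy sets; in production these are loaded from signed config.
PROHIBITED = {"delete_production_db", "exfiltrate_secrets"}
REQUIRES_APPROVAL = {"deploy", "send_email"}

def norm_filter(action: str, approvals: set) -> tuple:
    """Return (allowed, reason). Runs before execution, never as a prompt hint."""
    if action in PROHIBITED:
        return False, "blocked: prohibited by policy"
    if action in REQUIRES_APPROVAL and action not in approvals:
        return False, "blocked: approval gate not satisfied"
    return True, "allowed"
```

Note that prohibitions are absolute: no approval can override them, which is the gate's key invariant.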
3.1.4 Production Implications#
| Concern | BDI Mitigation |
|---|---|
| Hallucination control | Beliefs are grounded in provenance-tagged retrieval and verified tool outputs, not in model-generated assertions. The belief base is explicitly constructed, not implicitly assumed. |
| Fault tolerance | Intentions carry rollback metadata; plan failure triggers reconsideration rather than silent continuation. |
| Observability | Each BDI cycle emits a structured trace: beliefs consulted, desires evaluated, intentions adopted/dropped, plan selected, action executed, outcome observed. |
| Token efficiency | The compiled prefill budget enforces that beliefs are compressed and relevant, preventing context window saturation. |
| Idempotency | Intentions are logged with execution state; re-entry after crash replays from the last committed checkpoint, not from the beginning. |
3.2 Cognitive Architectures: SOAR, ACT-R, Global Workspace Theory — Mappings to Agentic AI#
3.2.1 SOAR (State, Operator, And Result)#
Formal Structure#
SOAR, developed by Laird, Newell, and Rosenbloom (1987), models cognition as a search through a problem space. Its core objects are:
| Component | Definition |
|---|---|
| State $S$ | The current representation of the problem, including goals, subgoals, and working context. |
| Operators $O$ | Transformations applicable to states, selected through preference rules. |
| Working Memory (WM) | A short-term, high-bandwidth store holding the current state, active goals, and operator applications. Analogous to the LLM context window. |
| Long-Term Memory (LTM) | Divided into procedural (production rules), semantic (facts), and episodic (event sequences) stores. |
| Chunking | When an impasse is resolved, the resolution path is compiled into a new production rule and stored in LTM, enabling one-step resolution in the future. |
| Impasse detection | When the system cannot select an operator, apply an operator, or determine the result, a subgoal is automatically created. |
The SOAR Decision Cycle#
ALGORITHM 3.2: SOAR-Decision-Cycle
───────────────────────────────────
Input: Working Memory WM, Long-Term Memory LTM
Output: Updated WM, executed actions
1. LOOP:
2. proposals ← MATCH_PRODUCTIONS(WM, LTM.procedural)
3. preferences ← EVALUATE_PREFERENCES(proposals)
4. IF UNIQUE_BEST(preferences) THEN
5. op ← SELECT(preferences)
6. WM' ← APPLY(op, WM)
7. IF IS_EXTERNAL(op) THEN EXECUTE_ACTION(op)
8. ELSE
9. impasse_type ← CLASSIFY_IMPASSE(preferences) // tie, conflict, no-change, rejection
10. subgoal ← CREATE_SUBGOAL(impasse_type, WM)
11. PUSH(subgoal, goal_stack)
12. resolution ← SOAR_DECISION_CYCLE(WM ∪ subgoal, LTM) // recursive
13. chunk ← COMPILE_CHUNK(subgoal, resolution)
14. LTM.procedural ← LTM.procedural ∪ {chunk}
15.  WM ← WM'

Mapping to LLM Agents#
| SOAR Concept | Agentic AI Realization |
|---|---|
| Working Memory | The active context window: compiled prefill + recent tool outputs + current plan state. |
| Long-Term Procedural Memory | Tool registry schemas, workflow templates, and fine-tuned behavioral patterns encoded in model weights or retrieved few-shot exemplars. |
| Long-Term Semantic Memory | Organizational knowledge base accessed via hybrid retrieval (semantic + exact + graph). |
| Long-Term Episodic Memory | Validated episode logs: past task traces with outcomes, stored in episodic memory layer with provenance and expiry. |
| Impasse → Subgoal | When the LLM cannot determine the next action (ambiguity, missing information, tool failure), the agent loop creates a subgoal: ask the user, retrieve more context, invoke a specialist sub-agent. |
| Chunking | Workflow compilation: successful multi-step resolutions are abstracted into reusable plan templates and stored in the plan library. In practice, this maps to automated pipeline construction from successful traces. |
| Operator Proposal | Tool/action candidates proposed by the model given the current context; ranked by a preference function (relevance, cost, latency, safety). |
| Operator Selection | Tool routing: the orchestrator selects the highest-ranked action, with tie-breaking by policy priority. |
Critical Insight: Impasse-Driven Subgoaling as Agent Escalation#
The impasse mechanism is the most valuable SOAR concept for production agents. Rather than allowing the LLM to hallucinate when uncertain, the agent architecture should detect impasses mechanically:
- No applicable tool: the model's proposed action does not match any registered tool schema → trigger retrieval subgoal or human escalation.
- Ambiguous tool selection: multiple tools are equally ranked → trigger a clarification subgoal or invoke a critic agent.
- Verification failure: the output of a tool call does not satisfy the postcondition → trigger a repair subgoal with bounded recursion.
This transforms the failure mode from silent hallucination to explicit escalation, which is the single most important reliability improvement in agentic systems.
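The three impasse cases above can be detected mechanically before any generation step. This sketch assumes a proposed action name, a tool registry dict, a ranked candidate list, and a postcondition verdict (all shapes are illustrative):

```python
def classify_impasse(proposed: str, registry: dict,
                     ranked: list, verified: bool):
    """Return an impasse label, or None when the agent may proceed.
    `ranked` is a list of (tool_name, score) pairs, best first."""
    if proposed not in registry:
        return "no_applicable_tool"       # -> retrieval subgoal / escalation
    if len(ranked) >= 2 and abs(ranked[0][1] - ranked[1][1]) < 1e-9:
        return "ambiguous_selection"      # -> clarification subgoal / critic
    if not verified:
        return "verification_failure"     # -> bounded repair subgoal
    return None                           # no impasse: execute
```

Each label maps to a distinct escalation path, mirroring SOAR's impasse taxonomy (tie, no-change, rejection).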
3.2.2 ACT-R (Adaptive Control of Thought — Rational)#
Formal Structure#
ACT-R (Anderson, 1993, 2007) models cognition as a modular system with parallel perceptual-motor modules and a serial procedural bottleneck:
Modules include declarative memory, procedural memory, goal, imaginal (working memory), visual, motor, and vocal modules. Each module communicates through a buffer (a single-chunk interface), and the procedural module fires productions that match buffer contents.
Subsymbolic Layer: Activation and Utility#
ACT-R's power lies in its subsymbolic equations. The activation of a declarative memory chunk $i$ is:

$$A_i = B_i + \sum_j W_j S_{ji} + \epsilon$$

where:
- $B_i = \ln \sum_k t_k^{-d}$ is the base-level activation (frequency and recency of access, with decay parameter $d$).
- $\sum_j W_j S_{ji}$ is the spreading activation from elements $j$ currently in the goal buffer, with associative strength $S_{ji}$.
- $\epsilon$ is stochastic noise.

A retrieval request succeeds if $A_i > \tau$ (retrieval threshold); the retrieval latency is:

$$T_i = F e^{-f A_i}$$

where $F$ and $f$ are scaling parameters. Higher activation yields faster retrieval.

Production utility determines which production fires when multiple match:

$$U_i(n) = U_i(n-1) + \alpha \big[ R_i(n) - U_i(n-1) \big]$$

updated by temporal-difference-like reward learning, where $R_i(n)$ is the reward received and $\alpha$ is the learning rate.
Mapping to LLM Agents#
| ACT-R Concept | Agentic AI Realization |
|---|---|
| Declarative Memory Activation | Retrieval scoring function: rank retrieved chunks by a composite of base-level activation (access frequency/recency), spreading activation (relevance to current goal), and noise (diversity). This directly informs hybrid retrieval ranking. |
| Base-Level Activation () | Memory item freshness and usage frequency scores in the episodic/semantic memory layer. Items with higher are prioritized in retrieval; items below threshold are candidates for eviction. |
| Spreading Activation | Query-aware re-ranking: the current goal and working context items spread activation to related memory chunks, boosting retrieval of contextually relevant items. This is the theoretical justification for query expansion and contextual re-ranking in RAG pipelines. |
| Retrieval Threshold | Minimum relevance cutoff in the retrieval pipeline: chunks scoring below threshold are excluded from the context window, preventing low-relevance noise. |
| Production Utility Learning | Tool/action selection learning: track the utility of each tool invocation pattern over time; increase utility for patterns that lead to successful task completion, decrease for failures. This maps to bandit-style tool routing optimization. |
| Single-Chunk Buffer Interface | Typed buffer abstraction: each module (retrieval, tool execution, memory, goal management) communicates through a single structured message at a time, enforcing serialization at the orchestration layer and preventing unbounded parallelism. |
Production Implication: ACT-R-Informed Retrieval Scoring#
The ACT-R activation equation provides a principled scoring function for retrieval in agentic systems:

$$\mathrm{score}(c_i) = \underbrace{\ln \sum_k t_k^{-d}}_{\text{recency/frequency}} + \underbrace{\sum_j W_j S_{ji}}_{\text{goal relevance}} + \underbrace{\beta \cdot \mathrm{authority}(c_i)}_{\text{source authority}} + \epsilon$$

This composite score integrates temporal dynamics, goal-directed attention, and source authority into a single ranking metric that outperforms naive cosine similarity for agentic retrieval.
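A runnable sketch of this composite score, using the standard ACT-R base-level form with decay $d$ and a simple term-overlap proxy for spreading activation (the authority term and default weights are illustrative):

```python
import math

def base_level(access_ages: list, d: float = 0.5) -> float:
    """Base-level activation: ln of summed power-law-decayed access recencies."""
    if not access_ages:
        return float("-inf")
    return math.log(sum(t ** (-d) for t in access_ages))

def spreading(goal_terms: set, chunk_terms: set, weight: float = 1.0) -> float:
    """Overlap between goal-buffer terms and chunk terms as spreading proxy."""
    if not goal_terms:
        return 0.0
    return weight * len(goal_terms & chunk_terms) / len(goal_terms)

def score(access_ages, goal_terms, chunk_terms, authority: float = 0.0) -> float:
    # Noise term omitted for determinism; add it for exploration/diversity.
    return base_level(access_ages) + spreading(goal_terms, chunk_terms) + authority
```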
3.2.3 Global Workspace Theory (GWT)#
Formal Structure#
Global Workspace Theory (Baars, 1988, 2005) models consciousness as a broadcast mechanism: many specialized unconscious processors compete to have their outputs broadcast to the global workspace, making those outputs available to all other processors simultaneously.
| Component | Definition |
|---|---|
| Global Workspace | A shared, capacity-limited broadcast medium. |
| Specialist processors | Domain-specific modules that operate in parallel on their own inputs. |
| Competition for access | Processors form coalitions and compete for workspace access based on relevance, urgency, and activation strength. |
| Global broadcast | The winning coalition's output is broadcast to all processors, enabling integration. |
| Attentional selection | A gating mechanism that determines which coalition wins workspace access. |
Mapping to Agentic AI: The Context Window as Global Workspace#
The LLM context window is a direct structural analog of the Global Workspace:
- Capacity-limited: the context window has a fixed token budget. Not all information can be present simultaneously.
- Broadcast medium: whatever is in the context window is available to the model's entire parameter space during inference—functionally, it is broadcast to all "processors" (attention heads, layers).
- Competition for access: retrieval results, memory summaries, tool outputs, and user inputs compete for inclusion in the context window. The context engineering pipeline is the attentional selection mechanism.
- Integration through broadcast: the model synthesizes information from all context sources simultaneously during forward pass, exactly as GWT predicts.
ALGORITHM 3.3: GWT-Inspired-Context-Assembly
─────────────────────────────────────────────
Input: Candidate context items C = {c_1, ..., c_N}, token budget W
Output: Assembled global workspace GW
1. FOR EACH c_i IN C:
2. score_i ← ACTIVATION(c_i) // relevance + urgency + authority + recency
3. coalitions ← FORM_COALITIONS(C, scores) // group related items
4. SORT coalitions BY aggregate_score DESCENDING
5. GW ← ∅; tokens_used ← 0
6. FOR EACH coalition IN coalitions:
7. cost ← TOKEN_COUNT(coalition)
8. IF tokens_used + cost ≤ W THEN
9. GW ← GW ∪ COMPRESS(coalition)
10. tokens_used ← tokens_used + TOKEN_COUNT(COMPRESS(coalition))
11. ELSE
12. BREAK // budget exhausted
13. RETURN GW

Production Implication: Context as Attentional Bottleneck#
GWT explains why context engineering is the most important capability in agentic AI: the context window is the agent's consciousness. If critical information does not make it into the context window, it does not exist for the agent's reasoning. Conversely, irrelevant information in the context window displaces relevant information and degrades reasoning quality (analogous to attentional capture by irrelevant stimuli).
This motivates the engineering principle: the context window must be treated as a scarce, managed resource with explicit admission control, not as an unbounded append-only log.
3.2.4 Comparative Synthesis#
| Property | SOAR | ACT-R | GWT | LLM Agent Implication |
|---|---|---|---|---|
| Memory architecture | WM + 3-type LTM | Modular buffers + declarative/procedural | Global workspace + specialists | Multi-layer memory with typed interfaces |
| Learning mechanism | Chunking (compile experience) | Utility learning + activation tuning | Coalition strengthening | Trace-to-workflow compilation + retrieval scoring adaptation |
| Failure handling | Impasse → subgoal | Retrieval failure → default | Competition failure → no broadcast | Mechanical impasse detection → escalation |
| Attention/Selection | Preference-based | Activation-based | Competition-based | Composite scoring for context admission |
| Bottleneck | Operator selection | Production firing | Workspace capacity | Context window token budget |
3.3 Reactive, Deliberative, and Hybrid Agent Architectures#
3.3.1 Reactive Architectures#
Definition and Principles#
A purely reactive agent maintains no internal world model and no explicit goals. It maps percepts directly to actions through a set of condition–action rules (behaviors):

$$a = f(p), \qquad f : P \rightarrow A$$

where $f$ is a fixed, stateless mapping function. The canonical reactive architecture is Brooks' Subsumption Architecture (detailed in §3.7), but the principles extend to any system that acts without deliberation.
Properties#
- Latency: minimal; no planning or search overhead.
- Robustness: high for well-defined environments; no world model to become inconsistent.
- Scalability of behavior: poor; complex behavior requires many hand-crafted rules with complex interaction patterns.
- Adaptability: low; cannot reason about novel situations not covered by existing rules.
LLM Agent Analog#
A "zero-shot" LLM call with a system prompt and no retrieval, no memory, no tool use, and no verification loop is the closest LLM analog to a reactive agent. It maps input directly to output through the model's learned (parameter-encoded behaviors). This is the weakest agent architecture and the one most prone to hallucination, because there is no mechanism for grounding, verification, or adaptation.
3.3.2 Deliberative Architectures#
Definition and Principles#
A deliberative agent maintains an explicit symbolic world model and uses search-based planning to select actions:

$$\pi = \mathrm{PLAN}(\mathcal{M}, s_0, G), \qquad a_t = \pi(t)$$

The agent constructs a model $\mathcal{M}$ of the world, defines a goal state $G$, and uses a planning algorithm (e.g., STRIPS, GraphPlan, HTN) to find a sequence of actions that transforms the current state $s_0$ into the goal state.
Properties#
- Latency: high; planning is computationally expensive, especially for large state spaces.
- Correctness: high for well-modeled domains; the plan is provably correct with respect to the model.
- Brittleness: high sensitivity to model inaccuracies; if the world model is wrong, the plan is wrong.
- Adaptability: replanning is possible but expensive.
LLM Agent Analog#
An LLM agent that performs explicit multi-step planning before acting—decomposing a user request into a structured task graph, selecting tools for each step, and executing sequentially with verification—is a deliberative agent. The "world model" is the agent's compiled context (beliefs), and the "planning algorithm" is the LLM's chain-of-thought reasoning constrained by the plan library.
3.3.3 Hybrid Architectures#
Definition and Layered Structure#
Hybrid architectures combine reactive and deliberative components in a layered configuration: lower layers provide fast, reactive control, while higher layers provide deliberation and coordination, mediated by an explicit control framework.
The canonical examples are:
- InteRRaP (Müller, 1996): three layers (behavior, plan, cooperation) with bottom-up activation and top-down commitment.
- TouringMachines (Ferguson, 1992): three layers (reactive, planning, modeling) with a control framework that mediates between layers.
- 3T Architecture (Bonasso et al., 1997): three tiers (skills, sequencing, deliberation) with each tier operating at a different temporal granularity.
The Layered Control Pattern#
ALGORITHM 3.4: Hybrid-Layered-Agent
────────────────────────────────────
Input: Percept p, current agent state S
Output: Action a
1. // Layer 0: Safety / Reactive (highest priority, lowest latency)
2. IF SAFETY_VIOLATION(p, S) THEN
3. RETURN SAFETY_ACTION(p) // immediate, no deliberation
4.
5. // Layer 1: Procedural / Skill Execution
6. IF ACTIVE_PLAN(S) AND NEXT_STEP_APPLICABLE(S, p) THEN
7. RETURN EXECUTE_NEXT_STEP(S) // continue committed plan
8.
9. // Layer 2: Deliberative / Planning
10. IF GOAL_UNACHIEVED(S) THEN
11. plan ← REPLAN(S.model, S.goal, p)
12. S.plan ← plan
13. RETURN EXECUTE_NEXT_STEP(S)
14.
15. // Layer 3: Cooperative / Meta-reasoning
16. IF COORDINATION_NEEDED(S) THEN
17. RETURN COORDINATE(S, peers)
18.
19. RETURN IDLE_ACTION()

Mapping to Production LLM Agent Stacks#
The hybrid layered architecture is the dominant pattern in production agentic systems:
| Layer | LLM Agent Realization | Temporal Granularity |
|---|---|---|
| Reactive / Safety | Input/output guardrails, content filters, authorization checks, rate limiters. No LLM inference required; executed as typed pre/post-conditions. | Microseconds to milliseconds |
| Procedural / Skill | Tool execution, API calls, file operations—following a committed plan step. Tool routing based on typed schemas. | Milliseconds to seconds |
| Deliberative / Planning | LLM-driven task decomposition, plan generation, retrieval-augmented reasoning. Full inference cycle. | Seconds to minutes |
| Cooperative / Meta | Multi-agent coordination, human escalation, cross-agent memory sharing, approval gates. | Minutes to hours |
The control framework mediates between layers through priority-based arbitration: safety always preempts planning; planning preempts idle exploration; ongoing skill execution preempts replanning unless a verification failure or impasse is detected.
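The arbitration policy reduces to an ordered guard chain; this sketch stubs each layer's check as a dict lookup (guard and action names are illustrative):

```python
def arbitrate(percept: dict, state: dict) -> str:
    """Priority-based arbitration across the four layers, highest first."""
    if percept.get("safety_violation"):                     # Layer 0: safety
        return "safety_action"
    if state.get("active_plan") and state.get("next_step_applicable"):
        return "execute_next_step"                          # Layer 1: skill
    if not state.get("goal_achieved", False):
        return "replan"                                     # Layer 2: planning
    if state.get("coordination_needed"):
        return "coordinate"                                 # Layer 3: meta
    return "idle"
```

The guard ordering encodes the preemption rules: safety always wins, and committed skill execution preempts replanning.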
3.3.4 Trade-Off Analysis#
| Architecture | Reactivity | Optimality | Adaptability | Implementation Complexity |
|---|---|---|---|---|
| Reactive | ★★★★★ | ★☆☆☆☆ | ★☆☆☆☆ | ★★☆☆☆ |
| Deliberative | ★☆☆☆☆ | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| Hybrid | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★★ |
The hybrid architecture is strictly preferred for production agentic systems because it provides safety guarantees (reactive layer), plan quality (deliberative layer), and coordination capacity (cooperative layer) simultaneously. The cost is implementation complexity, which is managed through typed layer interfaces and explicit control policies.
3.4 Hierarchical Task Networks (HTN) for Complex Plan Decomposition#
3.4.1 Formal Definition#
A Hierarchical Task Network planning domain is defined as:

$$\mathcal{D} = \langle T_p, T_c, M, O, s_0 \rangle$$

| Symbol | Definition |
|---|---|
| $T_p$ | Primitive tasks: directly executable actions (operators). |
| $T_c$ | Compound (abstract) tasks: tasks that must be decomposed further. |
| $M$ | Methods: decomposition rules $m = \langle t, \mathit{pre}, w \rangle$, where $t \in T_c$ is the task being decomposed, $\mathit{pre}$ are preconditions, and $w$ is an ordered or partially ordered set of subtasks. |
| $O$ | Operators: state-transition specifications for primitive tasks, $o = \langle \mathit{pre}, \mathit{add}, \mathit{del} \rangle$. |
| $s_0$ | Initial state: the starting world state. |
A task network is a tuple $w = (U, C)$ where $U$ is a set of task nodes and $C$ is a set of constraints (ordering, variable binding, state conditions).
HTN Planning as Recursive Decomposition#
HTN planning proceeds by recursively replacing compound tasks with their method decompositions until only primitive tasks remain:

$$w_0 \rightarrow w_1 \rightarrow \cdots \rightarrow w_n$$

where $w_0$ contains the initial abstract task and $w_n$ contains only primitive tasks whose execution sequence achieves the goal. The decomposition is valid if:
- Every compound task is replaced by a method whose preconditions are satisfied.
- The resulting primitive task sequence is executable in the initial state.
- All ordering and binding constraints are satisfied.
3.4.2 HTN Decomposition Algorithm#
ALGORITHM 3.5: HTN-Decompose
─────────────────────────────
Input: Task network w = (U, C), methods M, operators O, state s
Output: Executable plan π or FAILURE
1. IF U contains only primitive tasks THEN
2. π ← LINEARIZE(U, C) // respect ordering constraints
3. IF EXECUTABLE(π, s, O) THEN RETURN π
4. ELSE RETURN FAILURE
5. SELECT t ∈ U such that t ∈ T_c // choose a compound task
6. applicable ← {m ∈ M : m.task = t AND SATISFIES(m.pre, s)}
7. IF applicable = ∅ THEN RETURN FAILURE
8. FOR EACH m ∈ applicable (ordered by heuristic priority):
9. w' ← REPLACE(w, t, m.subtasks) // substitute decomposition
10. C' ← C ∪ m.constraints
11. result ← HTN_DECOMPOSE(w' = (U', C'), M, O, s)
12. IF result ≠ FAILURE THEN RETURN result
13. RETURN FAILURE // all methods failed → backtrack

3.4.3 Mapping HTN to LLM Agent Task Decomposition#
HTN is the most natural planning formalism for LLM-based agents because LLMs already perform task decomposition as a reasoning pattern (chain-of-thought, tree-of-thought). The mapping is:
| HTN Concept | LLM Agent Realization |
|---|---|
| Compound task | High-level user request or agent-generated subgoal (e.g., "Refactor the authentication module"). |
| Method | A decomposition template: a known workflow for breaking a compound task into subtasks. Stored in the plan library as typed method schemas with preconditions. |
| Primitive task | A single tool call, API invocation, or atomic generation step with a typed schema (MCP tool surface). |
| Preconditions | Belief checks: "Is the repository cloned? Is the file accessible? Does the user have authorization?" Evaluated against the agent's current belief base. |
| Ordering constraints | Dependency edges in the task graph: "Subtask B requires the output of Subtask A." Enforced by the orchestrator. |
| Decomposition | LLM-driven planning: given a compound task and the current context, the model generates a decomposition. The plan library provides method templates; the LLM selects and adapts. |
| Backtracking | If a decomposition path fails (tool error, verification failure), the agent backtracks to try an alternative method, bounded by a retry budget. |
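Algorithm 3.5's method backtracking can be exercised as a runnable miniature for totally ordered networks; the domain encoding (methods as precondition/subtask pairs, state as a dict) is an illustrative simplification:

```python
def htn_decompose(tasks: list, methods: dict, primitives: set, state: dict):
    """Return a primitive plan for `tasks`, or None if no decomposition works.
    `methods` maps a compound task to [(precondition_fn, subtask_list), ...]."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in primitives:
        tail = htn_decompose(rest, methods, primitives, state)
        return None if tail is None else [head] + tail
    for pre, subtasks in methods.get(head, []):     # try methods in priority order
        if pre(state):
            result = htn_decompose(list(subtasks) + rest, methods, primitives, state)
            if result is not None:                   # backtrack on failure
                return result
    return None
```

State effects of operators are omitted here; adding them turns the precondition checks into checks against the evolving state, as in the full algorithm.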
Typed Method Schema (Pseudo-Schema)#
METHOD DecomposeCodeRefactor:
task: "refactor_module"
preconditions:
- belief("repository_cloned") = true
- belief("module_exists", module_name) = true
subtasks:
1. analyze_module(module_name) → analysis [primitive, tool: code_analysis]
2. generate_plan(analysis) → refactor_plan [primitive, tool: llm_generate]
3. apply_changes(refactor_plan) → diff [primitive, tool: code_edit]
4. run_tests(module_name) → test_result [primitive, tool: test_runner]
5. verify_result(test_result, refactor_plan) [primitive, tool: llm_verify]
ordering: 1 < 2 < 3 < 4 < 5
rollback: revert_changes(diff)
  max_retries: 2

3.4.4 Ordered vs. Partially Ordered HTN#
- Totally Ordered STN (Simple Task Networks): subtasks have a linear order. Simpler to execute but less parallelizable. Used when task dependencies are strictly sequential.
- Partially Ordered STN: subtasks have partial ordering constraints. Enables parallel execution of independent subtasks. Requires a concurrent orchestrator with merge semantics.
For production multi-agent systems, partially ordered HTN is preferred because it enables parallel execution, so plan latency is governed by the critical path rather than by total work:

$$T_{\text{plan}} = \max_{p \,\in\, \text{paths}} \sum_{t \in p} \tau(t) \;\leq\; \sum_{t} \tau(t)$$

where the critical path is the longest dependency chain. Independent subtasks can be distributed to parallel agents with isolated workspaces.
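The critical-path bound can be computed directly from the task graph; the durations and dependency edges below are illustrative:

```python
def critical_path(duration: dict, deps: dict) -> float:
    """Longest dependency chain in a task DAG, via memoized recursion.
    `deps[t]` lists tasks that must finish before t starts."""
    memo = {}
    def finish(t):
        if t not in memo:
            memo[t] = duration[t] + max(
                (finish(d) for d in deps.get(t, [])), default=0.0)
        return memo[t]
    return max(finish(t) for t in duration)
```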
3.4.5 HTN and Verification#
HTN naturally integrates verification as a decomposition primitive. For every method that produces an artifact $a$, a verification subtask is added:

$$\mathrm{subtasks}(m) = \langle \mathrm{produce}(a),\ \mathrm{verify}(a) \rangle$$

If $\mathrm{verify}(a)$ fails, the orchestrator triggers a repair method or backtracks. This is the formal basis for the plan → act → verify → repair loop in agentic systems.
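The produce → verify → repair pattern, with a bounded repair budget before backtracking, can be sketched as follows (the callables and the exception-based backtrack signal are illustrative):

```python
def produce_and_verify(produce, verify, repair, max_repairs: int = 2):
    """Produce an artifact, verify it, and repair a bounded number of times;
    raise to signal the orchestrator that it must backtrack to another method."""
    artifact = produce()
    repairs = 0
    while not verify(artifact):
        if repairs == max_repairs:
            raise RuntimeError("verification failed: backtrack to another method")
        artifact = repair(artifact)   # repair subtask with bounded recursion
        repairs += 1
    return artifact
```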
3.4.6 Depth Bounding and Cost Control#
Unbounded HTN decomposition is a risk in LLM agents: the model may recursively decompose tasks into ever-finer subtasks, consuming tokens and latency without converging. The mitigation is a depth bound:

$$\mathrm{depth}(w) \leq d_{\max}$$

and a cost bound:

$$\sum_{t \in w} \mathrm{cost}(t) \leq C_{\max}$$

where $\mathrm{cost}(t)$ includes inference tokens, tool invocation cost, and latency. The orchestrator enforces these bounds mechanically, triggering escalation or summarization when limits are approached.
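A sketch of mechanical bound enforcement in the orchestrator; the escalation strings and limits are illustrative:

```python
class BudgetGuard:
    """Tracks cumulative cost and checks decomposition depth per step."""
    def __init__(self, max_depth: int, max_cost: float):
        self.max_depth = max_depth
        self.max_cost = max_cost
        self.cost = 0.0

    def check(self, depth: int, step_cost: float) -> str:
        self.cost += step_cost
        if depth > self.max_depth:
            return "escalate: depth bound exceeded"
        if self.cost > self.max_cost:
            return "escalate: cost bound exceeded"
        return "ok"
```

In practice `step_cost` would aggregate inference tokens, tool fees, and latency into one normalized unit.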
3.5 OODA (Observe–Orient–Decide–Act) Loop as Agent Execution Primitive#
3.5.1 Origin and Formal Structure#
The OODA loop was developed by military strategist John Boyd (1976, 1987) as a model of competitive decision-making. Its power lies not in the individual phases but in the tempo of the loop: the agent that cycles through OODA faster than its environment changes can maintain decision advantage.
| Phase | Definition |
|---|---|
| Observe | Gather raw data from the environment: percepts, tool outputs, retrieval results, user messages, system telemetry. |
| Orient | The critical phase. Synthesize observations with prior knowledge, mental models, cultural context, and previous experience to form a situational awareness picture. Orientation shapes what the agent notices and how it interprets information. |
| Decide | Select a course of action from the oriented understanding. This may be explicit (plan selection) or implicit (pattern-matched response). |
| Act | Execute the selected action and generate new environmental effects, which feed back into the next Observe phase. |
| Feedback | Two feedback loops: (1) Act → Observe (external feedback through environment change), and (2) Orient → Observe (internal feedback; orientation biases future observation). |
3.5.2 Boyd's Insight: Orientation as the Schwerpunkt#
Boyd's most important contribution is the recognition that Orient is the central phase. Orientation determines:
- What the agent attends to in observations (selective perception).
- How the agent interprets observations (framing, schema application).
- What options the agent considers (option generation is orientation-dependent).
- How quickly the agent decides (well-oriented agents decide implicitly, without deliberative search).
In agentic AI terms, Orientation is Context Engineering. The quality of the agent's orientation—its compiled context, retrieved evidence, memory summaries, and policy constraints—determines the quality of every downstream decision.
3.5.3 OODA Mapping to the Agentic Agent Loop#
ALGORITHM 3.6: OODA-Agent-Loop
───────────────────────────────
Input: Environment E, agent state A, context budget W
Output: Continuous action stream
1. LOOP (bounded by max_iterations or convergence):
2. // OBSERVE
3. raw ← COLLECT_OBSERVATIONS(E) // tool outputs, user input, telemetry
4. filtered ← FILTER_NOISE(raw, A.attention_policy)
5.
6. // ORIENT
7. queries ← DECOMPOSE_AND_EXPAND(filtered, A.goals)
8. evidence ← HYBRID_RETRIEVE(queries, retrieval_config)
9. memory ← RECALL(A.memory_layers, A.goals)
10. context ← COMPILE_PREFILL(A.policy, evidence, memory, filtered, budget=W)
11. situation ← LLM_ORIENT(context) // situational assessment
12.
13. // DECIDE
14. options ← GENERATE_OPTIONS(situation, A.tool_registry)
15. selected ← EVALUATE_AND_SELECT(options, A.preferences, A.constraints)
16.
17. // ACT
18. result ← EXECUTE(selected, A.tool_runtime)
19. A.history ← APPEND(A.history, (filtered, situation, selected, result))
20.
21. // FEEDBACK
22. IF VERIFY(result, selected.postcondition) = FAIL THEN
23. A.orientation_bias ← UPDATE_BIAS(A.orientation_bias, failure_signal)
24. CONTINUE // rapid re-OODA
25. ELSE
26. COMMIT(result)
27.     UPDATE_MEMORY(A.memory_layers, (situation, selected, result))
3.5.4 Tempo and Latency Engineering#
Boyd's competitive advantage through tempo translates directly to latency engineering in agentic systems:
For an agent to maintain decision advantage over environmental change, its cycle time must be shorter than the interval between significant environmental changes:

τ_OODA < 1/λ_env

where λ_env is the rate of environmentally significant state changes.
| Phase | Dominant Latency Contributors | Optimization Strategies |
|---|---|---|
| Observe | I/O, polling, event subscription | Event-driven observation (webhooks, streams) rather than polling; parallel observation channels. |
| Orient | Retrieval latency, LLM inference | Cache retrieval results; use tiered retrieval (exact match first, semantic only if needed); pre-compute orientation summaries. |
| Decide | LLM inference | Implicit orientation (pattern-matched decisions bypass deliberative inference); pre-computed decision tables for routine cases. |
| Act | Tool invocation, API latency | Async tool execution; parallelized independent actions; pre-staged environments. |
3.5.5 OODA vs. BDI: Complementary, Not Competing#
OODA and BDI are orthogonal frameworks that compose well:
- OODA provides the execution tempo: the continuous loop structure with feedback.
- BDI provides the commitment mechanism: intentions persist across OODA cycles, providing behavioral stability.
- Combined: the agent runs an OODA loop, where Orient corresponds to belief revision, Decide corresponds to intention selection/filtering, and Act corresponds to plan execution. Reconsideration (§3.1) is triggered by Orient detecting significant change.
3.6 Blackboard Architectures for Multi-Source Knowledge Integration#
3.6.1 Formal Structure#
The Blackboard Architecture (Erman et al., 1980; Nii, 1986) was originally developed for the HEARSAY-II speech understanding system. Its core insight is that complex problems requiring heterogeneous knowledge sources are best solved by posting partial solutions to a shared data structure (the blackboard) where independent knowledge sources can contribute incrementally.
| Component | Definition |
|---|---|
| BB (Blackboard) | A structured, shared workspace organized into hierarchical levels (e.g., raw data → features → hypotheses → solutions). Knowledge sources read from and write to the blackboard. |
| KS = {KS₁, …, KSₙ} (Knowledge Sources) | Independent, specialized modules, each with a trigger condition (pattern that activates the KS when matched on the blackboard) and a contribution function (transformation applied to blackboard contents). Knowledge sources do not communicate directly; all communication is mediated through the blackboard. |
| Control | A scheduling mechanism that determines which knowledge source to activate when multiple are triggered. May be priority-based, focus-based, or opportunistic. |
3.6.2 Blackboard Operation Cycle#
ALGORITHM 3.7: Blackboard-Cycle
────────────────────────────────
Input: Blackboard BB, Knowledge Sources KS[], Control strategy
Output: Updated Blackboard BB'
1. LOOP until SOLUTION_FOUND(BB) or MAX_CYCLES:
2. triggered ← {ks ∈ KS : TRIGGER(ks, BB) = true}
3. IF triggered = ∅ THEN RETURN BB // quiescent state
4. ks* ← CONTROL_SELECT(triggered, BB) // select highest-priority KS
5. contribution ← ks*.EXECUTE(BB)
6. BB ← APPLY(BB, contribution)
7.    NOTIFY(KS, BB.changes)                 // re-evaluate triggers
3.6.3 Mapping to Agentic AI: Multi-Source Retrieval and Reasoning#
The blackboard architecture maps elegantly to multi-source knowledge integration in agentic systems:
| Blackboard Concept | Agentic AI Realization |
|---|---|
| Blackboard | A structured shared workspace (e.g., a JSON document, a knowledge graph, or a structured working memory) where partial results from different sources are posted. |
| Knowledge Source: Semantic Search | A specialist that takes a query from the blackboard, executes semantic search against a vector store, and posts provenance-tagged results back. |
| Knowledge Source: Exact Match | A specialist that searches structured databases, code indices, or documentation by exact keyword/ID match. |
| Knowledge Source: Graph Traversal | A specialist that follows relationship edges in a knowledge graph to find related entities and posts lineage-enriched context. |
| Knowledge Source: Code Analysis | A specialist that runs static analysis, AST parsing, or dependency resolution and posts code-derived enrichment. |
| Knowledge Source: Memory Recall | A specialist that queries episodic and semantic memory layers, applying activation-based scoring (§3.2.2). |
| Knowledge Source: Live Inspection | A specialist that queries runtime state, logs, metrics, or browser state and posts current observations. |
| Knowledge Source: LLM Synthesis | A specialist that takes the accumulated blackboard contents and generates a synthesized answer, critique, or plan revision. |
| Control | The context engineering pipeline's ranking and admission control: which knowledge source to invoke next, which results to admit to the context window. |
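The cycle in Algorithm 3.7 can be sketched compactly. The two knowledge sources below (a retrieval KS and a synthesis KS) and the blackboard's field names are illustrative assumptions, standing in for the specialists in the table above.

```python
# Minimal sketch of the blackboard cycle (Algorithm 3.7). The KS entries and
# blackboard fields are illustrative, not a fixed API.
def run_blackboard(bb, knowledge_sources, max_cycles=10):
    """Fire the highest-priority triggered KS each cycle until quiescence."""
    for _ in range(max_cycles):
        triggered = [ks for ks in knowledge_sources if ks["trigger"](bb)]
        if not triggered:
            return bb                        # quiescent state: no KS fires
        ks = max(triggered, key=lambda k: k["priority"])
        ks["execute"](bb)                    # contribution applied in place
    return bb

bb = {"raw": "how do I rotate logs?", "evidence": None, "answer": None}
knowledge_sources = [
    {   # retrieval KS: fires once raw input exists and evidence is missing
        "priority": 2,
        "trigger": lambda bb: bb["raw"] and bb["evidence"] is None,
        "execute": lambda bb: bb.update(evidence=["logrotate docs"]),
    },
    {   # synthesis KS: fires once evidence has been posted
        "priority": 1,
        "trigger": lambda bb: bb["evidence"] and bb["answer"] is None,
        "execute": lambda bb: bb.update(answer=f"Based on {bb['evidence'][0]}"),
    },
]
result = run_blackboard(bb, knowledge_sources)
```

Note that the control strategy here is a plain priority maximum; a production control component would also enforce invocation budgets and token-budget admission as described in §3.6.5.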
Hierarchical Blackboard Levels for Agentic Retrieval#
Level 4: SYNTHESIZED RESPONSE [Final answer, verified]
Level 3: HYPOTHESES/PLANS [Candidate answers, plan options]
Level 2: EVIDENCE CHUNKS [Ranked, provenance-tagged retrieval results]
Level 1: PROCESSED OBSERVATIONS [Parsed queries, expanded subqueries, metadata]
Level 0: RAW INPUT              [User query, tool outputs, environment signals]
Each knowledge source reads from its input level(s) and writes to its output level. This hierarchy ensures that the synthesis agent (Level 4) operates only on processed, ranked, provenance-tagged evidence, not on raw data.
3.6.4 Production Advantages of the Blackboard Pattern#
- Modularity: knowledge sources are independently deployable, testable, and replaceable. Adding a new retrieval source requires only registering a new KS with the blackboard, not modifying the orchestrator.
- Heterogeneity: different KS modules can use different technologies (vector search, SQL, graph queries, API calls) without coupling.
- Incremental refinement: partial solutions are progressively enriched, which aligns with the agent's iterative reasoning pattern.
- Observability: the blackboard is a complete, inspectable record of all intermediate contributions, enabling debugging and audit.
3.6.5 Risks and Mitigations#
| Risk | Mitigation |
|---|---|
| Blackboard contention | Typed write schemas; each KS writes to a designated level with a structured format. Concurrent writes are merge-safe by schema design. |
| Runaway KS activation | Bounded cycle count; each KS has an invocation budget per task. Control enforces monotonic progress (no KS can repeatedly write the same content). |
| Stale data on blackboard | TTL (time-to-live) on blackboard entries; entries older than threshold are evicted or re-validated before use. |
| Token budget overflow | The control component enforces a token budget: only the highest-ranked blackboard entries are admitted to the LLM's context window. |
3.7 Subsumption Architectures for Priority-Based Behavior Arbitration#
3.7.1 Formal Definition#
The Subsumption Architecture (Brooks, 1986) organizes agent behavior as a stack of behavior layers, where each layer implements a complete, independent behavior mapping from percepts to actions. Higher layers subsume (suppress or override) lower layers:
| Component | Definition |
|---|---|
| Bᵢ | Behavior layer: a complete percept-to-action mapping Bᵢ : P → A. Higher index = higher competence level. |
| S(j, i) | Layer j suppresses the input to layer i (replaces i's percepts with j's signals). |
| I(j, i) | Layer j inhibits the output of layer i (blocks i's actions and substitutes j's). |
The key property: lower layers are complete, operational systems that function correctly even if all higher layers are removed. Higher layers add competence without modifying lower layers.
3.7.2 Canonical Layer Stack#
Layer 3: EXPLORE — seek novel states, learn [lowest priority]
Layer 2: BUILD_MAP — construct world model
Layer 1: WANDER — move through environment
Layer 0: AVOID_OBSTACLE — emergency collision avoidance  [highest priority]
Layer 0 operates independently and correctly. Layer 1 adds wandering behavior, but Layer 0 can inhibit Layer 1's motor output when an obstacle is detected. Layer 2 adds map building but does not affect lower layers' safety guarantees.
3.7.3 Mapping to LLM Agent Safety and Priority Architecture#
The subsumption architecture provides the theoretical foundation for layered safety and priority arbitration in agentic AI systems:
| Subsumption Layer | LLM Agent Realization | Priority |
|---|---|---|
| Layer 0: Safety Stop | Hard-coded guardrails: content filters, PII detection, authorization enforcement, rate limiting. Operates without LLM inference. Can inhibit all higher-layer outputs. | Highest (absolute) |
| Layer 1: Policy Enforcement | System prompt constraints, organizational policy rules, normative filters (§3.1.3). Suppresses inputs that violate policy. | High |
| Layer 2: Plan Execution | Committed plan step execution: tool calls, API invocations following the current intention. | Medium |
| Layer 3: Deliberative Planning | LLM-driven reasoning, plan generation, goal decomposition. Can be inhibited by safety or policy layers. | Lower |
| Layer 4: Exploratory Reasoning | Speculative reasoning, creative generation, exploratory retrieval. Lowest priority; suppressed when any higher layer is active. | Lowest |
Arbitration Semantics#
The output action at time t is determined by:

a(t) = B_k(p(t)), where k = min{ i : layer i is active at time t }

This is a strict priority scheme: safety always wins. It is the formal justification for implementing safety guardrails as pre-inference, mechanically enforced gates, not as prompt instructions that the LLM might ignore.
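The arbitration rule can be sketched as a first-match scan over layers ordered by priority. The two layer functions below are illustrative stand-ins for the realizations in the table above; the percept fields are assumptions.

```python
# Sketch of strict-priority arbitration: layers are listed highest-priority
# first, and the first active layer's action wins. Layer logic is illustrative.
def arbitrate(layers, percept):
    """Return the action of the highest-priority active layer."""
    for layer in layers:                 # ordered: Layer 0 (safety) first
        action = layer(percept)
        if action is not None:           # layer is active → it wins
            return action
    return "noop"                        # no layer active

def safety_stop(p):      # Layer 0: mechanical guardrail, no LLM inference
    return "block" if p.get("pii_detected") else None

def plan_execution(p):   # Layer 2: follow the committed plan step, if any
    return p.get("next_plan_step")

layers = [safety_stop, plan_execution]
```

Because `safety_stop` is checked first and involves no model inference, no output from a lower-priority layer can reach the environment while the guardrail is active.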
3.7.4 Production Implications#
- Safety invariants are independent of model behavior: Layer 0 operates without LLM inference and cannot be circumvented by prompt injection, jailbreaking, or model error.
- Graceful degradation: if the deliberative layer (Layer 3) fails, the agent falls back to plan execution (Layer 2). If plan execution fails, it falls back to policy enforcement (Layer 1). The system always has a safe, operational mode.
- Incremental capability addition: new competence layers can be added without modifying safety or policy layers, reducing regression risk.
- Testability: each layer can be tested independently, including testing that safety layers correctly inhibit lower layers.
3.8 Actor Model and Communicating Sequential Processes (CSP) for Agent Concurrency#
3.8.1 The Actor Model#
Formal Definition#
The Actor Model (Hewitt et al., 1973; Agha, 1986) defines the fundamental unit of concurrent computation as an actor—an entity that:
- Has a private state (not shared with any other actor).
- Processes messages from an unbounded mailbox, one at a time.
- In response to a message, can: (a) send messages to other actors, (b) create new actors, (c) designate a replacement behavior for the next message.
Key Properties#
- Location transparency: actors are addressed by identity, not location; the runtime handles routing.
- No shared state: all communication is via asynchronous message passing; no locks, no shared memory.
- Mailbox semantics: messages are queued; processing is serial within an actor, parallel across actors.
- Supervision hierarchy: actors are organized in trees; parent actors supervise children and handle their failures (let-it-crash philosophy, formalized in Erlang/OTP and Akka).
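The three defining properties — private state, an unbounded mailbox, serial processing — can be sketched with the standard library alone. The `counter` behavior and message shape are illustrative assumptions; supervision is omitted for brevity.

```python
# Minimal actor sketch: private state, serial processing of an unbounded
# mailbox, asynchronous fire-and-forget sends. Standard library only.
import queue
import threading

class Actor:
    def __init__(self, behavior):
        self._mailbox = queue.Queue()      # unbounded mailbox
        self.state = {}                    # touched only by the actor's thread
        self._behavior = behavior
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):                   # asynchronous: never blocks sender
        self._mailbox.put(msg)

    def stop(self):                        # poison pill, then wait for drain
        self._mailbox.put(None)
        self._thread.join(timeout=5)

    def _run(self):
        while True:
            msg = self._mailbox.get()      # serial: one message at a time
            if msg is None:
                return
            self._behavior(self.state, msg)

def counter(state, msg):                   # illustrative behavior
    state[msg["from"]] = state.get(msg["from"], 0) + 1

a = Actor(counter)
for _ in range(3):
    a.send({"from": "planner"})
a.stop()
```

Because only the actor's own thread touches `state`, no locks are needed: isolation is by construction, which is exactly the property the supervision and recovery machinery of Erlang/OTP-style systems builds on.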
3.8.2 Communicating Sequential Processes (CSP)#
Formal Definition#
CSP (Hoare, 1978, 1985) models concurrent systems as processes that communicate through synchronous channels:
| Construct | Meaning |
|---|---|
| a → P | Prefix: perform event a, then behave as P. |
| P ∥ Q | Parallel composition: P and Q execute concurrently, synchronizing on shared events. |
| P □ Q | External choice: the environment chooses between P and Q. |
| P ⊓ Q | Internal (nondeterministic) choice: the process chooses. |
| STOP | Deadlock: no further events. |
| SKIP | Successful termination. |
Key Properties#
- Synchronous communication: send and receive on a channel happen simultaneously; the sender blocks until the receiver is ready (rendezvous semantics).
- Compositionality: complex processes are built by composing simpler processes through parallel composition, choice, and sequential composition.
- Formal verification: CSP supports model checking for deadlock freedom, livelock freedom, and refinement checking.
3.8.3 Actor vs. CSP: Trade-Off Analysis for Agent Concurrency#
| Dimension | Actor Model | CSP |
|---|---|---|
| Communication | Asynchronous, unbounded mailbox | Synchronous, channel-based |
| Coupling | Loose; fire-and-forget messages | Tight; rendezvous synchronization |
| Ordering | No guaranteed message ordering (in pure model) | Deterministic ordering through channel discipline |
| Failure handling | Supervision trees (let-it-crash) | Process algebra (refinement, no built-in failure model) |
| Deadlock risk | No deadlock (no blocking) but possible mailbox overflow | Possible deadlock (detectable and verifiable) |
| Suitability | Distributed, loosely coupled multi-agent systems | Tightly coordinated, pipeline-style agent workflows |
3.8.4 Mapping to Multi-Agent Orchestration#
Actor-Based Multi-Agent Architecture#
ALGORITHM 3.8: Actor-Based-Agent-Orchestration
───────────────────────────────────────────────
Actors:
- OrchestratorActor: receives user requests, spawns task actors
- PlannerActor: decomposes tasks into subtasks
- ExecutorActor[i]: executes primitive tasks (tool calls)
- VerifierActor: validates execution results
- MemoryActor: manages memory read/write with validation
- SupervisorActor: monitors child actors, handles failures
Message Flow:
User → OrchestratorActor: {type: "request", payload: task}
OrchestratorActor → PlannerActor: {type: "plan", payload: task}
PlannerActor → OrchestratorActor: {type: "plan_result", payload: subtasks[]}
OrchestratorActor → ExecutorActor[i]: {type: "execute", payload: subtask_i}
ExecutorActor[i] → VerifierActor: {type: "verify", payload: result_i}
VerifierActor → OrchestratorActor: {type: "verified" | "failed", payload: ...}
Failure Handling:
IF ExecutorActor[i] crashes THEN
SupervisorActor receives EXIT signal
SupervisorActor restarts ExecutorActor[i] with last checkpoint
SupervisorActor replays failed message
IF restart_count > max_retries THEN
SupervisorActor escalates to OrchestratorActor
      OrchestratorActor triggers alternative decomposition or human escalation
CSP-Based Pipeline Composition#
For sequential agent workflows (plan → retrieve → act → verify → commit), CSP provides a cleaner model:
Pipeline = Plan →(c₁) Retrieve →(c₂) Act →(c₃) Verify →(c₄) Commit

Each stage is a CSP process; the channels c₁, …, c₄ enforce synchronization points. The pipeline is deadlock-free by construction if every process eventually produces output on its output channel for every input on its input channel (no infinite loops without output).
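A sketch of this pipeline using bounded queues as channels is below. Note the hedge: `queue.Queue(maxsize=1)` only approximates CSP rendezvous (a sender may run one item ahead of the receiver); the stage logic is illustrative, each stage simply recording its name on the passing item.

```python
# Sketch of plan → retrieve → act → verify → commit as CSP-style processes
# joined by channels. maxsize=1 approximates (not exactly) rendezvous.
import queue
import threading

def stage(name, inbox, outbox):
    """A CSP-style process: read from inbox, transform, write to outbox."""
    while True:
        item = inbox.get()
        if item is None:                 # propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(item + [name])        # record that this stage ran

names = ["plan", "retrieve", "act", "verify", "commit"]
channels = [queue.Queue(maxsize=1) for _ in range(len(names) + 1)]
threads = [
    threading.Thread(target=stage, args=(n, channels[i], channels[i + 1]),
                     daemon=True)
    for i, n in enumerate(names)
]
for t in threads:
    t.start()

channels[0].put([])                      # feed one task through the pipeline
result = channels[-1].get()              # blocks until the task emerges
channels[0].put(None)                    # shut the pipeline down
```

The bounded channels also give the backpressure property noted in §3.8.5: a slow downstream stage blocks its upstream sender rather than letting work pile up.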
3.8.5 Production Implications#
| Concern | Actor/CSP Solution |
|---|---|
| Isolation | Each agent actor has private state; no shared mutable state. Memory access is mediated through MemoryActor with validation. |
| Fault tolerance | Supervision trees with bounded restart policies. Crash recovery is the default, not the exception. |
| Backpressure | Bounded mailboxes (actor) or synchronous channels (CSP) provide natural backpressure when downstream agents are overloaded. |
| Observability | Every message is traceable; structured logging at message boundaries provides complete execution traces. |
| Concurrency control | Work units are claimed by actors with explicit task leases; concurrent modification is prevented by design, not by locks. |
| Scalability | Actors can be distributed across nodes; the mailbox abstraction is location-transparent. |
3.9 Stigmergic Coordination: Environment-Mediated Multi-Agent Communication#
3.9.1 Formal Definition#
Stigmergy (Grassé, 1959) is a coordination mechanism in which agents communicate indirectly through modifications to a shared environment rather than through direct message passing.
| Component | Definition |
|---|---|
| E | Shared environment: a data structure (artifact repository, shared workspace, knowledge base, task board) that all agents can read from and write to. |
| A = {a₁, …, aₙ} | Agents: autonomous entities that operate independently, without direct awareness of other agents. |
| mark(aᵢ, E) | Marking: agent aᵢ modifies the environment (writes an artifact, updates a status, leaves a "pheromone" signal). |
| sense(aᵢ, E) | Sensing: agent aᵢ perceives the current state of the environment, including marks left by other agents. |
| λ | Decay: environmental marks decay over time at rate λ, preventing stale signals from persisting indefinitely. |
Quantitative Stigmergy#
The strength of a stigmergic signal at location x at time t is:

s(x, t) = Σᵢ wᵢ · e^(−λ(t − tᵢ))

where wᵢ is the weight of mark i deposited at time tᵢ, and λ is the decay rate. Agents preferentially attend to locations with high s(x, t), creating positive feedback loops that amplify successful patterns.
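The decayed-sum formula is a one-liner; the mark weights, times, and decay rate below are illustrative numbers.

```python
# Sketch of the decayed signal strength: sum of w_i * exp(-lambda * (t - t_i))
# over all marks at a location. Mark values and decay rate are illustrative.
import math

def signal_strength(marks, t, decay_rate):
    """marks: iterable of (weight, deposit_time) pairs."""
    return sum(w * math.exp(-decay_rate * (t - t_i)) for w, t_i in marks)

marks = [(1.0, 0.0), (2.0, 5.0)]   # (weight, deposit_time)
s = signal_strength(marks, t=5.0, decay_rate=0.1)
```

An older mark contributes exponentially less: here the mark deposited at t=0 has decayed to e^(−0.5) of its weight by t=5, while the fresh mark contributes its full weight.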
3.9.2 Mapping to Multi-Agent Agentic Systems#
| Stigmergic Concept | Agentic AI Realization |
|---|---|
| Shared environment | Shared artifact store (Git repository, shared workspace, task board, blackboard). |
| Marks | Committed code, documentation updates, test results, status annotations, TODO comments, task completions, error reports. |
| Sensing | Agents observe the shared artifact store before acting: reading current code state, checking test results, reviewing task board status. |
| Decay | Artifact staleness scoring: old TODOs lose priority; stale test results trigger re-execution; completed tasks are archived. |
| Positive feedback | Successful patterns (e.g., well-structured code modules) attract more agent attention (more tests, more documentation, more refinement). |
| Negative feedback | Failed patterns (e.g., failing tests, unresolved errors) trigger corrective agent attention. |
Stigmergic Task Board#
ALGORITHM 3.9: Stigmergic-Task-Board-Coordination
──────────────────────────────────────────────────
Shared State: TaskBoard TB (visible to all agents)
Agent_i behavior loop:
1. tasks ← SENSE(TB, status="available")
2. task* ← SELECT(tasks, A_i.competency, priority_score)
3. success ← CLAIM(TB, task*, A_i.id, lease_duration) // atomic CAS
4. IF NOT success THEN GOTO 1 // another agent claimed it
5. result ← EXECUTE(task*, A_i.tools)
6. MARK(TB, task*, result, status="completed" | "failed")
7. IF result.status = "failed" THEN
8. MARK(TB, task*, error_signal, status="needs_repair")
9. // Decay will eventually re-surface this for another agent
10.  GOTO 1
3.9.3 Advantages for Large-Scale Multi-Agent Systems#
- Decoupling: agents need no knowledge of each other; coordination emerges from environment state.
- Scalability: adding agents does not require updating routing tables or communication topologies.
- Robustness: agent failure does not break coordination; unclaimed tasks remain on the board.
- Emergent coordination: complex coordinated behavior arises from simple individual rules, without centralized planning.
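The atomic CLAIM step of Algorithm 3.9 is the keystone of these properties: it is what makes duplicate work impossible and agent failure harmless. The sketch below uses a lock to get compare-and-swap semantics and a lease so that work claimed by a crashed agent eventually re-surfaces; the field names are illustrative.

```python
# Sketch of lease-based atomic claiming on a stigmergic task board.
# Entry fields and the lease mechanism shown here are illustrative.
import threading
import time

class TaskBoard:
    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._tasks = {t: {"status": "available", "owner": None, "lease": 0.0}
                       for t in tasks}

    def sense(self):
        """Tasks that are available, or claimed but with an expired lease."""
        now = time.monotonic()
        return [t for t, e in self._tasks.items()
                if e["status"] == "available"
                or (e["status"] == "claimed" and e["lease"] < now)]

    def claim(self, task, agent_id, lease_s=30.0):
        with self._lock:                  # atomic check-and-set
            entry = self._tasks[task]
            now = time.monotonic()
            if entry["status"] == "claimed" and entry["lease"] >= now:
                return False              # another agent holds a live lease
            entry.update(status="claimed", owner=agent_id,
                         lease=now + lease_s)
            return True

    def mark(self, task, status):
        with self._lock:
            self._tasks[task]["status"] = status

board = TaskBoard(["t1"])
first = board.claim("t1", "agent_a")      # succeeds
second = board.claim("t1", "agent_b")     # loses the race, retries via sense()
```

An agent whose claim fails simply loops back to sensing (step 4 → step 1 in Algorithm 3.9); no direct agent-to-agent communication is ever needed.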
3.9.4 Risks and Mitigations#
| Risk | Mitigation |
|---|---|
| Livelock (agents repeatedly pick up and drop tasks) | Lease-based claiming with exponential backoff; monotonic progress requirements. |
| Stale signals | Decay function with configurable rate λ; periodic garbage collection of resolved signals. |
| Duplicate work | Atomic claim operations (compare-and-swap); work unit idempotency. |
| Convergence | Termination detection: a cleanup agent monitors the task board and declares completion when all tasks are resolved and no new marks are being generated. |
3.10 Contract-Net Protocol and Auction-Based Task Allocation#
3.10.1 Contract-Net Protocol (CNP)#
Formal Definition#
The Contract-Net Protocol (Smith, 1980) is a negotiation-based task allocation mechanism for multi-agent systems:
| Phase | Description |
|---|---|
| Task Announcement | The manager broadcasts a task specification to all potential contractors, including task description, deadline, constraints, and evaluation criteria. |
| Bidding | Each contractor evaluates the task against its capabilities, current load, and cost model, and submits a bid (or declines). |
| Awarding | The manager evaluates bids using a scoring function and awards the contract to the best bidder. |
| Execution & Reporting | The winning contractor executes the task and reports results to the manager. |
Bid Evaluation Function#
The manager's bid evaluation function is:

score(b) = w₁·capability(b) − w₂·cost(b) − w₃·load(b)

where the wᵢ are weights reflecting the manager's priorities.
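A sketch of this scoring applied to two bids follows; the weight values and bid fields are illustrative assumptions, matching the capability, cost, and load factors from the bidding phase above.

```python
# Sketch of weighted bid evaluation: higher capability is better, higher
# cost and load are worse. Weights and bid fields are illustrative.
WEIGHTS = {"capability": 0.5, "cost": 0.3, "load": 0.2}

def score(bid):
    return (WEIGHTS["capability"] * bid["capability"]
            - WEIGHTS["cost"] * bid["cost"]
            - WEIGHTS["load"] * bid["load"])

bids = [
    {"agent": "c1", "capability": 0.9, "cost": 0.2, "load": 0.5},
    {"agent": "c2", "capability": 0.7, "cost": 0.1, "load": 0.1},
]
winner = max(bids, key=score)   # c2: lower load and cost outweigh capability
```

Note how the weighting lets a slightly less capable but lightly loaded contractor win, which is exactly the load-balancing behavior noted in §3.10.4.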
3.10.2 CNP Protocol Flow for LLM Agents#
ALGORITHM 3.10: Contract-Net-Agent-Allocation
──────────────────────────────────────────────
Participants: Manager M, Contractor agents C[]
1. // ANNOUNCE
2. task_spec ← M.decompose_and_specify(user_request)
3. FOR EACH c IN C:
4. SEND(c, {type: "cfp", task: task_spec, deadline: T_d}) // call for proposals
5.
6. // BID
7. bids ← ∅
8. WAIT_UNTIL(deadline_bid OR all_responded):
9. FOR EACH response IN incoming:
10. IF response.type = "propose" THEN
11. bids ← bids ∪ {response}
12. // "refuse" responses are discarded
13.
14. // AWARD
15. IF bids = ∅ THEN
16. ESCALATE(task_spec, "no_capable_contractor")
17. ELSE
18. ranked ← SORT(bids, BY score_function, DESCENDING)
19. winner ← HEAD(ranked)
20. SEND(winner.agent, {type: "accept"})
21. FOR EACH loser IN TAIL(ranked):
22. SEND(loser.agent, {type: "reject"})
23.
24. // EXECUTE & REPORT
25. result ← WAIT(winner.agent, timeout=T_exec)
26. IF result.status = "success" THEN
27. RETURN result.payload
28. ELSE
29. // Re-announce to remaining contractors or escalate
30.   RETRY with EXCLUDE(winner.agent)
3.10.3 Auction-Based Task Allocation#
For competitive multi-agent environments or resource-constrained settings, auction mechanisms generalize CNP:
| Auction Type | Mechanism | Agent Application |
|---|---|---|
| First-price sealed-bid | Each agent submits one bid; highest bidder wins and pays its bid. | Simple task allocation where agents bid based on estimated capability. |
| Vickrey (second-price) | Highest bidder wins but pays the second-highest bid. Incentive-compatible (truthful bidding is dominant strategy). | Preferred when truthful capability reporting is important for system-level optimization. |
| Combinatorial | Agents bid on bundles of tasks; winner determination is an optimization problem. | Multi-task allocation where tasks have synergies (e.g., related code modules). |
| Dutch (descending) | Price starts high and decreases; first agent to accept wins. | Fast allocation under time pressure; agents with lowest cost thresholds win. |
Formal Auction for Task Allocation#
For tasks T = {t₁, …, tₘ} and agents A = {a₁, …, aₙ}, the allocation problem is:

maximize Σᵢ Σⱼ vᵢⱼ · xᵢⱼ

subject to:

Σᵢ xᵢⱼ = 1 for every task tⱼ (each task assigned exactly once)
Σⱼ xᵢⱼ ≤ κᵢ for every agent aᵢ
xᵢⱼ ∈ {0, 1}

where vᵢⱼ is the value (expected quality) of agent aᵢ performing task tⱼ, and κᵢ is agent aᵢ's maximum concurrent task capacity.
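For a tiny instance the optimum can be found by brute-force enumeration, which makes the constraint structure concrete; the value matrix and capacities below are illustrative, and a real system would use an ILP solver or an auction mechanism instead.

```python
# Brute-force sketch of the capacity-constrained allocation problem.
# Enumerate all task→agent assignments; keep the best feasible one.
from itertools import product

def allocate(values, capacities):
    """values[i][j] = value of agent i on task j; capacities[i] = kappa_i."""
    n_agents, n_tasks = len(values), len(values[0])
    best, best_value = None, float("-inf")
    for assignment in product(range(n_agents), repeat=n_tasks):
        loads = [assignment.count(i) for i in range(n_agents)]
        if any(l > c for l, c in zip(loads, capacities)):
            continue                     # violates an agent's capacity
        total = sum(values[a][j] for j, a in enumerate(assignment))
        if total > best_value:
            best, best_value = assignment, total
    return best, best_value

values = [[3, 1],    # agent 0's value on tasks 0, 1
          [2, 4]]    # agent 1's value on tasks 0, 1
assignment, total = allocate(values, capacities=[1, 1])
```

With capacity 1 each, the greedy choice of agent 0 for both tasks is infeasible; the constraint forces the split assignment, which is the whole point of the κᵢ bound.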
3.10.4 Production Considerations#
| Concern | Mitigation |
|---|---|
| Communication overhead | Use targeted announcement (only agents with matching capability schemas receive CFPs), not global broadcast. |
| Strategic misrepresentation | Vickrey auction incentivizes truthful bidding; reputation systems penalize agents that consistently over-promise. |
| Latency | Bounded bid windows; timeout-based fallback to direct assignment if insufficient bids received. |
| Failure after award | Contracts include SLAs; failure triggers re-announcement with the failed agent excluded. Compensation actions are specified in the contract. |
| Load balancing | Bids include current load as a factor; overloaded agents bid lower or decline. |
3.11 Formal Petri Net and State Machine Representations of Agent Lifecycles#
3.11.1 Finite State Machines for Agent Lifecycle#
Formal Definition#
An agent's lifecycle can be modeled as a Deterministic Finite Automaton (DFA):

M = (Q, Σ, δ, q₀, F)
| Symbol | Definition |
|---|---|
| Q | Finite set of agent states: Q = {Idle, Planning, Retrieving, Executing, Verifying, Repairing, Committing, Completed, Failed}. |
| Σ | Input alphabet: events that trigger state transitions (e.g., task_received, plan_ready, evidence_ready, tool_result, verify_pass, verify_fail). |
| δ | Transition function: δ : Q × Σ → Q. |
| q₀ | Initial state: q₀ = Idle. |
| F | Accepting states: F = {Completed, Failed}. |
Agent Lifecycle State Machine#
task_received plan_ready evidence_ready
Idle ───────────────→ Planning ──────────→ Retrieving ──────────→ Executing
↑ │
│ │
│ tool_result │
│ ┌──────────────────────────────────┘
│ ↓
│ Verifying
│ │ │
│ verify_pass verify_fail
│ │ │
│ ↓ ↓
│ Committing Repairing
│ │ │
│ done │ repair_success → Verifying
│ │ │ max_retries
│ ↓ │ ↓
│ Completed Failed
│ │
└── retry ───────┘  (if retry_budget > 0)
Transition Function (Excerpt)#
| Current State | Event | Next State | Guard Condition |
|---|---|---|---|
| Idle | task_received | Planning | authorization_check = pass |
| Planning | plan_ready | Retrieving | plan.subtask_count ≤ max_depth |
| Retrieving | evidence_ready | Executing | evidence.relevance ≥ threshold |
| Executing | tool_result | Verifying | — |
| Verifying | verify_pass | Committing | — |
| Verifying | verify_fail | Repairing | repair_attempts < max_retries |
| Verifying | verify_fail | Failed | repair_attempts ≥ max_retries |
| Repairing | repair_success | Verifying | — |
| Committing | done | Completed | — |
| Failed | retry | Planning | retry_budget > 0 |
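The transition table above maps directly onto a dictionary-driven state machine in which illegal transitions and failed guards raise rather than silently passing. The guard context fields below are illustrative, mirroring the guard conditions in the excerpt.

```python
# Sketch of the lifecycle DFA with guard conditions. Transition entries
# mirror the excerpt above; the ctx fields are illustrative.
TRANSITIONS = {
    ("Idle", "task_received"):        ("Planning",   lambda c: c["authorized"]),
    ("Planning", "plan_ready"):       ("Retrieving", lambda c: c["subtasks"] <= c["max_depth"]),
    ("Retrieving", "evidence_ready"): ("Executing",  lambda c: c["relevance"] >= 0.5),
    ("Executing", "tool_result"):     ("Verifying",  lambda c: True),
    ("Verifying", "verify_pass"):     ("Committing", lambda c: True),
    ("Verifying", "verify_fail"):     ("Repairing",  lambda c: c["repairs"] < c["max_retries"]),
    ("Repairing", "repair_success"):  ("Verifying",  lambda c: True),
    ("Committing", "done"):           ("Completed",  lambda c: True),
}

def step(state, event, ctx):
    """Apply one transition; illegal transitions raise instead of passing."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"illegal transition: {key}")
    next_state, guard = TRANSITIONS[key]
    if not guard(ctx):
        raise ValueError(f"guard failed for {key}")
    return next_state

ctx = {"authorized": True, "subtasks": 3, "max_depth": 5,
       "relevance": 0.9, "repairs": 0, "max_retries": 2}
state = "Idle"
for ev in ["task_received", "plan_ready", "evidence_ready",
           "tool_result", "verify_pass", "done"]:
    state = step(state, ev, ctx)
```

Because the table is data rather than control flow, it can be logged, checkpointed, and checked for coverage — the production benefits enumerated in §3.11.4.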
3.11.2 Petri Nets for Concurrent Agent Modeling#
Formal Definition#
A Petri Net is a tuple:

N = (P, T, F, W, M₀)
| Symbol | Definition |
|---|---|
| P | Finite set of places (represent conditions or states). |
| T | Finite set of transitions (represent events or actions). |
| F ⊆ (P × T) ∪ (T × P) | Flow relation: directed arcs between places and transitions. |
| W : F → ℕ₊ | Weight function: arc weights (default 1). |
| M₀ : P → ℕ | Initial marking: number of tokens in each place at start. |
A transition t is enabled at marking M if:

∀p ∈ •t : M(p) ≥ W(p, t)

where •t denotes the input places of t. Firing t produces a new marking M′:

M′(p) = M(p) − W(p, t) + W(t, p)
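The enabling and firing rules translate directly into code. The sketch below fires the `t_execute` transition from the Agent Execution Petri Net, showing a resource place (the token budget) being consumed; the arc-weight maps are an illustrative encoding.

```python
# Sketch of Petri net enabling and firing. pre[t]/post[t] map places to
# arc weights W(p,t) / W(t,p); the t_execute encoding is illustrative.
def enabled(marking, pre, t):
    """t is enabled iff every input place holds at least W(p, t) tokens."""
    return all(marking.get(p, 0) >= w for p, w in pre[t].items())

def fire(marking, pre, post, t):
    if not enabled(marking, pre, t):
        raise ValueError(f"{t} is not enabled")
    m = dict(marking)
    for p, w in pre[t].items():    # consume tokens from input places
        m[p] -= w
    for p, w in post[t].items():   # produce tokens in output places
        m[p] = m.get(p, 0) + w
    return m

# t_execute consumes one unit of the token budget (a resource place).
pre  = {"t_execute": {"p_evidence_ready": 1, "p_token_budget": 1}}
post = {"t_execute": {"p_executing": 1}}
m0 = {"p_evidence_ready": 1, "p_token_budget": 3}
m1 = fire(m0, pre, post, "t_execute")
```

Once `p_token_budget` reaches zero, `t_execute` is no longer enabled: the resource bound is enforced by the net's structure, not by a separate check.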
Why Petri Nets for Agentic Systems#
Petri nets are strictly more expressive than finite state machines for modeling agentic systems because they naturally represent:
- Concurrency: multiple tokens can be in different places simultaneously, modeling parallel agent execution.
- Synchronization: transitions with multiple input places model synchronization barriers (e.g., "wait for all sub-agent results before synthesis").
- Resource constraints: token counts model resource availability (e.g., available tool invocation budget, concurrent request capacity).
- Conflict/Choice: when multiple transitions are enabled by the same tokens, this models nondeterministic choice (e.g., tool selection).
Agent Execution Petri Net#
Places:
p_idle, p_planning, p_plan_ready, p_retrieving, p_evidence_ready,
p_executing, p_tool_result, p_verifying, p_verified, p_failed,
p_repairing, p_committing, p_completed,
p_token_budget (resource place, initial tokens = max_budget),
p_retry_budget (resource place, initial tokens = max_retries)
Transitions:
t_receive_task: {p_idle} → {p_planning}
t_plan: {p_planning} → {p_plan_ready}
t_retrieve: {p_plan_ready} → {p_retrieving}
t_evidence: {p_retrieving} → {p_evidence_ready}
t_execute: {p_evidence_ready, p_token_budget} → {p_executing}
// consumes a token budget unit
t_result: {p_executing} → {p_tool_result}
t_verify: {p_tool_result} → {p_verifying}
t_pass: {p_verifying} → {p_verified}
t_fail: {p_verifying, p_retry_budget} → {p_repairing}
// consumes a retry token
t_repair: {p_repairing} → {p_tool_result} // loops back to verify
t_exhaust: {p_verifying} → {p_failed}
// fires when p_retry_budget is empty (no retry tokens)
t_commit: {p_verified} → {p_committing}
t_done:     {p_committing} → {p_completed}
Formal Analysis Properties#
Petri nets enable formal verification of agent lifecycle properties:
| Property | Definition | Agent Interpretation |
|---|---|---|
| Reachability | Can a given marking M be reached from the initial marking M₀? | Can the agent reach the Completed state from Idle for a given task? |
| Boundedness | Is the number of tokens in every place bounded? | Does the agent stay within resource limits (token budget, retry budget, concurrent tasks)? |
| Liveness | Can every transition eventually fire from every reachable marking? | Is every agent capability eventually usable? Are there dead-end states? |
| Deadlock freedom | Is there always at least one enabled transition? | Can the agent get stuck in a state with no possible progress? |
| Fairness | Does every continuously enabled transition eventually fire? | Are all agent paths eventually explored (no starvation)? |
3.11.3 Colored Petri Nets for Multi-Agent Modeling#
Colored Petri Nets (CPN) extend standard Petri nets with typed tokens, enabling modeling of multiple agents, task types, and data flows in a single net:
CPN = (P, T, F, Σ, C, G, E, M₀)

where Σ is a set of color sets (types), C : P → Σ assigns a type to each place, G assigns guard conditions to transitions, and E assigns typed arc expressions to arcs.
In multi-agent systems, token colors can encode:
- Agent identity: which agent is in which state.
- Task parameters: task type, priority, deadline.
- Resource type: GPU budget, API rate limit, token budget.
This enables modeling and verification of complex multi-agent orchestration with heterogeneous agents and resources.
3.11.4 Production Application: Agent Lifecycle as Verifiable State Machine#
Every production agent orchestrator should implement its agent lifecycle as an explicit, verifiable state machine (or Petri net), not as implicit control flow in code. Benefits:
- Invariant enforcement: the state machine definition mechanically prevents illegal transitions (e.g., executing without planning, committing without verifying).
- Observability: current state is always inspectable; state transitions are logged with timestamps and payloads.
- Recovery: after crash, the agent resumes from its last committed state, not from the beginning.
- Testing: state machine coverage analysis ensures all states and transitions are exercised in testing.
- Formal verification: for critical applications, model checking can verify deadlock freedom, boundedness, and liveness.
3.12 Category-Theoretic Composition of Agent Pipelines#
3.12.1 Motivation: Compositionality as Engineering Discipline#
Agent systems are assembled from components: retrievers, planners, executors, verifiers, memory managers, synthesizers. The central question is: when can components be composed, and does the composite inherit the properties of its parts?
Category theory provides the most general framework for answering this question. It abstracts away implementation details and focuses on composition structure: what composes with what, and what properties are preserved.
3.12.2 Category-Theoretic Foundations#
Categories#
A category 𝒞 consists of:
- Objects: Ob(𝒞) — e.g., types, state spaces, data schemas.
- Morphisms: for each pair of objects A, B, a set Hom(A, B) of arrows f : A → B from A to B — e.g., typed functions, agent transformations, pipeline stages.
- Composition: for morphisms f : A → B and g : B → C, a composite g ∘ f : A → C.
- Identity: for each object A, an identity morphism id_A : A → A.
- Associativity: h ∘ (g ∘ f) = (h ∘ g) ∘ f.
- Identity law: f ∘ id_A = f = id_B ∘ f.
The Category of Agent Components#
Define a category of agent components with:
- Objects: typed data schemas (e.g., Query, Context, Plan, Result, VerifiedResult, Report).
- Morphisms: agent components as typed transformations:
retrieve: Query → Context
plan: Context → Plan
execute: Plan → Result
verify: Result → VerifiedResult
synthesize: VerifiedResult → Report
The pipeline is the composite morphism:
synthesize ∘ verify ∘ execute ∘ plan ∘ retrieve : Query → Report
Associativity guarantees that we can group and refactor pipeline stages without changing semantics:
(synthesize ∘ verify) ∘ (execute ∘ plan ∘ retrieve) = synthesize ∘ (verify ∘ execute) ∘ (plan ∘ retrieve)
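A minimal sketch of components as typed morphisms, with composition validated at construction time; the stage names and schema labels are illustrative assumptions:

```python
# Pipeline stages as typed morphisms: composition is rejected at
# construction time if the codomain/domain schemas do not match.

class Morphism:
    def __init__(self, fn, dom, cod):
        self.fn, self.dom, self.cod = fn, dom, cod

    def __call__(self, x):
        return self.fn(x)

    def then(self, other):
        """Compose self ; other — valid only if schemas line up."""
        if self.cod != other.dom:
            raise TypeError(f"cannot compose: {self.cod} != {other.dom}")
        return Morphism(lambda x: other.fn(self.fn(x)), self.dom, other.cod)

retrieve = Morphism(lambda q: f"docs({q})",   "Query",   "Context")
plan     = Morphism(lambda c: f"plan({c})",   "Context", "Plan")
execute  = Morphism(lambda p: f"result({p})", "Plan",    "Result")

# Associativity: re-grouping stages does not change the composite.
p1 = retrieve.then(plan).then(execute)
p2 = retrieve.then(plan.then(execute))
print(p1("q") == p2("q"))  # True
```

Mismatched stages (e.g., composing `execute` directly after `retrieve`) fail at pipeline construction, not at runtime in production.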
3.12.3 Functors: Structure-Preserving Mappings Between Agent Categories#
A functor F: 𝒞 → 𝒟 maps objects and morphisms from one category to another while preserving composition and identity:
F(g ∘ f) = F(g) ∘ F(f) and F(id_A) = id_F(A).
Application: Testing Functor#
Define a testing functor T that maps production components to their test doubles:
- T(retrieve) = mock_retrieve (returns canned results)
- T(execute) = mock_execute (simulates tool calls)
The functor property guarantees that if the production pipeline composes correctly, the test pipeline composes correctly with the same structure. This is the formal basis for compositional testing of agent pipelines.
Application: Observability Functor#
Define an observability functor O that wraps each morphism with tracing: O(f) behaves exactly as f but additionally logs inputs, outputs, and latency. The functor preserves composition, O(g ∘ f) = O(g) ∘ O(f): tracing the composite pipeline is equivalent to composing the individually traced stages.
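A minimal sketch of such a wrapper, assuming an illustrative in-memory trace sink (a real system would emit spans to a tracing backend):

```python
import functools
import time

# Observability as a functor-style wrapper: each stage is mapped to a
# traced version of itself; composing traced stages traces the composite.

TRACE = []  # illustrative in-memory trace sink

def observe(name, fn):
    @functools.wraps(fn)
    def traced(x):
        start = time.perf_counter()
        out = fn(x)
        TRACE.append({"stage": name, "input": x, "output": out,
                      "latency_s": time.perf_counter() - start})
        return out
    return traced

plan    = observe("plan",    lambda q: f"plan({q})")
execute = observe("execute", lambda p: f"result({p})")

out = execute(plan("query"))
print(out)                           # result(plan(query))
print([t["stage"] for t in TRACE])   # ['plan', 'execute']
```

Because the wrapper returns `fn`'s result unchanged, inserting or removing tracing never alters pipeline semantics — the identity-law intuition from the table below.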
3.12.4 Monoidal Categories: Parallel Composition#
A monoidal category adds a tensor product ⊗ that models parallel composition:
- f: A → B and g: C → D compose in parallel as f ⊗ g: A ⊗ C → B ⊗ D.
- I is the unit object (the "empty" type; doing nothing in parallel changes nothing: A ⊗ I ≅ A).
Application: Parallel Agent Execution#
Two independent agent tasks f: A → B and g: C → D can be executed in parallel:
f ⊗ g: A ⊗ C → B ⊗ D
The monoidal structure guarantees that:
- Independence: f and g do not interact (no shared state, isolated workspaces).
- Determinism: the result of f ⊗ g is the pair of results (f(a), g(c)), regardless of execution order.
- Composability: (f′ ⊗ g′) ∘ (f ⊗ g) = (f′ ∘ f) ⊗ (g′ ∘ g) is well-defined when types align (the interchange law).
This provides the formal justification for the parallel agent execution model described in the system prompt: "Parallel agents shall operate on independently claimable work units."
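A minimal sketch of the tensor product as concurrent execution of two independent stages; the stage functions are illustrative stand-ins for real agent tasks:

```python
from concurrent.futures import ThreadPoolExecutor

# Tensor product as parallel execution: two independent stages run
# concurrently on isolated inputs; the composite returns the pair of
# results regardless of which stage finishes first.

def tensor(f, g):
    def parallel(pair):
        a, c = pair
        with ThreadPoolExecutor(max_workers=2) as pool:
            fa = pool.submit(f, a)
            gc = pool.submit(g, c)
            return (fa.result(), gc.result())
    return parallel

summarize = lambda text: f"summary({text})"
classify  = lambda text: f"label({text})"

both = tensor(summarize, classify)
print(both(("doc-A", "doc-B")))  # ('summary(doc-A)', 'label(doc-B)')
```

Independence is what makes this safe: the two stages share no mutable state, so the result is the deterministic pair of their individual results.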
3.12.5 Natural Transformations: Component Substitution#
A natural transformation η: F ⇒ G between functors F, G: 𝒞 → 𝒟 is a family of morphisms η_A: F(A) → G(A), one for each object A, such that for every f: A → B:
η_B ∘ F(f) = G(f) ∘ η_A
This naturality square commutes: transforming and then mapping along G equals mapping along F and then transforming.
Application: Retriever Substitution#
If F and G are two retrieval strategies (e.g., semantic search and hybrid search), a natural transformation η: F ⇒ G provides a systematic migration path: for every query type, there exists a transformation from F's result to G's result that commutes with all downstream pipeline stages. This guarantees that replacing a retriever does not break the pipeline's composition structure.
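A minimal sketch of a naturality-style check when swapping retrievers. Everything here is an assumption for illustration: an old retriever returning plain document lists, a new one returning scored pairs, a conversion that preserves ranking, and a downstream top-1 stage in each format:

```python
# Naturality-style check for retriever substitution: converting the old
# retriever's result, then applying the downstream stage, must agree with
# applying the downstream stage and then converting.

def old_retrieve(query):          # old format: ranked list of doc ids
    return [f"{query}-d1", f"{query}-d2"]

def eta(old_result):              # conversion: preserve ranking as scores
    n = len(old_result)
    return [(doc, (n - i) / n) for i, doc in enumerate(old_result)]

def top1_old(result):             # downstream stage, old format
    return result[0]

def top1_new(result):             # same stage, new (doc, score) format
    return max(result, key=lambda pair: pair[1])[0]

# The square commutes on a sample query: same top document either way.
q = "query"
assert top1_new(eta(old_retrieve(q))) == top1_old(old_retrieve(q))
print("naturality check passed for", q)
```

In practice this kind of check, run over a representative query set, is the mechanical evidence that the substitution commutes with downstream stages.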
3.12.6 Monads: Effectful Agent Computation#
A monad on a category 𝒞 consists of:
- An endofunctor T: 𝒞 → 𝒞.
- A unit (return) η: Id ⇒ T.
- A multiplication (join/flatten) μ: T² ⇒ T.
satisfying the associativity law μ ∘ Tμ = μ ∘ μT and the unit laws μ ∘ Tη = μ ∘ ηT = id.
Application: Error Handling as Monad#
Define T(A) = Result[A] = A + Error — either a value of type A or a typed error. Every agent component returns Result[B] instead of B, explicitly propagating errors through the pipeline:
f: A → Result[B], g: B → Result[C]
Monadic composition (Kleisli composition) automatically handles error short-circuiting:
(f >=> g): A → Result[C] — if f returns an Error, g is never invoked and the Error propagates unchanged.
This provides the formal basis for typed error propagation in agent pipelines, ensuring that failures are handled uniformly and cannot be silently dropped.
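A minimal sketch of the Result monad with Kleisli composition; the `Ok`/`Err` types and the example stages are assumptions for illustration:

```python
from dataclasses import dataclass

# Error handling as a Result monad: each stage returns Ok(value) or
# Err(message); Kleisli composition short-circuits on the first Err.

@dataclass
class Ok:
    value: object

@dataclass
class Err:
    message: str

def kleisli(*stages):
    """Compose stages of type A -> Result[B]; stop at the first Err."""
    def pipeline(x):
        result = Ok(x)
        for stage in stages:
            if isinstance(result, Err):
                return result          # short-circuit: skip later stages
            result = stage(result.value)
        return result
    return pipeline

parse = lambda s: Ok(int(s)) if s.isdigit() else Err(f"not a number: {s}")
halve = lambda n: Ok(n // 2) if n % 2 == 0 else Err(f"odd input: {n}")

run = kleisli(parse, halve)
print(run("42"))   # Ok(value=21)
print(run("7"))    # Err(message='odd input: 7')
print(run("abc"))  # Err(message='not a number: abc')
```

Because every failure is a value of the same `Err` type, the pipeline framework handles all failures uniformly; no exception can escape a stage unnoticed.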
Application: Retrieval Context as Monad#
Define T(A) = Contextualized[A] = A × Provenance × Relevance. Every retrieval result carries provenance and relevance metadata. Monadic composition ensures that provenance is preserved through all pipeline stages:
f: Query → Contextualized[Documents], g: Documents → Contextualized[RankedDocuments]
μ flattens nested contextualization (e.g., retrieval of a retrieval result, Contextualized[Contextualized[A]] → Contextualized[A]) into a single provenance chain.
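A minimal sketch of the idea, assuming an illustrative `Contextualized` type whose bind concatenates provenance chains (playing the role of join):

```python
from dataclasses import dataclass, field

# Retrieval context as a monad: values carry a provenance chain; bind
# concatenates the chains, so nested contextualization flattens into one.

@dataclass
class Contextualized:
    value: object
    provenance: list = field(default_factory=list)

def unit(x):
    """Return: wrap a bare value with empty provenance."""
    return Contextualized(x, [])

def bind(ctx, fn):
    """fn: value -> Contextualized; concatenating provenance is the join."""
    inner = fn(ctx.value)
    return Contextualized(inner.value, ctx.provenance + inner.provenance)

retrieve = lambda q: Contextualized(f"docs({q})", ["index:v1"])
rerank   = lambda d: Contextualized(f"top({d})",  ["reranker:v2"])

out = bind(bind(unit("query"), retrieve), rerank)
print(out.value)       # top(docs(query))
print(out.provenance)  # ['index:v1', 'reranker:v2']
```

No stage can drop the provenance of an earlier stage: the chain grows monotonically through the pipeline, which is what makes end-to-end attribution auditable.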
3.12.7 String Diagrams: Visual Pipeline Composition#
String diagrams provide a graphical syntax for monoidal categories that makes pipeline composition visually explicit:

```
Sequential: A ─── f ─── B ─── g ─── C

Parallel:   A ─── f ─── B
            C ─── g ─── D

Feedback:   A ─── f ──┬── B
                      │
                      └── (feedback to A via h)
```

String diagrams are sound and complete for monoidal categories: every valid diagram corresponds to a well-typed pipeline, and every well-typed pipeline has a diagram. This provides a formal specification language for agent pipeline architecture.
3.12.8 Composition Properties and Invariant Preservation#
The category-theoretic framework guarantees the following properties when agent pipelines are composed through categorical constructs:
| Property | Categorical Guarantee | Agent Implication |
|---|---|---|
| Type safety | Morphism composition requires matching domain/codomain. | Pipeline stages must have compatible input/output schemas. Detected at design time. |
| Associativity | h ∘ (g ∘ f) = (h ∘ g) ∘ f. | Pipeline refactoring (re-grouping stages) does not change semantics. |
| Identity | f ∘ id = f = id ∘ f. | Inserting a no-op stage does not change behavior; useful for observability wrappers. |
| Functorial testing | T(g ∘ f) = T(g) ∘ T(f). | Testing the composed pipeline is equivalent to composing tested components. |
| Parallel independence | f ⊗ g guarantees no interaction. | Formally verifies that parallel agents do not share mutable state. |
| Error propagation | Monadic composition short-circuits on error. | Failures propagate uniformly; no silent error swallowing. |
3.12.9 Practical Typed Interface Design from Category Theory#
Category theory motivates the following production design principles:
- Every agent component is a typed morphism: input schema → output schema, with explicit error types.
- Pipelines are compositions: validated at construction time by type checking, not at runtime by exception handling.
- Parallel execution requires monoidal independence: components in a parallel group must have no shared mutable state. Verify mechanically.
- Component substitution requires natural transformation: replacing a retriever, planner, or verifier requires demonstrating that the new component satisfies the same naturality conditions (same interface contract, commuting with downstream stages).
- Error handling is monadic: every component wraps its output in a Result monad; the pipeline framework provides Kleisli composition for automatic error propagation.
- Observability is functorial: tracing, logging, and metrics are applied as functors that preserve pipeline structure.
Chapter Summary and Architectural Selection Guide#
Synthesis: Architecture Selection Matrix#
| Architecture | Primary Strength | Best Application in Agentic AI | Composition Pattern |
|---|---|---|---|
| BDI (§3.1) | Commitment and practical reasoning | Single-agent goal pursuit with plan stability | Core agent loop structure |
| SOAR (§3.2.1) | Impasse-driven learning | Escalation and workflow compilation | Memory and learning subsystem |
| ACT-R (§3.2.2) | Activation-based retrieval | Retrieval scoring and tool utility learning | Retrieval ranking function |
| GWT (§3.2.3) | Attention and broadcast | Context window engineering | Context admission control |
| Reactive (§3.3.1) | Minimal latency | Safety guardrails, input filters | Subsumption layer 0 |
| Deliberative (§3.3.2) | Plan optimality | Complex multi-step task planning | HTN decomposition |
| Hybrid (§3.3.3) | Balanced capability | Production agent stacks | Layered architecture |
| HTN (§3.4) | Hierarchical decomposition | Task breakdown and workflow templates | Plan library |
| OODA (§3.5) | Execution tempo | Real-time and interactive agent loops | Outer execution loop |
| Blackboard (§3.6) | Multi-source integration | Heterogeneous retrieval aggregation | Retrieval orchestration |
| Subsumption (§3.7) | Priority arbitration | Safety and behavior layering | Control hierarchy |
| Actor/CSP (§3.8) | Concurrent execution | Multi-agent orchestration | Communication substrate |
| Stigmergy (§3.9) | Decoupled coordination | Large-scale multi-agent collaboration | Coordination protocol |
| Contract-Net (§3.10) | Negotiated allocation | Dynamic task distribution | Task allocation protocol |
| Petri Nets/FSM (§3.11) | Formal verification | Agent lifecycle management | State management |
| Category Theory (§3.12) | Compositional guarantees | Pipeline design and refactoring | Composition framework |
The Production Agentic Stack as Architectural Composition#
A production-grade agentic system does not use one architecture; it composes multiple architectures at different layers, as summarized in the selection matrix above.
Each architecture provides a specific structural guarantee. Together, they yield a system that is formally grounded, concurrency-safe, verifiable, observable, and compositionally maintainable.
The discipline of selecting, combining, and formally justifying these architectural choices—rather than improvising control flow around prompt templates—is what separates engineering from experimentation in the construction of agentic AI systems.
End of Chapter 3.