11.1 The Memory Wall Thesis: Why Agents Need Hard Boundaries Between Memory Layers#
11.1.1 The Fundamental Problem#
An agentic system that operates without structurally enforced memory boundaries inevitably degrades along three axes simultaneously: correctness (stale or conflicting information poisons reasoning), latency (unbounded context growth inflates inference cost super-linearly), and reliability (uncontrolled state leakage across tasks produces non-reproducible behavior). The Memory Wall Thesis asserts that no amount of prompt engineering, context-window expansion, or retrieval sophistication compensates for the absence of a rigorous, typed, and mechanically enforced separation between memory layers. Memory must be treated as a stratified, governed subsystem—analogous to the register–cache–RAM–disk hierarchy in computer architecture—where each layer has explicit capacity bounds, admission policies, eviction semantics, durability guarantees, and promotion/demotion contracts.
11.1.2 Formal Statement of the Memory Wall Thesis#
Thesis. An agentic runtime that commingles ephemeral reasoning state, session-scoped interaction history, validated experiential records, canonical domain knowledge, and learned procedural skill within a single undifferentiated context buffer will exhibit monotonically increasing failure rates as task complexity, session duration, or agent population grows. Correctness, cost-efficiency, and auditability require that each memory class occupy a distinct storage tier with independently enforced write-admission, read-access, expiry, and isolation policies.
The thesis is motivated by three observable failure modes in production agentic systems:
| Failure Mode | Root Cause | Consequence |
|---|---|---|
| Context Saturation | Working memory absorbs stale history and domain knowledge simultaneously, exceeding the effective reasoning capacity of the model. | Degraded chain-of-thought quality; hallucination rate increases as token budget is consumed by low-signal context. |
| Cross-Session Contamination | Information from a prior session leaks into a current session due to shared mutable state. | Privacy violations; incorrect assumptions carried across user boundaries; non-reproducible outputs. |
| Knowledge Staleness Cascade | Agent-learned "facts" overwrite or shadow authoritative organizational knowledge without version control or conflict resolution. | Systematic drift from ground truth; compounding errors as downstream agents consume corrupted semantic memory. |
11.1.3 The Hierarchical Memory Model#
The memory hierarchy for agentic systems comprises five formally distinct layers, ordered by volatility, capacity, access latency, and governance rigor:

$$\mathcal{M} = \langle M_W, M_S, M_E, M_K, M_P \rangle$$

where:
- $M_W$: Working Memory — ephemeral scratch space for active reasoning within a single agent step or micro-plan.
- $M_S$: Session Memory — conversation-scoped state persisted across turns within a bounded interaction session.
- $M_E$: Episodic Memory — validated, structured records of past agent experiences with full provenance.
- $M_K$: Semantic Memory — canonical organizational and domain knowledge, curated and version-controlled.
- $M_P$: Procedural Memory — learned action sequences, tool-usage patterns, and workflow templates compiled from successful execution traces.

Each layer $i$ is characterized by a tuple of operational properties:

$$L_i = \langle C_i, \tau_i, \pi_w^i, \pi_e^i, \pi_a^i, d_i, \sigma_i \rangle$$

where $C_i$ is the capacity bound (in tokens or records), $\tau_i$ is the default time-to-live, $\pi_w^i$ is the write-admission policy, $\pi_e^i$ is the eviction policy, $\pi_a^i$ is the access-control policy, $d_i$ is the durability class (ephemeral, session-durable, persistent), and $\sigma_i$ is the isolation scope (step, session, agent, organization).
Table: Memory Layer Properties
| Layer | Volatility | Capacity | Durability | Isolation Scope | Governance |
|---|---|---|---|---|---|
| $M_W$ | Highest | Smallest (reserved token window) | Ephemeral (intra-step) | Single agent step | Automatic GC |
| $M_S$ | High | Moderate (session budget) | Session-durable | Single session | TTL + checkpointing |
| $M_E$ | Moderate | Large (indexed store) | Persistent | Agent or agent-class | Validated writes |
| $M_K$ | Low | Very large (knowledge base) | Persistent, versioned | Organization-wide | Curated, approval-gated |
| $M_P$ | Lowest | Moderate (procedure library) | Persistent, versioned | Agent-class or org | Tested + promoted |
11.1.4 The Cost of Violating the Memory Wall#
Without enforced boundaries, the effective reasoning quality of an agent degrades as a function of context pollution. Define the signal density of the active context window $W$ as:

$$\rho(W) = \frac{1}{|W|} \sum_{t \in W} u(t)$$

where $u(t)$ measures the marginal contribution of token $t$ to the current task objective. When memory layers are commingled, $\rho$ decays toward zero as irrelevant historical tokens, stale knowledge, and prior-session artifacts dilute the signal:

$$\rho_{\text{mixed}}(W) \to 0 \quad \text{as session duration and accumulated state grow}$$

In contrast, a properly partitioned system loads only task-relevant slices from each layer, maintaining:

$$\rho_{\text{part}}(W) = \frac{1}{|W|} \sum_{i} \sum_{t \in R_i(q)} u(t) \gg \rho_{\text{mixed}}(W)$$

where $R_i(q)$ applies the retrieval policy of layer $i$ to extract only high-utility items for query $q$. The ratio $\rho_{\text{part}} / \rho_{\text{mixed}}$ grows with system scale, explaining why memory-wall violations cause progressively worse failures as agents mature and accumulate state.
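The dilution effect is easy to demonstrate numerically. The utility values below are made-up for the sketch: task-relevant tokens near 1.0, stale or cross-session tokens near 0.0.

```python
def signal_density(utilities):
    """rho(W): mean marginal utility u(t) over tokens in the active window."""
    return sum(utilities) / len(utilities) if utilities else 0.0

# Hypothetical token utilities
relevant = [0.9] * 200   # retrieved, task-relevant slices
stale = [0.05] * 800     # prior-session artifacts and stale history

rho_partitioned = signal_density(relevant)           # only relevant slices loaded
rho_commingled = signal_density(relevant + stale)    # everything in one buffer

assert rho_partitioned > 4 * rho_commingled  # pollution dilutes the signal ~4x here
```

Note that the commingled window carries the same 200 useful tokens; it is the 800 low-utility tokens that collapse the density, which is the Memory Wall Thesis in miniature.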
11.1.5 Architectural Implications#
The Memory Wall Thesis imposes the following non-negotiable architectural requirements:
- Typed interfaces between every memory layer: each layer exposes read, write, query, and evict operations through versioned, schema-described contracts (JSON-RPC at the application boundary; gRPC/Protobuf internally).
- Independent storage backends: working memory resides in-process or in a fast key-value store; session memory in a session-scoped store with TTL; episodic and semantic memory in indexed, persistent stores (vector databases, knowledge graphs); procedural memory in a versioned procedure registry.
- Explicit promotion paths: data moves between layers only through validated promotion pipelines, never through implicit leakage or shared mutable references.
- Budget enforcement: the prefill compiler allocates token budgets per layer before context assembly; no layer may exceed its allocation without explicit overflow handling.
- Audit trail: every write, promotion, eviction, and read across all layers is logged with provenance metadata sufficient for post-hoc replay and compliance review.
11.2 Working Memory: Ephemeral Scratch Space for Active Reasoning#
Working memory ($M_W$) is the innermost, most volatile layer of the memory hierarchy. It holds the transient cognitive state required for the agent to execute a single reasoning step, sub-plan, or tool invocation. Its defining characteristic is ephemerality: contents exist only for the duration of the current micro-task and are discarded or compressed upon step completion.
11.2.1 Capacity Limits and Overflow Strategies#
Capacity Model#
The capacity of working memory is fundamentally constrained by the model's context window $C_{\text{ctx}}$ (measured in tokens), minus reservations for other memory layers and the output generation budget:

$$C_W = C_{\text{ctx}} - C_{\text{sys}} - C_S - C_E - C_K - C_P - C_{\text{out}}$$

where:
- $C_{\text{sys}}$: tokens reserved for the system prompt (role policy, protocol bindings, tool schemas).
- $C_S$: tokens reserved for the session memory summary.
- $C_E$: tokens reserved for episodic memory retrievals.
- $C_K$: tokens reserved for semantic knowledge retrievals.
- $C_P$: tokens reserved for procedural memory (active procedure templates).
- $C_{\text{out}}$: tokens reserved for the model's generation output.
Example budget allocation for a 128K-token window:
| Component | Budget (tokens) | Percentage |
|---|---|---|
| System prompt ($C_{\text{sys}}$) | 4,096 | 3.2% |
| Session memory ($C_S$) | 8,192 | 6.4% |
| Episodic retrievals ($C_E$) | 8,192 | 6.4% |
| Semantic retrievals ($C_K$) | 16,384 | 12.8% |
| Procedural memory ($C_P$) | 4,096 | 3.2% |
| Output reservation ($C_{\text{out}}$) | 16,384 | 12.8% |
| Working memory ($C_W$) | 70,656 | 55.2% |
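The budget arithmetic in the table can be checked directly; the reservation names below are shorthand for the components in the table, and the 128,000-token base matches the table's percentages:

```python
CONTEXT_WINDOW = 128_000  # effective window used by the example budget table

reservations = {
    "system_prompt": 4_096,
    "session_memory": 8_192,
    "episodic": 8_192,
    "semantic": 16_384,
    "procedural": 4_096,
    "output": 16_384,
}

# C_W = C_ctx minus every fixed reservation
working_memory = CONTEXT_WINDOW - sum(reservations.values())
assert working_memory == 70_656  # matches the table's working-memory row
```

Working memory is the residual claimant: any change to a fixed reservation flows directly into (or out of) the active reasoning budget.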
Overflow Strategies#
When the working memory contents approach or exceed $C_W$, the system must apply one of the following overflow strategies, selected by policy:
- Compression: Apply an LLM-based or extractive summarization pass over the current working memory, reducing token count while preserving decision-relevant facts. The compression ratio is defined as:

  $$r = \frac{\text{TokenCount}(\text{compressed})}{\text{TokenCount}(\text{original})}$$

  The target ratio is set per task class, ranging from aggressive compression for simple tasks to conservative compression for multi-step reasoning.
- Externalization: Offload intermediate results, partial computations, or detailed evidence to an external scratch store (key-value store, temporary file, or tool-managed workspace). Working memory retains only a typed reference (URI + schema hash + token-count metadata).
- Selective Pruning: Remove working memory items ranked lowest by a recency-weighted utility score:

  $$u(m) = \alpha \cdot \text{Relevance}(m, \text{task}) + \beta \cdot \text{Recency}(m) + \gamma \cdot \text{Dependency}(m)$$

  where $\text{Dependency}(m)$ measures how many downstream reasoning steps reference item $m$. Items with $u(m) < \theta_{\text{prune}}$ are evicted.
- Step Decomposition: If the current micro-task genuinely requires more working memory than $C_W$, the orchestrator decomposes it into smaller sub-steps, each fitting within the budget.
Pseudo-Algorithm 11.1: Working Memory Overflow Handler
PROCEDURE HandleWorkingMemoryOverflow(M_W, C_W, task):
current_size ← TokenCount(M_W)
IF current_size ≤ C_W THEN RETURN M_W
// Phase 1: Prune low-utility items
FOR EACH item m IN M_W:
m.score ← α * Relevance(m, task) + β * Recency(m) + γ * Dependency(m)
SORT M_W BY score ASCENDING
WHILE TokenCount(M_W) > C_W * 0.9 AND M_W.has_prunable_items():
lowest ← M_W.pop_lowest_score()
IF lowest.score < θ_prune:
EvictionLog.record(lowest, reason="low_utility_prune")
ELSE:
BREAK
// Phase 2: Compress if still over budget
IF TokenCount(M_W) > C_W:
compressed ← CompressSummarize(M_W, target_ratio=0.5)
EvictionLog.record_compression(original_size=current_size,
compressed_size=TokenCount(compressed))
M_W ← compressed
// Phase 3: Externalize if still over budget
IF TokenCount(M_W) > C_W:
externalizable ← SelectExternalizableSections(M_W)
FOR EACH section IN externalizable:
ref ← ExternalScratchStore.write(section)
M_W.replace(section, ExternalReference(ref))
// Phase 4: Escalate to decomposition if still over budget
IF TokenCount(M_W) > C_W:
RAISE TaskDecompositionRequired(task, current_budget=C_W)
RETURN M_W

11.2.2 Working Memory as Context Window Reservation#
Working memory is not merely a conceptual abstraction; it maps directly to a reserved region of the model's context window. The prefill compiler treats working memory as a first-class segment in the assembled prompt, positioned after the system prompt and before the output cursor.
Segment Layout#
The compiled prompt follows a deterministic segment ordering:
┌─────────────────────────────────────┐
│ SEGMENT 0: System Prompt │ ← Role policy, protocol, constraints
│ SEGMENT 1: Tool Affordances │ ← Active tool schemas (lazy-loaded)
│ SEGMENT 2: Semantic Memory Slice │ ← Retrieved domain knowledge
│ SEGMENT 3: Episodic Memory Slice │ ← Relevant past episodes
│ SEGMENT 4: Session Memory Summary │ ← Compressed session history
│ SEGMENT 5: Procedural Memory Slice │ ← Active procedure templates
│ SEGMENT 6: Working Memory │ ← Current reasoning state ← ACTIVE
│ SEGMENT 7: Current User Turn │ ← Latest input / instruction
│ SEGMENT 8: [OUTPUT RESERVATION] │ ← Reserved for generation
└─────────────────────────────────────┘

Each segment is assigned a token budget at compilation time. The working memory segment receives whatever capacity remains after all other segments are allocated, up to its configured maximum $C_W$. This reservation model ensures that working memory never starves other layers and that other layers never crowd out active reasoning.
Formal Reservation Invariant#
At every compilation step, the following invariant must hold:

$$C_{\text{sys}} + C_S + C_E + C_K + C_P + \text{TokenCount}(M_W) + C_{\text{out}} \le C_{\text{ctx}}$$

If the invariant is violated during compilation, the prefill compiler triggers the overflow handler (Pseudo-Algorithm 11.1) on the working memory segment before proceeding.
11.2.3 Garbage Collection and TTL Policies#
Working memory items have the shortest lifecycle in the memory hierarchy. Garbage collection operates at two granularities:
Step-Level GC#
Upon completion of each agent step (a single plan-act-verify cycle), the working memory is fully cleared unless the orchestrator explicitly marks specific items for carry-forward to the next step. The carry-forward set is bounded:

$$\text{TokenCount}(\text{CarryForward}) \le \kappa \cdot C_W, \quad 0 < \kappa < 1$$

This bound prevents gradual accumulation of "zombie" working memory across steps.
Intra-Step TTL#
Within a single step, individual working memory items may be assigned a TTL measured in sub-operations (e.g., tool calls, retrieval rounds). An item whose TTL has expired is evicted at the next sub-operation boundary:

$$\text{ttl}(m) = \text{ttl}_0(m) - \left(t - t_{\text{ins}}(m)\right) \le 0 \implies \text{evict}(m)$$

where $t$ is the current sub-operation index and $t_{\text{ins}}(m)$ is the insertion time of item $m$.
Pseudo-Algorithm 11.2: Working Memory Garbage Collection
PROCEDURE GarbageCollectWorkingMemory(M_W, step_completed, carry_forward_set):
IF step_completed:
// Full GC: retain only explicitly marked items
FOR EACH item m IN M_W:
IF m NOT IN carry_forward_set:
M_W.remove(m)
GCLog.record(m, reason="step_boundary_eviction")
// Enforce carry-forward bound
IF TokenCount(M_W) > κ * C_W:
excess ← SelectLowestUtility(M_W, target=κ * C_W)
FOR EACH item m IN excess:
M_W.remove(m)
GCLog.record(m, reason="carry_forward_overflow")
ELSE:
// Intra-step TTL sweep
FOR EACH item m IN M_W:
m.ttl ← m.ttl - 1
IF m.ttl ≤ 0:
M_W.remove(m)
GCLog.record(m, reason="ttl_expired")
RETURN M_W

Promotion Before Eviction#
Before any working memory item is evicted, the GC process checks whether the item qualifies for promotion to a higher-durability layer (session memory or episodic memory). Promotion eligibility is evaluated by the cross-layer promotion policy (§11.7). This ensures that genuinely valuable intermediate results are not lost.
11.3 Session Memory: Conversation-Scoped State with Defined Lifecycle#
Session memory ($M_S$) captures the stateful context of an ongoing interaction between a user (or calling system) and an agent. It persists across multiple turns within a single session but is strictly isolated from other sessions and from the agent's long-term memory layers.
11.3.1 Session Initialization, Checkpointing, and Resumption#
Session Lifecycle State Machine#
Every session follows a formally defined lifecycle:
┌──────────┐
create() │ │ checkpoint()
───────────► │ ACTIVE │ ──────────────► CHECKPOINTED
│ │ │
└────┬─────┘ │ resume()
│ ▼
│ expire() / close() ┌──────────┐
▼ │ ACTIVE │
┌──────────┐ │(resumed) │
│ CLOSED │ └───────────┘
└──────────┘
│
│ archive()
▼
┌──────────┐
│ ARCHIVED │ (episodic promotion candidate)
└──────────┘

Session Initialization#
Session creation requires the following typed contract:

$$\text{Session} = \langle \text{sid}, \text{uid}, \text{aid}, t_0, \tau_{\max}, C_S, S_0, \Gamma \rangle$$

where:
- $\text{sid}$: globally unique session identifier (UUIDv7 for temporal ordering).
- $\text{uid}$: user or caller identity, governing access-control policies.
- $\text{aid}$: the agent instance bound to this session.
- $t_0$: creation timestamp.
- $\tau_{\max}$: maximum session duration (hard TTL).
- $C_S$: token budget for session memory within the context window.
- $S_0$: initial state (user preferences, task description, prior context if resuming).
- $\Gamma$: session-level governance rules (data retention class, PII handling, etc.).
Checkpointing#
Checkpoints capture the session state at a point in time, enabling resumption after interruption (network failure, user departure, agent restart). A checkpoint is a serialized snapshot:

$$\chi_k = \langle \text{sid}, k, t_k, \Sigma_n, \text{turn\_log}, \text{working\_state} \rangle$$

where $k$ is the checkpoint sequence number and $\Sigma_n$ is a compressed representation of the session history up to turn $n$.
Pseudo-Algorithm 11.3: Session Checkpointing
PROCEDURE CheckpointSession(session, trigger):
// trigger ∈ {periodic, turn_count_threshold, user_request, error_recovery}
k ← session.checkpoint_count + 1
summary ← SummarizeSessionHistory(
session.turn_log,
max_tokens=session.C_S * 0.5,
preserve_decisions=TRUE,
preserve_user_corrections=TRUE
)
checkpoint ← Checkpoint(
sid=session.sid,
sequence=k,
timestamp=now(),
summary=summary,
turn_log=session.turn_log_since_last_checkpoint(),
working_state=SerializeWorkingMemory(session.working_memory),
metadata={
"turn_count": session.turn_count,
"token_usage": session.total_tokens_consumed,
"trigger": trigger
}
)
SessionStore.write(checkpoint)
session.checkpoint_count ← k
session.last_checkpoint_time ← now()
RETURN checkpoint

Resumption#
Session resumption reconstructs the active session state from the most recent checkpoint:
- Load the latest checkpoint $\chi_k$ from the session store.
- Reconstruct working memory from $\chi_k.\text{working\_state}$.
- Inject $\Sigma_n$ into the session memory segment of the prefill.
- Optionally replay the turn log since checkpoint $\chi_k$ for full fidelity.
- Validate that no cross-session contamination has occurred during the idle period.
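The resumption steps can be sketched over in-memory checkpoint dicts; the checkpoint shape and field names here are assumptions for the sketch, not a prescribed wire format.

```python
def resume_session(checkpoints, sid):
    """Resumption sketch: load latest checkpoint, rebuild state, verify isolation."""
    mine = [c for c in checkpoints if c["sid"] == sid]
    if not mine:
        raise LookupError(f"no checkpoint for session {sid}")
    ckpt = max(mine, key=lambda c: c["sequence"])   # latest checkpoint chi_k
    working = dict(ckpt["working_state"])           # reconstruct working memory
    summary = ckpt["summary"]                       # Sigma_n, for prefill injection
    # Contamination check: every carried item must belong to this session
    # (None marks session-independent content).
    if any(v.get("sid") not in (sid, None) for v in working.values()):
        raise RuntimeError("cross-session contamination detected")
    return working, summary

cps = [
    {"sid": "s1", "sequence": 1, "summary": "old", "working_state": {}},
    {"sid": "s1", "sequence": 2, "summary": "new",
     "working_state": {"x": {"sid": "s1", "value": 42}}},
]
working, summary = resume_session(cps, "s1")
assert summary == "new"
```

Turn-log replay (step 4) is omitted here; a full implementation would re-apply each logged turn to the reconstructed state before serving the next request.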
11.3.2 Session Isolation: Cross-Session Contamination Prevention#
Session isolation is a security-critical property. Contamination occurs when information from session $s_i$ becomes accessible within session $s_j$ ($i \neq j$), violating confidentiality and correctness guarantees.
Isolation Mechanisms#
- Namespace partitioning: All session memory reads and writes are scoped to the session identifier. The memory store enforces $\text{sid}$-based partitioning at the storage layer, not merely at the application layer.
- Agent instance isolation: Each session binds to an independent agent runtime instance (or, at minimum, a logically isolated execution context). Shared-nothing semantics apply: no mutable state is shared between sessions.
- Context window hygiene: The prefill compiler must verify that no tokens from session $s_i$ appear in the compiled context for session $s_j$. This is enforced by a provenance check on every context segment:

  $$\forall \text{seg} \in \text{Context}(s_j): \quad \text{prov}(\text{seg}) \in \{s_j, \bot\}$$

  where $\bot$ denotes session-independent content (system prompts, tool schemas, semantic knowledge).
- Shared resource access control: If sessions share access to common tools or external resources, invocations are scoped with caller credentials and session-bound authorization tokens, preventing one session from observing another's tool invocation results.
Formal Isolation Invariant#

$$\forall i \neq j: \quad \text{Readable}(s_i) \cap M_S(s_j) = \emptyset$$

This invariant is enforced at the storage layer through partition keys and at the runtime layer through context compilation validation.
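The runtime-layer check can be sketched as a guard the prefill compiler runs over compiled segments; the segment shape (`text`, `prov`) is an assumption for the sketch, with `None` marking session-independent content.

```python
def validate_context_isolation(segments, sid):
    """Context-window hygiene: every compiled segment must carry the current
    session's provenance or be session-independent (prov is None)."""
    leaks = [s for s in segments if s["prov"] not in (sid, None)]
    if leaks:
        raise PermissionError(
            f"{len(leaks)} segment(s) leaked from other sessions into {sid}")
    return True

segments = [
    {"text": "system prompt", "prov": None},      # session-independent
    {"text": "session summary", "prov": "s-42"},  # belongs to this session
]
assert validate_context_isolation(segments, "s-42")
```

Running this check at compile time, after assembly but before inference, catches leaks introduced by any upstream component, not just the session store.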
11.3.3 Session Summarization for Long-Running Interactions#
As sessions extend beyond dozens of turns, the raw turn log exceeds the session memory budget $C_S$. Summarization compresses the history while preserving decision-critical information.
Hierarchical Summarization Strategy#
The session history is summarized at multiple granularities:
- Turn-level: Each turn is compressed to its essential content (user intent, agent action, outcome).
- Segment-level: Groups of related turns (e.g., a sub-task completion) are merged into a segment summary.
- Session-level: The entire session is represented by a running summary that is updated incrementally.
The running summary after turn $n$ is computed as:

$$\Sigma_n = g\left(\Sigma_{n-1} \oplus \text{turn}_n\right)$$

where $\oplus$ denotes concatenation and $g$ is a compression function that preserves:
- User corrections: Any explicit correction by the user is retained verbatim or near-verbatim, as these represent high-value signal for avoiding repeated errors.
- Decision points: Key decisions made during the session, including alternatives considered and reasons for selection.
- Unresolved items: Open questions, pending actions, or ambiguities that affect future turns.
- Accumulated constraints: Constraints, preferences, or filters established during the session.
Information-Theoretic Quality Metric#
Define the summarization quality as the mutual information between the full turn log $\mathcal{T}_n$ and the summary, normalized by the summary length:

$$Q(\Sigma_n) = \frac{I(\mathcal{T}_n; \Sigma_n)}{|\Sigma_n|}$$

The summarization procedure maximizes $Q(\Sigma_n)$ subject to the constraint $|\Sigma_n| \le C_S$. In practice, this is approximated by training or prompting the summarizer to prioritize high-information-density items (corrections, decisions, constraints) over low-density items (pleasantries, acknowledgments, repeated context).
Pseudo-Algorithm 11.4: Incremental Session Summarization
PROCEDURE UpdateSessionSummary(session, new_turn):
session.turn_log.append(new_turn)
session.turn_count ← session.turn_count + 1
// Check if summarization is needed
projected_size ← TokenCount(session.summary) + TokenCount(new_turn)
IF projected_size ≤ session.C_S:
session.summary ← session.summary ⊕ CompressTurn(new_turn)
RETURN
// Hierarchical re-summarization
// Step 1: Identify segment boundaries in recent turns
segments ← IdentifySegmentBoundaries(
session.turn_log,
since=session.last_summarization_turn
)
// Step 2: Summarize each segment
segment_summaries ← []
FOR EACH segment IN segments:
seg_summary ← SummarizeSegment(segment,
preserve=["user_corrections", "decisions", "constraints", "open_items"])
segment_summaries.append(seg_summary)
// Step 3: Merge with existing session summary
candidate ← MergeSummaries(session.summary, segment_summaries,
budget=session.C_S,
priority_order=["corrections", "constraints", "decisions",
"outcomes", "context"])
// Step 4: Validate summary quality
IF SummaryQualityCheck(candidate, session.turn_log):
session.summary ← candidate
session.last_summarization_turn ← session.turn_count
ELSE:
// Fallback: aggressive compression of oldest segments
session.summary ← AggressiveCompress(session.summary,
target_ratio=0.5)
session.summary ← session.summary ⊕ segment_summaries
Truncate(session.summary, session.C_S)

11.4 Episodic Memory: Validated Records of Past Agent Experiences#
Episodic memory ($M_E$) stores structured records of completed agent experiences—resolved tasks, encountered errors, successful strategies, and observed outcomes. Unlike session memory (which is scoped to a single interaction and eventually archived), episodic memory is a persistent, indexed, queryable store that enables the agent to learn from its own history.
11.4.1 Episode Schema: Trigger, Context, Action, Outcome, Evaluation, Timestamp#
Every episode is stored as a typed record conforming to a strict schema:

$$e = \langle \text{eid}, \text{trigger}, \text{context}, \text{actions}, \text{outcome}, \text{evaluation}, \text{metadata} \rangle$$
Field Definitions#
| Field | Type | Description |
|---|---|---|
| eid | UUID | Globally unique episode identifier. |
| trigger | TriggerRecord | The event or query that initiated the episode (user request, system event, scheduled task). |
| context | ContextSnapshot | The relevant state at the time of the episode: active task, retrieved knowledge, session summary, environmental conditions. |
| actions | List[ActionRecord] | Ordered sequence of actions taken: tool invocations, sub-agent delegations, reasoning steps, with inputs/outputs for each. |
| outcome | OutcomeRecord | The final result: success/failure status, output artifacts, side effects, and any error information. |
| evaluation | EvaluationRecord | Post-hoc assessment: correctness score, efficiency metrics, human feedback (if available), automated eval results. |
| metadata | EpisodeMetadata | Timestamp, duration, agent version, model version, token cost, session ID (if applicable), provenance chain. |
Sub-Record: EvaluationRecord#
The evaluation record is critical for episodic recall ranking:

$$\text{evaluation} = \langle \text{correctness} \in [0,1],\ \text{efficiency},\ \text{human\_rating},\ \text{failure\_class} \in \mathcal{F} \cup \{\bot\} \rangle$$

where $\mathcal{F}$ is a taxonomy of failure classes (hallucination, tool error, timeout, policy violation, etc.).
Embedding Representation#
Each episode is embedded into a dense vector space for similarity-based retrieval:

$$\mathbf{v}_e = \text{Embed}(\text{trigger} \oplus \text{context} \oplus \text{outcome})$$

The embedding captures the semantic signature of the episode—what was asked, under what conditions, and what happened—enabling retrieval of episodes relevant to a current task.
11.4.2 Episodic Recall: Similarity-Based, Recency-Weighted, Outcome-Filtered#
Episodic recall is the process of retrieving relevant past episodes to inform current reasoning. The recall function combines multiple ranking signals.
Composite Recall Score#
Given a current task query $q$, the recall score for episode $e$ is:

$$R(e, q) = w_{\text{sim}} \cdot \text{sim}(q, e) + w_{\text{rec}} \cdot e^{-\lambda (t_{\text{now}} - t_e)} + w_{\text{out}} \cdot \text{correctness}(e) \cdot \left(1 + \text{human\_rating}(e)\right) + w_{\text{freq}} \cdot \text{freq}(e)$$

where:
- $\text{sim}(q, e)$: cosine similarity between the query embedding and the episode embedding.
- $e^{-\lambda (t_{\text{now}} - t_e)}$: exponential decay based on the age of the episode, with decay constant $\lambda$.
- $\text{correctness}(e) \cdot (1 + \text{human\_rating}(e))$: episodes with positive outcomes and human endorsement rank higher.
- $\text{freq}(e)$: normalized count of how often the episode has been recalled successfully in prior tasks (reinforcing proven utility).

The weights $w_{\text{sim}}, w_{\text{rec}}, w_{\text{out}}, w_{\text{freq}}$ are tunable per task class and sum to 1.
Outcome Filtering#
Before ranking, episodes may be pre-filtered based on outcome class:
- Success-biased recall: For task execution, prefer episodes where $\text{outcome.status} = \text{SUCCESS}$ to replicate successful strategies.
- Failure-biased recall: For error diagnosis or risk assessment, prefer episodes where $\text{outcome.status} = \text{FAILURE}$ to identify pitfalls and avoid repetition.
- Correction-biased recall: For self-improvement, prefer episodes containing human corrections to internalize feedback.
Pseudo-Algorithm 11.5: Episodic Memory Recall
PROCEDURE RecallEpisodes(query, mode, top_k, budget_tokens):
// Step 1: Embed the query
q_vec ← Embed(query)
// Step 2: Candidate retrieval (ANN search with pre-filter)
allowed_classes ← GetAllowedOutcomeClasses(mode)
candidates ← EpisodicIndex.search(
vector=q_vec,
filter={"outcome_class": allowed_classes},
top_n=top_k * 3, // over-retrieve for re-ranking
min_similarity=θ_min
)
// Step 3: Compute composite recall scores
FOR EACH episode e IN candidates:
e.recall_score ← (
w_sim * CosineSimilarity(q_vec, e.embedding)
+ w_rec * exp(-λ * (now() - e.timestamp))
+ w_out * e.evaluation.correctness * (1 + e.evaluation.human_rating)
+ w_freq * NormalizedAccessFrequency(e)
)
// Step 4: Rank and select top-k within token budget
SORT candidates BY recall_score DESCENDING
selected ← []
tokens_used ← 0
FOR EACH episode e IN candidates:
episode_tokens ← TokenCount(FormatEpisodeForContext(e))
IF tokens_used + episode_tokens > budget_tokens:
CONTINUE
selected.append(e)
tokens_used ← tokens_used + episode_tokens
IF |selected| ≥ top_k:
BREAK
// Step 5: Record access for frequency tracking
FOR EACH episode e IN selected:
EpisodicIndex.record_access(e.eid, query_context=query)
RETURN selected

11.4.3 Episodic Consolidation: Merging, Generalizing, and Forgetting#
Over time, episodic memory grows without bound unless consolidation processes merge, generalize, and forget episodes. Consolidation is analogous to memory consolidation in cognitive science: repeated patterns are abstracted into general knowledge, redundant episodes are merged, and low-value episodes are forgotten.
Merging#
When multiple episodes share similar triggers, contexts, and outcomes, they can be merged into a single consolidated episode:

$$e_{\text{merged}} = \text{Merge}\left(\{e_1, e_2, \ldots, e_n\}\right)$$
The merged episode retains:
- The most common trigger pattern.
- A generalized context (intersection of context features).
- A composite action sequence (the most successful action path).
- Aggregated evaluation metrics.
- Provenance links to all source episodes.
Generalization (Promotion to Semantic Memory)#
When a cluster of episodes reveals a consistent pattern, the pattern can be extracted and promoted to semantic memory as a general rule:

$$\{e_1, \ldots, e_n\} \xrightarrow{\ \text{ExtractPattern}\ } k \in M_K$$

For example, if 15 episodes show that "API X returns a 429 error when called more than 100 times per minute," this is promoted to semantic memory as a documented rate limit.
Forgetting#
Episodes are candidates for forgetting when they satisfy all of the following:
- $\text{AccessFreq}(e) < \theta_{\text{freq}}$ (rarely recalled).
- $\text{Recency}(e) < \theta_{\text{rec}}$ (sufficiently old).
- $\text{OutcomeQuality}(e) < \theta_{\text{qual}}$ (low-value outcome).
- The episode has been subsumed by a merged or generalized record.

The forgetting score is:

$$f(e) = \left(1 - \text{AccessFreq}(e)\right) \cdot \left(1 - \text{Recency}(e)\right) \cdot \left(1 - \text{OutcomeQuality}(e)\right)$$

Episodes with $f(e) > \theta_{\text{forget}}$ are moved to cold storage or permanently deleted, depending on retention policy.
Pseudo-Algorithm 11.6: Episodic Consolidation
PROCEDURE ConsolidateEpisodicMemory(M_E, schedule):
// Triggered on schedule (e.g., daily) or when |M_E| exceeds capacity threshold
// Phase 1: Identify merge candidates via clustering
clusters ← ClusterEpisodes(M_E, similarity_threshold=θ_merge, min_cluster_size=3)
FOR EACH cluster C IN clusters:
IF |C| ≥ 3:
merged ← MergeEpisodes(C)
M_E.insert(merged)
FOR EACH e IN C:
e.status ← SUBSUMED
e.subsumed_by ← merged.eid
// Phase 2: Generalization — extract patterns from large clusters
FOR EACH cluster C IN clusters:
IF |C| ≥ pattern_threshold:
pattern ← ExtractPattern(C)
IF PatternIsNovel(pattern, SemanticMemory):
PromoteToSemanticMemory(pattern, provenance=C)
// Phase 3: Forgetting — evict low-value, subsumed episodes
FOR EACH episode e IN M_E:
IF e.status = SUBSUMED:
f_score ← (1 - AccessFreq(e)) * (1 - Recency(e)) * (1 - OutcomeQuality(e))
IF f_score > θ_forget:
IF RetentionPolicy.allows_deletion(e):
ArchiveOrDelete(e)
ConsolidationLog.record(e, action="forgotten")
// Phase 4: Re-index remaining episodes
EpisodicIndex.rebuild(M_E.active_episodes())

11.5 Semantic Memory: Canonical Organizational and Domain Knowledge#
Semantic memory ($M_K$) stores authoritative, curated, version-controlled knowledge about the domain, the organization, and the world. It is the agent's structured understanding of entities, relationships, rules, and facts—independent of any specific episode or session. Semantic memory is the highest-authority knowledge layer: when conflicts arise between agent-learned information and semantic memory, semantic memory prevails unless explicitly overridden by a validated correction.
11.5.1 Knowledge Graph Integration: Entity-Relation-Attribute Triples#
Semantic memory is optimally represented as a knowledge graph augmented with dense embeddings, enabling both structured traversal and semantic search.
Triple Model#
The knowledge graph consists of:
- Vertices $V$: entities (persons, systems, concepts, documents, APIs, etc.), each with a typed entity class.
- Edges $E \subseteq V \times R \times V$: directed relations between entities, where $R$ is the relation type set.
- Attributes $A$: typed key-value attribute sets on entities and relations.
Each triple is stored as:

$$k = \langle s, r, o, A, p, v, t_{\text{valid}}, t_{\text{expire}} \rangle$$

where:
- $s$: subject entity.
- $r$: relation type.
- $o$: object entity (or literal value for attribute triples).
- $A$: auxiliary attributes (confidence, scope, etc.).
- $p$: provenance — the source of this knowledge (document URI, episode ID, human curator, etc.).
- $v$: version number.
- $t_{\text{valid}}$: timestamp from which this triple is valid.
- $t_{\text{expire}}$: timestamp at which this triple expires (or $\infty$ for permanent knowledge).
Retrieval over the Knowledge Graph#
Agent queries against semantic memory employ a hybrid strategy:
- Structured query: SPARQL-like traversal for known entity relationships (e.g., "What are the dependencies of ServiceX?").
- Semantic search: Dense-vector similarity search over entity/triple embeddings for fuzzy or exploratory queries (e.g., "What services are affected by database latency?").
- Graph walk: Multi-hop traversal from a seed entity to discover contextually relevant knowledge within $h$ hops.
The retrieval score for a knowledge item $k$ given query $q$ integrates all three signals:

$$S(k, q) = w_{\text{struct}} \cdot \text{match}(k, q) + w_{\text{sem}} \cdot \text{sim}(q, k) + w_{\text{walk}} \cdot \text{prox}(k, q) + w_{\text{prov}} \cdot \text{auth}(k)$$

where $\text{auth}(k)$ reflects the provenance quality (human-curated $>$ agent-learned $>$ inferred).
11.5.2 Ontology Management and Taxonomy Versioning#
The schema governing semantic memory—the ontology—defines the permitted entity types, relation types, and attribute schemas. Ontology management is essential for preventing semantic drift and ensuring interoperability across agent versions.
Ontology Version Control#
The ontology is versioned using semantic versioning:

$$v_{\text{ont}} = \text{MAJOR}.\text{MINOR}.\text{PATCH}$$

Version transitions follow strict migration rules:
- Backward-compatible changes (adding optional attributes, new entity types): minor version increment.
- Breaking changes (removing entity types, changing attribute types, altering relation semantics): major version increment with mandatory migration script.
- All knowledge items carry the ontology version $v_{\text{ont}}$ under which they were written.
Taxonomy Hierarchies#
Entity types and relation types are organized in inheritance hierarchies:

$$t_{\text{child}} \sqsubseteq t_{\text{parent}} \quad \text{(e.g., } \text{Service} \sqsubseteq \text{System} \sqsubseteq \text{Entity}\text{)}$$

Queries against a parent type automatically include instances of all child types, enabling generalized reasoning without explicit enumeration.
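Parent-type query expansion reduces to computing the descendant closure of the taxonomy. The type names in this sketch are illustrative, not a prescribed ontology.

```python
# Parent -> children edges of a toy type taxonomy (names are examples only)
TAXONOMY = {
    "Entity": ["System", "Document"],
    "System": ["Service", "Database"],
}

def expand_type(t, taxonomy=TAXONOMY):
    """All concrete types matched by a query against type t: t plus descendants."""
    result = {t}
    for child in taxonomy.get(t, []):
        result |= expand_type(child, taxonomy)
    return result

# A query against "System" transparently covers Service and Database instances
assert expand_type("System") == {"System", "Service", "Database"}
```

A production store would precompute this closure (or encode it in the index) rather than recurse per query, but the semantics are the same.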
11.5.3 Conflict Resolution Between Agent-Learned and Authoritative Knowledge#
Conflicts arise when an agent's episodic experience suggests a fact that contradicts the authoritative knowledge graph. The conflict resolution protocol operates as follows:
Conflict Detection#
During any memory write or promotion, the system checks for contradictions:

$$\text{Conflict}(k_{\text{new}}, k_{\text{existing}}) \iff s_{\text{new}} = s_{\text{existing}} \wedge r_{\text{new}} = r_{\text{existing}} \wedge o_{\text{new}} \neq o_{\text{existing}}$$
Resolution Hierarchy#
When a conflict is detected, resolution follows a strict authority hierarchy:
- Human-curated knowledge (authority level 4): always prevails unless a human explicitly revises it.
- Verified automated knowledge (authority level 3): knowledge derived from automated processes with validation (e.g., CI/CD-verified API schemas).
- Agent-learned knowledge (authority level 2): knowledge promoted from episodic memory through consolidation.
- Inferred knowledge (authority level 1): knowledge inferred by the agent during reasoning without direct verification.
Pseudo-Algorithm 11.7: Semantic Memory Conflict Resolution
PROCEDURE ResolveConflict(k_new, k_existing):
// Compare authority levels
IF Authority(k_new) > Authority(k_existing):
// New knowledge is more authoritative
IF k_existing.provenance.type = HUMAN_CURATED:
// Never auto-override human-curated knowledge
ConflictQueue.enqueue(k_new, k_existing, action=HUMAN_REVIEW_REQUIRED)
RETURN DEFERRED
ELSE:
Archive(k_existing, reason="superseded", superseded_by=k_new)
RETURN ACCEPT_NEW
ELSE IF Authority(k_new) = Authority(k_existing):
// Same authority: prefer fresher knowledge
IF k_new.timestamp > k_existing.timestamp AND k_new.confidence > θ_confidence:
Archive(k_existing, reason="superseded_by_newer")
RETURN ACCEPT_NEW
ELSE:
ConflictQueue.enqueue(k_new, k_existing, action=MANUAL_RESOLUTION)
RETURN DEFERRED
ELSE:
// New knowledge is less authoritative — reject or queue
IF k_new.confidence > 0.95 AND k_new.evidence_count > evidence_threshold:
// High-confidence agent knowledge contradicts lower-grade existing
ConflictQueue.enqueue(k_new, k_existing, action=REVIEW_SUGGESTED)
RETURN DEFERRED
ELSE:
ConflictLog.record(k_new, reason="rejected_lower_authority")
RETURN REJECT
11.6 Procedural Memory: Learned Action Sequences, Tool Usage Patterns, and Workflow Templates#
Procedural memory (M_P) stores reusable, validated, versioned action sequences that the agent has learned to execute. It is the agent's equivalent of "muscle memory": compiled skills that can be invoked reliably without re-deriving them from first principles on every execution.
11.6.1 Procedure Extraction from Successful Execution Traces#
Procedures are not authored manually (though manual authoring is permitted as a bootstrap mechanism). Instead, they are extracted from successful execution traces through a systematic extraction pipeline.
Extraction Pipeline#
Successful Execution Traces
│
▼
┌────────────────────┐
│ Trace Normalization │ ← Canonicalize tool names, parameter formats, error codes
└────────┬───────────┘
▼
┌────────────────────┐
│ Action Abstraction │ ← Replace concrete values with typed parameters
└────────┬───────────┘
▼
┌────────────────────┐
│ Pattern Detection │ ← Identify recurring action subsequences across traces
└────────┬───────────┘
▼
┌────────────────────┐
│ Procedure Synthesis │ ← Compile detected patterns into typed procedure templates
└────────┬───────────┘
▼
┌────────────────────┐
│ Validation & Test │ ← Execute procedure against test cases; verify correctness
└────────┬───────────┘
▼
┌────────────────────┐
│ Registration │ ← Register in procedural memory with version, schema, tests
└────────────────────┘
Procedure Schema#
| Field | Type | Description |
|---|---|---|
pid | UUID | Globally unique procedure identifier. |
name | string | Human-readable procedure name. |
version | SemVer | Semantic version of this procedure. |
preconditions | List[Predicate] | Conditions that must hold before execution (tool availability, permissions, state requirements). |
steps | List[ProcedureStep] | Ordered action steps with typed inputs, outputs, branching logic, and retry policies. |
postconditions | List[Predicate] | Conditions that must hold after successful execution (verification checks). |
error_handlers | Map[ErrorClass, RecoveryAction] | Error-class-specific recovery strategies. |
test_suite | List[TestCase] | Automated tests that validate procedure correctness. |
metadata | ProcedureMetadata | Extraction provenance, success rate, average latency, token cost, usage count. |
Each ProcedureStep is a typed record with declared inputs, outputs, and control behavior.
Pseudo-Algorithm 11.8: Procedure Extraction from Traces
PROCEDURE ExtractProcedures(trace_store, min_occurrences, min_success_rate):
// Step 1: Collect successful traces
traces ← trace_store.query(
filter={"outcome": "SUCCESS", "evaluation.correctness": ≥ 0.9},
limit=10000
)
// Step 2: Normalize traces
normalized ← []
FOR EACH trace IN traces:
norm_trace ← NormalizeTrace(trace)
// Canonicalize tool names, abstract concrete parameter values
norm_trace.steps ← [AbstractStep(s) FOR s IN trace.steps]
normalized.append(norm_trace)
// Step 3: Detect recurring patterns (subsequence mining)
patterns ← FrequentSubsequenceMining(
sequences=[t.steps FOR t IN normalized],
min_support=min_occurrences,
min_length=2,
max_gap=2 // allow up to 2 intervening steps
)
// Step 4: Synthesize procedure templates
procedures ← []
FOR EACH pattern p IN patterns:
// Compute success rate of traces containing this pattern
containing_traces ← [t FOR t IN normalized IF Contains(t, p)]
success_rate ← Mean([t.evaluation.correctness FOR t IN containing_traces])
IF success_rate < min_success_rate:
CONTINUE
procedure ← SynthesizeProcedure(
pattern=p,
example_traces=containing_traces,
extract_preconditions=TRUE,
extract_postconditions=TRUE,
extract_error_handlers=TRUE
)
procedure.metadata.success_rate ← success_rate
procedure.metadata.source_trace_count ← |containing_traces|
procedures.append(procedure)
// Step 5: Validate and register
FOR EACH procedure IN procedures:
test_results ← RunProcedureTests(procedure, test_environment)
IF test_results.all_passed:
procedure.status ← CANDIDATE
ProcedureRegistry.register(procedure)
ELSE:
ProcedureLog.record(procedure, status="validation_failed",
failures=test_results.failures)
RETURN procedures
11.6.2 Procedure Versioning, Testing, and Promotion#
Procedures follow a lifecycle modeled after software release management:
Lifecycle Stages#
| Stage | Description | Usage |
|---|---|---|
| DRAFT | Newly extracted, not yet validated. | Not available for agent use. |
| CANDIDATE | Passed basic validation; awaiting comprehensive testing. | Available for shadow-mode execution (run but don't trust). |
| STAGED | Passed test suite; awaiting human review or canary deployment. | Available for low-risk tasks. |
| ACTIVE | Fully promoted; trusted for production use. | Primary procedure library. |
| DEPRECATED | Superseded by a newer version; still functional. | Fallback only. |
| ARCHIVED | Removed from active use; retained for audit. | Not executable. |
Version Compatibility#
When a procedure is updated, the new version must demonstrate:
- Backward compatibility: All test cases from the prior version pass (unless explicitly documented as breaking changes).
- Improvement: The new version achieves equal or better success rate on a held-out evaluation set.
- No regression: No previously passing test case fails in the new version.
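These three checks can be sketched as a single promotion gate; the test-result and success-rate representations are assumptions:

```python
# Sketch: gate a new procedure version on backward compatibility,
# improvement, and no-regression. Representations are illustrative.

def may_promote(prior_passed: set[str],
                new_results: dict[str, bool],
                prior_success_rate: float,
                new_success_rate: float,
                documented_breaking: set[str] = frozenset()) -> bool:
    """Allow promotion only if the new version is backward compatible,
    at least as successful on held-out evaluation, and regression-free."""
    # No regression: every test the prior version passed must still pass,
    # unless explicitly documented as a breaking change.
    for test in prior_passed - documented_breaking:
        if not new_results.get(test, False):
            return False
    # Improvement: equal or better success rate on the held-out set.
    return new_success_rate >= prior_success_rate
```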
11.6.3 Procedural Memory as Compiled Agent Skills#
Active procedures serve as pre-compiled skills that the agent can invoke without re-deriving the action sequence. This is analogous to compiled code vs. interpreted code: procedural memory trades flexibility for speed, reliability, and token efficiency.
Invocation Model#
When the agent encounters a task that matches a known procedure's preconditions, the orchestrator can:
- Direct invocation: Execute the procedure step-by-step, skipping the planning phase entirely.
- Guided planning: Use the procedure as a template, allowing the agent to adapt individual steps while following the overall structure.
- Verification-only: Let the agent plan freely but verify the generated plan against known procedures for consistency.
The selection among these modes depends on the confidence score of the match: high-confidence matches permit direct invocation, moderate confidence favors guided planning, and low confidence restricts the procedure to verification-only use.
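A minimal mode selector over the three modes above; the thresholds (0.9 and 0.6) are illustrative, not prescribed by the text:

```python
# Sketch: map procedure-match confidence to an invocation mode.
# Thresholds are assumptions.

def select_invocation_mode(confidence: float) -> str:
    if confidence >= 0.9:
        return "direct"        # execute the procedure, skip planning entirely
    if confidence >= 0.6:
        return "guided"        # use the procedure as a planning template
    return "verification"      # plan freely, verify against the procedure
```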
Token Efficiency of Procedural Memory#
Procedural invocation yields significant token savings: instead of including the full reasoning chain in working memory, the agent references a compact procedure identifier and its parameterized inputs. Empirically, procedural invocation substantially reduces per-task token consumption for well-characterized tasks, translating directly into cost and latency improvements.
11.7 Cross-Layer Memory Promotion Policies#
Memory promotion is the controlled movement of information from a lower-durability layer to a higher-durability layer. Promotion is the only authorized mechanism for information to persist beyond its originating layer's lifecycle. Uncontrolled promotion leads to memory bloat, knowledge corruption, and semantic drift; therefore, promotion must be governed by explicit, auditable policies.
11.7.1 Promotion Criteria: Non-Obviousness, Correctness Improvement, Reusability#
Not all information deserves promotion. The promotion policy admits only items that satisfy all of the following criteria:
Non-Obviousness#
The information must not be trivially derivable from existing semantic memory or from the agent's base model knowledge. An item is non-obvious if its maximum similarity to any existing knowledge item falls below a novelty threshold. This prevents the memory system from filling with redundant restatements of known facts.
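A minimal sketch of this novelty check, assuming embedding similarity and an illustrative threshold θ_novel:

```python
# Sketch: an item is non-obvious only if its best similarity against all
# existing knowledge stays below a novelty threshold. theta_novel is an
# assumed value.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def is_non_obvious(item_vec: list[float],
                   known_vecs: list[list[float]],
                   theta_novel: float = 0.85) -> bool:
    if not known_vecs:
        return True  # nothing known yet: trivially novel
    return max(cosine(item_vec, k) for k in known_vecs) < theta_novel
```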
Correctness Improvement#
The information must demonstrably improve the agent's expected correctness on future tasks.
In practice, this is estimated by evaluating the item's impact on a held-out task set or by tracking correctness improvements in tasks where the item was retrieved vs. not retrieved.
Reusability#
The information must be applicable across multiple future tasks, not merely specific to the originating session. Items with low reusability are retained in episodic memory (where they serve as specific case studies) rather than promoted to semantic or procedural memory.
Composite Promotion Score#
Promotion is authorized when the composite score meets the promotion threshold, i.e., ComputePromotionScore(item) ≥ θ_promote (see Pseudo-Algorithm 11.9).
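One plausible form of the composite score is a weighted combination of the three criteria, gated by θ_promote; the weights and threshold values here are assumptions:

```python
# Sketch: composite promotion score from the three criteria above.
# Weights and theta_promote are illustrative assumptions.

def promotion_score(non_obviousness: float,
                    correctness_gain: float,
                    reusability: float,
                    weights: tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    w1, w2, w3 = weights
    return w1 * non_obviousness + w2 * correctness_gain + w3 * reusability

def authorize(item_scores: tuple[float, float, float],
              theta_promote: float = 0.6) -> bool:
    """Promotion is authorized only when the composite score clears the gate."""
    return promotion_score(*item_scores) >= theta_promote
```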
11.7.2 Write Validation: Deduplication, Conflict Detection, Provenance Capture#
Every promoted write must pass through a validation pipeline before admission to the target memory layer.
Pseudo-Algorithm 11.9: Memory Promotion Pipeline
PROCEDURE PromoteMemoryItem(item, source_layer, target_layer):
// Step 1: Compute promotion score
score ← ComputePromotionScore(item)
IF score < θ_promote:
PromotionLog.record(item, action="rejected", reason="below_threshold",
score=score)
RETURN REJECTED
// Step 2: Deduplication check
duplicates ← target_layer.find_similar(
embedding=Embed(item),
threshold=θ_dedup,
limit=5
)
IF |duplicates| > 0:
best_match ← duplicates[0]
IF SemanticEquivalence(item, best_match) > θ_equiv:
// Exact or near-exact duplicate — merge metadata
MergeProvenance(best_match, item)
PromotionLog.record(item, action="deduplicated",
merged_with=best_match.id)
RETURN DEDUPLICATED
// Step 3: Conflict detection
conflicts ← DetectConflicts(item, target_layer)
IF |conflicts| > 0:
resolution ← ResolveConflicts(item, conflicts) // See Pseudo-Alg 11.7
IF resolution = REJECT:
RETURN REJECTED
ELSE IF resolution = DEFERRED:
ConflictQueue.enqueue(item, conflicts)
RETURN DEFERRED
// Step 4: Provenance capture
item.provenance ← ProvenanceRecord(
source_layer=source_layer,
source_id=item.source_id,
promotion_timestamp=now(),
promotion_score=score,
promoting_agent=CurrentAgent.id,
evidence_chain=item.evidence_references
)
// Step 5: Schema validation
IF NOT target_layer.schema.validate(item):
PromotionLog.record(item, action="rejected", reason="schema_violation")
RETURN REJECTED
// Step 6: Write with TTL and version
item.version ← 1
item.ttl ← target_layer.default_ttl
item.created_at ← now()
target_layer.write(item)
PromotionLog.record(item, action="promoted", target=target_layer.name)
RETURN PROMOTED
11.7.3 Expiry Policies: TTL, Access-Frequency Decay, Relevance Recalculation#
Every memory item in every persistent layer has an expiry policy that governs its lifecycle. The expiry system prevents unbounded memory growth and ensures that stale information is eventually removed or refreshed.
TTL Assignment#
Each layer has a default TTL, which may be overridden at the item level based on the item's characteristics:
| Layer | Default TTL | Override Basis |
|---|---|---|
| Session (M_S) | Session duration | User-configured session timeout. |
| Episodic (M_E) | 90 days | Adjusted by outcome quality and access frequency. |
| Semantic (M_K) | ∞ (permanent) | Explicit expiry for time-bound knowledge (e.g., API versions). |
| Procedural (M_P) | ∞ (permanent) | Deprecated when superseded. |
Access-Frequency Decay#
Items that are never accessed become candidates for eviction even before their TTL expires. The effective TTL is reduced by a decay factor based on access frequency:
TTL_effective = TTL_base × (1 − η · exp(−access_count / μ))
where η is the maximum decay fraction and μ is the expected access count. Items with zero accesses experience the maximum TTL reduction.
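The decay rule can be transcribed directly from the update in Pseudo-Algorithm 11.10; the η and μ values below are illustrative:

```python
# Effective-TTL decay, as in Pseudo-Algorithm 11.10:
#   ttl_effective = ttl_base * (1 - eta * exp(-access_count / mu))
# eta (max decay fraction) and mu (expected access count) are illustrative.
import math

def effective_ttl(ttl_base: float, access_count: int,
                  eta: float = 0.5, mu: float = 10.0) -> float:
    return ttl_base * (1 - eta * math.exp(-access_count / mu))

# Zero accesses yields the maximum reduction (here, half the base TTL):
print(effective_ttl(90.0, 0))   # → 45.0
```

As access_count grows, the exponential term vanishes and the effective TTL approaches the full base TTL.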
Periodic Relevance Recalculation#
A background process periodically re-evaluates the relevance of every persistent memory item, computed as the mean embedding similarity between the item and a representative sample of recent task queries. Items whose relevance drops below θ_stale are flagged for review; items that remain flagged across multiple consecutive review cycles are demoted to cold storage.
Pseudo-Algorithm 11.10: Memory Expiry Sweep
PROCEDURE RunExpirySweep(memory_layer, recent_queries):
FOR EACH item m IN memory_layer.all_items():
// Check hard TTL
IF now() > m.created_at + m.ttl_effective:
Expire(m, reason="ttl_exceeded")
CONTINUE
// Check access-frequency decay
m.ttl_effective ← m.ttl_base * (1 - η * exp(-m.access_count / μ))
IF now() > m.created_at + m.ttl_effective:
Expire(m, reason="access_decay_expiry")
CONTINUE
// Check relevance staleness
relevance ← MeanSimilarity(Embed(m), [Embed(q) FOR q IN recent_queries])
IF relevance < θ_stale:
m.staleness_flags ← m.staleness_flags + 1
IF m.staleness_flags > max_staleness_flags:
DemoteToColdStorage(m, reason="relevance_decay")
ELSE:
m.staleness_flags ← 0 // Reset if still relevant
ExpiryLog.record_sweep(layer=memory_layer.name,
expired_count=expired,
demoted_count=demoted,
active_count=active)
11.8 Memory Wall Enforcement: Isolation Mechanisms Between Agent Instances and Layers#
The memory wall is not merely a design guideline; it must be mechanically enforced at multiple levels of the system architecture. Violations of memory isolation—whether through shared mutable state, context leakage, or unauthorized cross-layer reads—are treated as system invariant violations, equivalent to memory corruption in systems programming.
11.8.1 Enforcement Architecture#
┌──────────────────────────────────────────────────────────────┐
│ AGENT RUNTIME │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │ Agent D │ │
│ │ (sid=1) │ │ (sid=2) │ │ (sid=1) │ │ (sid=3) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ ┌────▼──────────────▼──────────────▼──────────────▼─────┐ │
│ │ MEMORY ACCESS LAYER (MAL) │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ Policy Enforcement Point (PEP) │ │ │
│ │ │ • Caller identity verification │ │ │
│ │ │ • Session scope validation │ │ │
│ │ │ • Layer access authorization │ │ │
│ │ │ • Read/write budget enforcement │ │ │
│ │ │ • Provenance injection │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └───────┬──────────┬──────────┬──────────┬───────────────┘ │
│ │ │ │ │ │
│ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────────┐ │
│ │ M_W │ │ M_S │ │ M_E │ │ M_K │ │ M_P │ │
│ │(local) │ │(scoped)│ │(shared)│ │(shared)│ │(shared)│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
└──────────────────────────────────────────────────────────────┘
11.8.2 Isolation Rules#
The Memory Access Layer (MAL) enforces the following invariants:
Rule 1: Working Memory Isolation
Working memory is strictly process-local. No agent can read or write another agent's working memory. This is enforced by in-process memory isolation (separate heap allocations or namespaces).
Rule 2: Session Memory Scope Binding
All session memory reads and writes are gated on session-identifier matching. The MAL rejects any request where the caller's session ID does not match the target session partition.
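Rule 2 reduces to a single comparison in the MAL's policy enforcement point; the exception and function names are illustrative:

```python
# Sketch of the MAL session-scope gate (Rule 2): any session memory access
# whose caller session ID does not match the target partition is rejected.
# Names are illustrative.

class SessionScopeViolation(Exception):
    """Raised when a caller attempts to cross a session boundary."""

def check_session_access(caller_sid: str, target_partition_sid: str) -> None:
    if caller_sid != target_partition_sid:
        raise SessionScopeViolation(
            f"caller sid={caller_sid} may not access partition "
            f"sid={target_partition_sid}")
```

In the architecture diagram above, agents A and C (both sid=1) would pass this check against the sid=1 partition, while agent B (sid=2) would be rejected.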
Rule 3: Episodic Memory Read Authorization
Episodic memory is readable by agents of the same class (or authorized classes), but write access requires validated promotion.
Rule 4: Semantic Memory Write Gate
No agent can directly write to semantic memory. All writes must pass through the promotion pipeline (Pseudo-Algorithm 11.9).
Rule 5: Procedural Memory Execution Gate
Procedures are invocable only according to their lifecycle stage (ACTIVE for production use, STAGED for low-risk tasks, CANDIDATE in shadow mode); writes to procedural memory occur solely through the extraction, validation, and registration pipeline.
11.8.3 Cross-Layer Contamination Detection#
The MAL includes a contamination detector that monitors for anomalous access patterns indicative of isolation violations:
- Session cross-read detection: Alert if an agent attempts to read session memory with a mismatched session ID.
- Working memory persistence detection: Alert if working memory items appear in subsequent sessions or agent instances (indicating they were improperly persisted).
- Unpromoted semantic writes: Alert if knowledge items appear in semantic memory without corresponding promotion pipeline records.
- Provenance chain validation: Periodically verify that all items in durable layers have complete provenance chains tracing back to their originating source.
Pseudo-Algorithm 11.11: Contamination Detection Sweep
PROCEDURE DetectContamination():
alerts ← []
// Check 1: Session memory partition integrity
FOR EACH session_partition IN SessionStore.partitions():
items ← session_partition.all_items()
FOR EACH item IN items:
IF item.provenance.sid ≠ session_partition.sid:
alerts.append(Alert(
type="CROSS_SESSION_CONTAMINATION",
severity=CRITICAL,
details={partition: session_partition.sid,
item_sid: item.provenance.sid}
))
// Check 2: Orphaned semantic memory (no provenance)
FOR EACH item IN SemanticMemory.all_items():
IF item.provenance IS NULL OR NOT ProvenanceChainValid(item):
alerts.append(Alert(
type="ORPHANED_KNOWLEDGE",
severity=HIGH,
details={item_id: item.id, missing="provenance_chain"}
))
// Check 3: Working memory leakage into persistent stores
recent_wm_hashes ← WorkingMemoryAuditLog.recent_content_hashes(window=24h)
FOR EACH layer IN [EpisodicMemory, SemanticMemory]:
FOR EACH item IN layer.recently_written(window=24h):
IF ContentHash(item) IN recent_wm_hashes:
IF NOT PromotionLog.has_record(item.id):
alerts.append(Alert(
type="UNPROMOTED_WM_LEAKAGE",
severity=CRITICAL,
details={item_id: item.id, layer: layer.name}
))
RETURN alerts
11.9 Memory Observability: Usage Analytics, Hit Rates, Staleness Metrics, and Audit Logs#
A memory system that cannot be observed cannot be optimized, debugged, or trusted. Memory observability provides the instrumentation necessary to understand how memory layers are performing, where bottlenecks exist, and whether the system's knowledge is accurate and current.
11.9.1 Core Metrics#
The observability layer tracks the following metrics per memory layer, emitted as structured telemetry:
Capacity and Utilization#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.utilization | Fraction of the layer's capacity (token budget) currently in use | Ratio |
memory.{layer}.item_count | Number of items in layer | Count |
memory.{layer}.token_count | Total tokens stored | Tokens |
memory.{layer}.growth_rate | Net change in item count per hour | Items/hour |
Access Patterns#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.read_count | Total reads per time window | Count/min |
memory.{layer}.write_count | Total writes per time window | Count/min |
memory.{layer}.hit_rate | Fraction of retrieval queries that return at least one useful item | Ratio |
memory.{layer}.miss_rate | Fraction of retrieval queries that return no useful item (1 − hit rate) | Ratio |
memory.{layer}.read_latency_p50 | Median read latency | ms |
memory.{layer}.read_latency_p99 | 99th percentile read latency | ms |
Quality and Freshness#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.staleness_ratio | Fraction of items flagged as stale by the relevance sweep | Ratio |
memory.{layer}.avg_item_age | Mean age of items | Hours |
memory.{layer}.relevance_score | Mean relevance of retrieved items to recent queries | [0, 1] |
memory.{layer}.recall_utilization | Fraction of retrieved items actually used in generation | Ratio |
Promotion and Eviction#
| Metric | Definition | Unit |
|---|---|---|
memory.promotion.{src}_{dst}.count | Promotions from source to destination layer | Count/hour |
memory.promotion.{src}_{dst}.rejection_rate | Fraction of promotion attempts rejected | Ratio |
memory.eviction.{layer}.count | Items evicted per time window | Count/hour |
memory.eviction.{layer}.reason_distribution | Breakdown by eviction reason (TTL, decay, pruning, manual) | Distribution |
11.9.2 Derived Diagnostics#
From the core metrics, the following diagnostic signals are computed:
Memory Efficiency Ratio#
The memory efficiency ratio relates recall utilization and hit rate to storage utilization, measuring how efficiently the memory system converts stored information into useful context. High utilization combined with low recall utilization indicates bloat; a low hit rate indicates retrieval quality issues.
Staleness Risk Score#
A high staleness risk score indicates that the memory layer is filling with stale, irrelevant content, a condition that degrades retrieval quality and wastes token budget.
Context Pollution Index#
CPI measures the fraction of retrieved memory items that did not contribute to the task outcome. A CPI approaching 1.0 indicates severe context pollution.
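CPI as described reduces to a set-difference ratio over retrieved vs. used item identifiers:

```python
# Sketch: context pollution index, the fraction of retrieved memory items
# that did not contribute to the task outcome.

def context_pollution_index(retrieved_ids: set[str], used_ids: set[str]) -> float:
    if not retrieved_ids:
        return 0.0  # nothing retrieved: no pollution
    unused = retrieved_ids - used_ids
    return len(unused) / len(retrieved_ids)

# Four items retrieved, one actually used in generation:
print(context_pollution_index({"a", "b", "c", "d"}, {"a"}))   # → 0.75
```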
11.9.3 Audit Logging#
Every memory operation is recorded in an append-only audit log. Each record captures the operation type (read, write, promote, evict, merge, expire), the caller's agent and session identity, the target layer and item, a timestamp, and the outcome.
Audit logs serve three critical functions:
- Compliance: Demonstrate that memory operations adhere to data governance policies (retention, PII handling, access control).
- Debugging: Trace the provenance of any knowledge item back to its origin, through all promotions, merges, and modifications.
- Optimization: Analyze access patterns to tune retrieval strategies, eviction policies, and token budget allocations.
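A minimal append-only audit record consistent with these functions might look like the following (field names are assumptions):

```python
# Illustrative append-only audit record. Records are serialized on write and
# never mutated in place; field names are assumptions.
import json
import time
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuditRecord:
    op: str            # read | write | promote | evict | merge | expire
    agent_id: str
    session_id: str
    layer: str
    item_id: str
    timestamp: float
    outcome: str

def append_audit(log: list[str], record: AuditRecord) -> None:
    """Append-only write: serialize the immutable record and append."""
    log.append(json.dumps(asdict(record)))

log: list[str] = []
append_audit(log, AuditRecord("promote", "agent-7", "sid-1",
                              "semantic", "k-42", time.time(), "promoted"))
```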
11.9.4 Observability Dashboard Structure#
┌─────────────────────────────────────────────────────────────┐
│ MEMORY OBSERVABILITY DASHBOARD │
├─────────────┬─────────────┬─────────────┬──────────┬────────┤
│ Working │ Session │ Episodic │ Semantic │ Proced │
│ Memory │ Memory │ Memory │ Memory │ Memory │
├─────────────┼─────────────┼─────────────┼──────────┼────────┤
│ Util: 62% │ Active: 47 │ Items: 12K │ Triples: │ Procs: │
│ Overflow: 3 │ Avg Turns: │ Hit Rate: │ 245K │ 89 │
│ GC/min: 12 │ 18.4 │ 0.73 │ Hit: 0.91│ Active:│
│ Carry: 22% │ Ckpts: 142 │ Staleness: │ Stale: │ 67 │
│ │ Contam: 0 │ 0.12 │ 0.04 │ SRate: │
│ │ │ CPI: 0.18 │ Confl: 3 │ 0.94 │
├─────────────┴─────────────┴─────────────┴──────────┴────────┤
│ PROMOTION FLOW │
│ M_W ──► M_S: 8.2/hr M_S ──► M_E: 1.4/hr │
│ M_E ──► M_K: 0.3/hr M_E ──► M_P: 0.1/hr │
│ Rejection Rate: 34% Conflict Rate: 2.1% │
├──────────────────────────────────────────────────────────────┤
│ ALERTS │
│ ⚠ Episodic Memory staleness rising (0.12 → 0.18, 7d trend)│
│ ⚠ Working memory overflow rate above threshold (3 > 2/min) │
│ ✓ No cross-session contamination detected │
│ ✓ All provenance chains validated │
└──────────────────────────────────────────────────────────────┘
11.9.5 Automated Alerting and Self-Healing#
The observability system triggers automated responses based on metric thresholds:
Pseudo-Algorithm 11.12: Memory Health Monitor
PROCEDURE MonitorMemoryHealth():
LOOP every monitoring_interval:
metrics ← CollectAllMemoryMetrics()
// Alert 1: Memory layer approaching capacity
FOR EACH layer IN MemoryLayers:
IF metrics[layer].utilization > 0.85:
TriggerAlert("CAPACITY_WARNING", layer,
message=f"{layer} at {metrics[layer].utilization*100}% capacity")
IF layer.auto_eviction_enabled:
TriggerEvictionSweep(layer, target_utilization=0.70)
// Alert 2: Hit rate degradation
FOR EACH layer IN [Episodic, Semantic]:
IF metrics[layer].hit_rate < θ_min_hit_rate:
TriggerAlert("HIT_RATE_DEGRADATION", layer)
ScheduleReindexing(layer)
// Alert 3: Staleness accumulation
FOR EACH layer IN [Episodic, Semantic]:
IF metrics[layer].staleness_ratio > θ_max_staleness:
TriggerAlert("STALENESS_ACCUMULATION", layer)
ScheduleExpirySweep(layer)
// Alert 4: Context pollution
IF metrics.global.CPI > θ_max_CPI:
TriggerAlert("CONTEXT_POLLUTION", severity=HIGH)
ScheduleRetrievalTuning()
// Alert 5: Promotion pipeline backup
IF ConflictQueue.size() > θ_max_conflict_queue:
TriggerAlert("PROMOTION_PIPELINE_BACKUP", severity=MEDIUM)
NotifyHumanReviewers(ConflictQueue.pending())
// Alert 6: Cross-session contamination
contamination_alerts ← DetectContamination()
IF |contamination_alerts| > 0:
FOR EACH alert IN contamination_alerts:
TriggerAlert(alert.type, severity=CRITICAL)
IF alert.type = "CROSS_SESSION_CONTAMINATION":
QuarantineAffectedSessions(alert.details)
EmitHealthReport(metrics)
11.9.6 Operational Implications#
The observability infrastructure enables the following operational capabilities:
- Capacity planning: Trend analysis on growth rates and utilization enables proactive scaling of memory backends before capacity exhaustion.
- Retrieval quality tuning: Hit rate, recall utilization, and CPI metrics directly inform adjustments to embedding models, chunking strategies, and ranking weights.
- Eviction policy calibration: Access-frequency distributions and staleness trends guide TTL and decay parameter optimization.
- Cost attribution: Token-level tracking of memory retrieval costs enables per-task and per-layer cost attribution, supporting budget governance.
- Compliance auditing: Complete audit trails with provenance chains support regulatory and organizational compliance requirements for data handling, retention, and access control.
Summary: The Memory Hierarchy as a Production System#
The five-layer memory hierarchy—working, session, episodic, semantic, and procedural—constitutes a complete, typed, mechanically enforced knowledge management subsystem for agentic AI. The architecture is governed by the following invariants:
| Invariant | Enforcement Mechanism |
|---|---|
| Layer isolation | Memory Access Layer with typed access policies |
| Promotion validity | Validated pipeline with deduplication, conflict resolution, provenance |
| Capacity bounds | Token budgets, overflow handlers, eviction sweeps |
| Knowledge freshness | TTL, access-frequency decay, relevance recalculation |
| Observability | Structured metrics, audit logs, automated alerting |
| Session privacy | Namespace partitioning, contamination detection |
| Procedural reliability | Version-controlled lifecycle with test gates |
The memory wall is not a suggestion; it is an architectural boundary enforced through typed contracts, storage-level partitioning, runtime validation, and continuous monitoring. Agents that operate within this discipline achieve predictable, auditable, and continuously improving performance. Agents that violate the memory wall degrade unpredictably and fail silently—the worst operational outcome in a production system.