11.1 The Memory Wall Thesis: Why Agents Need Hard Boundaries Between Memory Layers#
11.1.1 The Fundamental Problem#
An agentic system that operates without structurally enforced memory boundaries inevitably degrades along three axes simultaneously: correctness (stale or conflicting information poisons reasoning), latency (unbounded context growth inflates inference cost super-linearly), and reliability (uncontrolled state leakage across tasks produces non-reproducible behavior). The Memory Wall Thesis asserts that no amount of prompt engineering, context-window expansion, or retrieval sophistication compensates for the absence of a rigorous, typed, and mechanically enforced separation between memory layers. Memory must be treated as a stratified, governed subsystem—analogous to the register–cache–RAM–disk hierarchy in computer architecture—where each layer has explicit capacity bounds, admission policies, eviction semantics, durability guarantees, and promotion/demotion contracts.
11.1.2 Formal Statement of the Memory Wall Thesis#
Thesis. An agentic runtime that commingles ephemeral reasoning state, session-scoped interaction history, validated experiential records, canonical domain knowledge, and learned procedural skill within a single undifferentiated context buffer will exhibit monotonically increasing failure rates as task complexity, session duration, or agent population grows. Correctness, cost-efficiency, and auditability require that each memory class occupy a distinct storage tier with independently enforced write-admission, read-access, expiry, and isolation policies.
The thesis is motivated by three observable failure modes in production agentic systems:
| Failure Mode | Root Cause | Consequence |
|---|---|---|
| Context Saturation | Working memory absorbs stale history and domain knowledge simultaneously, exceeding the effective reasoning capacity of the model. | Degraded chain-of-thought quality; hallucination rate increases as token budget is consumed by low-signal context. |
| Cross-Session Contamination | Information from a prior session leaks into a current session due to shared mutable state. | Privacy violations; incorrect assumptions carried across user boundaries; non-reproducible outputs. |
| Knowledge Staleness Cascade | Agent-learned "facts" overwrite or shadow authoritative organizational knowledge without version control or conflict resolution. | Systematic drift from ground truth; compounding errors as downstream agents consume corrupted semantic memory. |
11.1.3 The Hierarchical Memory Model#
The memory hierarchy for agentic systems comprises five formally distinct layers, ordered by volatility, capacity, access latency, and governance rigor:

$$\mathcal{M} = \langle M_W, M_S, M_E, M_K, M_P \rangle$$

where:
- $M_W$: Working Memory — ephemeral scratch space for active reasoning within a single agent step or micro-plan.
- $M_S$: Session Memory — conversation-scoped state persisted across turns within a bounded interaction session.
- $M_E$: Episodic Memory — validated, structured records of past agent experiences with full provenance.
- $M_K$: Semantic Memory — canonical organizational and domain knowledge, curated and version-controlled.
- $M_P$: Procedural Memory — learned action sequences, tool-usage patterns, and workflow templates compiled from successful execution traces.

Each layer $i$ is characterized by a tuple of operational properties:

$$L_i = \langle C_i, \tau_i, \pi_w^i, \pi_e^i, \pi_a^i, d_i, \sigma_i \rangle$$

where $C_i$ is the capacity bound (in tokens or records), $\tau_i$ is the default time-to-live, $\pi_w^i$ is the write-admission policy, $\pi_e^i$ is the eviction policy, $\pi_a^i$ is the access-control policy, $d_i$ is the durability class (ephemeral, session-durable, persistent), and $\sigma_i$ is the isolation scope (step, session, agent, organization).
Table: Memory Layer Properties
| Layer | Volatility | Capacity | Durability | Isolation Scope | Governance |
|---|---|---|---|---|---|
| $M_W$ | Highest | Smallest (reserved token window) | Ephemeral (intra-step) | Single agent step | Automatic GC |
| $M_S$ | High | Moderate (session budget) | Session-durable | Single session | TTL + checkpointing |
| $M_E$ | Moderate | Large (indexed store) | Persistent | Agent or agent-class | Validated writes |
| $M_K$ | Low | Very large (knowledge base) | Persistent, versioned | Organization-wide | Curated, approval-gated |
| $M_P$ | Lowest | Moderate (procedure library) | Persistent, versioned | Agent-class or org | Tested + promoted |
11.1.4 The Cost of Violating the Memory Wall#
Without enforced boundaries, the effective reasoning quality of an agent degrades as a function of context pollution. Define the signal density of the active context window $W$ as:

$$\rho(W) = \frac{1}{|W|} \sum_{t \in W} u(t)$$

where $u(t)$ measures the marginal contribution of token $t$ to the current task objective. When memory layers are commingled, $\rho$ decays toward zero as irrelevant historical tokens, stale knowledge, and prior-session artifacts dilute the signal:

$$\rho_{\text{mixed}}(W) \to 0 \quad \text{as session duration and accumulated state grow}$$

In contrast, a properly partitioned system loads only task-relevant slices from each layer, maintaining:

$$\rho_{\text{part}}(W) = \frac{1}{|W|} \sum_{i} \sum_{t \in R_i(q)} u(t) \gg \rho_{\text{mixed}}(W)$$

where $R_i(q)$ applies the retrieval policy of layer $i$ to extract only high-utility items for query $q$. The ratio $\rho_{\text{part}} / \rho_{\text{mixed}}$ grows with system scale, explaining why memory-wall violations cause progressively worse failures as agents mature and accumulate state.
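The dilution effect is easy to demonstrate numerically. The utility values below are made-up for the sketch: task-relevant tokens near 1.0, stale or cross-session tokens near 0.0.

```python
def signal_density(utilities):
    """rho(W): mean marginal utility u(t) over tokens in the active window."""
    return sum(utilities) / len(utilities) if utilities else 0.0

# Hypothetical token utilities
relevant = [0.9] * 200   # retrieved, task-relevant slices
stale = [0.05] * 800     # prior-session artifacts and stale history

rho_partitioned = signal_density(relevant)           # only relevant slices loaded
rho_commingled = signal_density(relevant + stale)    # everything in one buffer

assert rho_partitioned > 4 * rho_commingled  # pollution dilutes the signal ~4x here
```

Note that the commingled window carries the same 200 useful tokens; it is the 800 low-utility tokens that collapse the density, which is the Memory Wall Thesis in miniature.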
11.1.5 Architectural Implications#
The Memory Wall Thesis imposes the following non-negotiable architectural requirements:
- Typed interfaces between every memory layer: each layer exposes read, write, query, and evict operations through versioned, schema-described contracts (JSON-RPC at the application boundary; gRPC/Protobuf internally).
- Independent storage backends: working memory resides in-process or in a fast key-value store; session memory in a session-scoped store with TTL; episodic and semantic memory in indexed, persistent stores (vector databases, knowledge graphs); procedural memory in a versioned procedure registry.
- Explicit promotion paths: data moves between layers only through validated promotion pipelines, never through implicit leakage or shared mutable references.
- Budget enforcement: the prefill compiler allocates token budgets per layer before context assembly; no layer may exceed its allocation without explicit overflow handling.
- Audit trail: every write, promotion, eviction, and read across all layers is logged with provenance metadata sufficient for post-hoc replay and compliance review.
11.2 Working Memory: Ephemeral Scratch Space for Active Reasoning#
Working memory ($M_W$) is the innermost, most volatile layer of the memory hierarchy. It holds the transient cognitive state required for the agent to execute a single reasoning step, sub-plan, or tool invocation. Its defining characteristic is ephemerality: contents exist only for the duration of the current micro-task and are discarded or compressed upon step completion.
11.2.1 Capacity Limits and Overflow Strategies#
Capacity Model#
The capacity of working memory is fundamentally constrained by the model's context window $C_{\text{ctx}}$ (measured in tokens), minus reservations for other memory layers and the output generation budget:

$$C_W = C_{\text{ctx}} - C_{\text{sys}} - C_S - C_E - C_K - C_P - C_{\text{out}}$$

where:
- $C_{\text{sys}}$: tokens reserved for the system prompt (role policy, protocol bindings, tool schemas).
- $C_S$: tokens reserved for the session memory summary.
- $C_E$: tokens reserved for episodic memory retrievals.
- $C_K$: tokens reserved for semantic knowledge retrievals.
- $C_P$: tokens reserved for procedural memory (active procedure templates).
- $C_{\text{out}}$: tokens reserved for the model's generation output.
Example budget allocation for a 128K-token window:
| Component | Budget (tokens) | Percentage |
|---|---|---|
| System prompt ($C_{\text{sys}}$) | 4,096 | 3.2% |
| Session memory ($C_S$) | 8,192 | 6.4% |
| Episodic retrievals ($C_E$) | 8,192 | 6.4% |
| Semantic retrievals ($C_K$) | 16,384 | 12.8% |
| Procedural memory ($C_P$) | 4,096 | 3.2% |
| Output reservation ($C_{\text{out}}$) | 16,384 | 12.8% |
| Working memory ($C_W$) | 70,656 | 55.2% |
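The budget arithmetic in the table can be checked directly; the reservation names below are shorthand for the components in the table, and the 128,000-token base matches the table's percentages:

```python
CONTEXT_WINDOW = 128_000  # effective window used by the example budget table

reservations = {
    "system_prompt": 4_096,
    "session_memory": 8_192,
    "episodic": 8_192,
    "semantic": 16_384,
    "procedural": 4_096,
    "output": 16_384,
}

# C_W = C_ctx minus every fixed reservation
working_memory = CONTEXT_WINDOW - sum(reservations.values())
assert working_memory == 70_656  # matches the table's working-memory row
```

Working memory is the residual claimant: any change to a fixed reservation flows directly into (or out of) the active reasoning budget.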
Overflow Strategies#
When the working memory contents approach or exceed $C_W$, the system must apply one of the following overflow strategies, selected by policy:
- Compression: Apply an LLM-based or extractive summarization pass over the current working memory, reducing token count while preserving decision-relevant facts. The compression ratio is defined as:

  $$r = \frac{\text{TokenCount}(\text{compressed})}{\text{TokenCount}(\text{original})}$$

  The target ratio is set per task class, ranging from aggressive compression for simple tasks to conservative compression for multi-step reasoning.
- Externalization: Offload intermediate results, partial computations, or detailed evidence to an external scratch store (key-value store, temporary file, or tool-managed workspace). Working memory retains only a typed reference (URI + schema hash + token-count metadata).
- Selective Pruning: Remove working memory items ranked lowest by a recency-weighted utility score:

  $$u(m) = \alpha \cdot \text{Relevance}(m, \text{task}) + \beta \cdot \text{Recency}(m) + \gamma \cdot \text{Dependency}(m)$$

  where $\text{Dependency}(m)$ measures how many downstream reasoning steps reference item $m$. Items with $u(m) < \theta_{\text{prune}}$ are evicted.
- Step Decomposition: If the current micro-task genuinely requires more working memory than $C_W$, the orchestrator decomposes it into smaller sub-steps, each fitting within the budget.
Pseudo-Algorithm 11.1: Working Memory Overflow Handler
PROCEDURE HandleWorkingMemoryOverflow(M_W, C_W, task):
current_size ← TokenCount(M_W)
IF current_size ≤ C_W THEN RETURN M_W
// Phase 1: Prune low-utility items
FOR EACH item m IN M_W:
m.score ← α * Relevance(m, task) + β * Recency(m) + γ * Dependency(m)
SORT M_W BY score ASCENDING
WHILE TokenCount(M_W) > C_W * 0.9 AND M_W.has_prunable_items():
lowest ← M_W.pop_lowest_score()
IF lowest.score < θ_prune:
EvictionLog.record(lowest, reason="low_utility_prune")
ELSE:
BREAK
// Phase 2: Compress if still over budget
IF TokenCount(M_W) > C_W:
compressed ← CompressSummarize(M_W, target_ratio=0.5)
EvictionLog.record_compression(original_size=current_size,
compressed_size=TokenCount(compressed))
M_W ← compressed
// Phase 3: Externalize if still over budget
IF TokenCount(M_W) > C_W:
externalizable ← SelectExternalizableSections(M_W)
FOR EACH section IN externalizable:
ref ← ExternalScratchStore.write(section)
M_W.replace(section, ExternalReference(ref))
// Phase 4: Escalate to decomposition if still over budget
IF TokenCount(M_W) > C_W:
RAISE TaskDecompositionRequired(task, current_budget=C_W)
RETURN M_W

11.2.2 Working Memory as Context Window Reservation#
Working memory is not merely a conceptual abstraction; it maps directly to a reserved region of the model's context window. The prefill compiler treats working memory as a first-class segment in the assembled prompt, positioned after the system prompt and before the output cursor.
Segment Layout#
The compiled prompt follows a deterministic segment ordering:
┌─────────────────────────────────────┐
│ SEGMENT 0: System Prompt │ ← Role policy, protocol, constraints
│ SEGMENT 1: Tool Affordances │ ← Active tool schemas (lazy-loaded)
│ SEGMENT 2: Semantic Memory Slice │ ← Retrieved domain knowledge
│ SEGMENT 3: Episodic Memory Slice │ ← Relevant past episodes
│ SEGMENT 4: Session Memory Summary │ ← Compressed session history
│ SEGMENT 5: Procedural Memory Slice │ ← Active procedure templates
│ SEGMENT 6: Working Memory │ ← Current reasoning state ← ACTIVE
│ SEGMENT 7: Current User Turn │ ← Latest input / instruction
│ SEGMENT 8: [OUTPUT RESERVATION] │ ← Reserved for generation
└─────────────────────────────────────┘

Each segment is assigned a token budget at compilation time. The working memory segment receives whatever capacity remains after all other segments are allocated, up to its configured maximum $C_W$. This reservation model ensures that working memory never starves other layers and that other layers never crowd out active reasoning.
Formal Reservation Invariant#
At every compilation step, the following invariant must hold:

$$C_{\text{sys}} + C_S + C_E + C_K + C_P + \text{TokenCount}(M_W) + C_{\text{out}} \le C_{\text{ctx}}$$

If the invariant is violated during compilation, the prefill compiler triggers the overflow handler (Pseudo-Algorithm 11.1) on the working memory segment before proceeding.
11.2.3 Garbage Collection and TTL Policies#
Working memory items have the shortest lifecycle in the memory hierarchy. Garbage collection operates at two granularities:
Step-Level GC#
Upon completion of each agent step (a single plan-act-verify cycle), the working memory is fully cleared unless the orchestrator explicitly marks specific items for carry-forward to the next step. The carry-forward set is bounded:

$$\text{TokenCount}(\text{CarryForward}) \le \kappa \cdot C_W, \quad 0 < \kappa < 1$$

This bound prevents gradual accumulation of "zombie" working memory across steps.
Intra-Step TTL#
Within a single step, individual working memory items may be assigned a TTL measured in sub-operations (e.g., tool calls, retrieval rounds). An item whose TTL has expired is evicted at the next sub-operation boundary:

$$\text{ttl}(m) = \text{ttl}_0(m) - \left(t - t_{\text{ins}}(m)\right) \le 0 \implies \text{evict}(m)$$

where $t$ is the current sub-operation index and $t_{\text{ins}}(m)$ is the insertion time of item $m$.
Pseudo-Algorithm 11.2: Working Memory Garbage Collection
PROCEDURE GarbageCollectWorkingMemory(M_W, step_completed, carry_forward_set):
IF step_completed:
// Full GC: retain only explicitly marked items
FOR EACH item m IN M_W:
IF m NOT IN carry_forward_set:
M_W.remove(m)
GCLog.record(m, reason="step_boundary_eviction")
// Enforce carry-forward bound
IF TokenCount(M_W) > κ * C_W:
excess ← SelectLowestUtility(M_W, target=κ * C_W)
FOR EACH item m IN excess:
M_W.remove(m)
GCLog.record(m, reason="carry_forward_overflow")
ELSE:
// Intra-step TTL sweep
FOR EACH item m IN M_W:
m.ttl ← m.ttl - 1
IF m.ttl ≤ 0:
M_W.remove(m)
GCLog.record(m, reason="ttl_expired")
RETURN M_W

Promotion Before Eviction#
Before any working memory item is evicted, the GC process checks whether the item qualifies for promotion to a higher-durability layer (session memory or episodic memory). Promotion eligibility is evaluated by the cross-layer promotion policy (§11.7). This ensures that genuinely valuable intermediate results are not lost.
11.3 Session Memory: Conversation-Scoped State with Defined Lifecycle#
Session memory ($M_S$) captures the stateful context of an ongoing interaction between a user (or calling system) and an agent. It persists across multiple turns within a single session but is strictly isolated from other sessions and from the agent's long-term memory layers.
11.3.1 Session Initialization, Checkpointing, and Resumption#
Session Lifecycle State Machine#
Every session follows a formally defined lifecycle:
┌──────────┐
create() │ │ checkpoint()
───────────► │ ACTIVE │ ──────────────► CHECKPOINTED
│ │ │
└────┬─────┘ │ resume()
│ ▼
│ expire() / close() ┌──────────┐
▼ │ ACTIVE │
┌──────────┐ │(resumed) │
│ CLOSED │ └───────────┘
└──────────┘
│
│ archive()
▼
┌──────────┐
│ ARCHIVED │ (episodic promotion candidate)
└──────────┘

Session Initialization#
Session creation requires the following typed contract:

$$\text{Session} = \langle \text{sid}, \text{uid}, \text{aid}, t_0, \tau_{\max}, C_S, S_0, \Gamma \rangle$$

where:
- $\text{sid}$: globally unique session identifier (UUIDv7 for temporal ordering).
- $\text{uid}$: user or caller identity, governing access-control policies.
- $\text{aid}$: the agent instance bound to this session.
- $t_0$: creation timestamp.
- $\tau_{\max}$: maximum session duration (hard TTL).
- $C_S$: token budget for session memory within the context window.
- $S_0$: initial state (user preferences, task description, prior context if resuming).
- $\Gamma$: session-level governance rules (data retention class, PII handling, etc.).
Checkpointing#
Checkpoints capture the session state at a point in time, enabling resumption after interruption (network failure, user departure, agent restart). A checkpoint is a serialized snapshot:

$$\chi_k = \langle \text{sid}, k, t_k, \Sigma_n, \text{turn\_log}, \text{working\_state} \rangle$$

where $k$ is the checkpoint sequence number and $\Sigma_n$ is a compressed representation of the session history up to turn $n$.
Pseudo-Algorithm 11.3: Session Checkpointing
PROCEDURE CheckpointSession(session, trigger):
// trigger ∈ {periodic, turn_count_threshold, user_request, error_recovery}
k ← session.checkpoint_count + 1
summary ← SummarizeSessionHistory(
session.turn_log,
max_tokens=session.C_S * 0.5,
preserve_decisions=TRUE,
preserve_user_corrections=TRUE
)
checkpoint ← Checkpoint(
sid=session.sid,
sequence=k,
timestamp=now(),
summary=summary,
turn_log=session.turn_log_since_last_checkpoint(),
working_state=SerializeWorkingMemory(session.working_memory),
metadata={
"turn_count": session.turn_count,
"token_usage": session.total_tokens_consumed,
"trigger": trigger
}
)
SessionStore.write(checkpoint)
session.checkpoint_count ← k
session.last_checkpoint_time ← now()
RETURN checkpoint

Resumption#
Session resumption reconstructs the active session state from the most recent checkpoint:
- Load the latest checkpoint $\chi_k$ from the session store.
- Reconstruct working memory from $\chi_k.\text{working\_state}$.
- Inject $\Sigma_n$ into the session memory segment of the prefill.
- Optionally replay the turn log since checkpoint $\chi_k$ for full fidelity.
- Validate that no cross-session contamination has occurred during the idle period.
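The resumption steps can be sketched over in-memory checkpoint dicts; the checkpoint shape and field names here are assumptions for the sketch, not a prescribed wire format.

```python
def resume_session(checkpoints, sid):
    """Resumption sketch: load latest checkpoint, rebuild state, verify isolation."""
    mine = [c for c in checkpoints if c["sid"] == sid]
    if not mine:
        raise LookupError(f"no checkpoint for session {sid}")
    ckpt = max(mine, key=lambda c: c["sequence"])   # latest checkpoint chi_k
    working = dict(ckpt["working_state"])           # reconstruct working memory
    summary = ckpt["summary"]                       # Sigma_n, for prefill injection
    # Contamination check: every carried item must belong to this session
    # (None marks session-independent content).
    if any(v.get("sid") not in (sid, None) for v in working.values()):
        raise RuntimeError("cross-session contamination detected")
    return working, summary

cps = [
    {"sid": "s1", "sequence": 1, "summary": "old", "working_state": {}},
    {"sid": "s1", "sequence": 2, "summary": "new",
     "working_state": {"x": {"sid": "s1", "value": 42}}},
]
working, summary = resume_session(cps, "s1")
assert summary == "new"
```

Turn-log replay (step 4) is omitted here; a full implementation would re-apply each logged turn to the reconstructed state before serving the next request.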
11.3.2 Session Isolation: Cross-Session Contamination Prevention#
Session isolation is a security-critical property. Contamination occurs when information from session $s_i$ becomes accessible within session $s_j$ ($i \neq j$), violating confidentiality and correctness guarantees.
Isolation Mechanisms#
- Namespace partitioning: All session memory reads and writes are scoped to the session identifier. The memory store enforces $\text{sid}$-based partitioning at the storage layer, not merely at the application layer.
- Agent instance isolation: Each session binds to an independent agent runtime instance (or, at minimum, a logically isolated execution context). Shared-nothing semantics apply: no mutable state is shared between sessions.
- Context window hygiene: The prefill compiler must verify that no tokens from session $s_i$ appear in the compiled context for session $s_j$. This is enforced by a provenance check on every context segment:

  $$\forall \text{seg} \in \text{Context}(s_j): \quad \text{prov}(\text{seg}) \in \{s_j, \bot\}$$

  where $\bot$ denotes session-independent content (system prompts, tool schemas, semantic knowledge).
- Shared resource access control: If sessions share access to common tools or external resources, invocations are scoped with caller credentials and session-bound authorization tokens, preventing one session from observing another's tool invocation results.
Formal Isolation Invariant#

$$\forall i \neq j: \quad \text{Readable}(s_i) \cap M_S(s_j) = \emptyset$$

This invariant is enforced at the storage layer through partition keys and at the runtime layer through context compilation validation.
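The runtime-layer check can be sketched as a guard the prefill compiler runs over compiled segments; the segment shape (`text`, `prov`) is an assumption for the sketch, with `None` marking session-independent content.

```python
def validate_context_isolation(segments, sid):
    """Context-window hygiene: every compiled segment must carry the current
    session's provenance or be session-independent (prov is None)."""
    leaks = [s for s in segments if s["prov"] not in (sid, None)]
    if leaks:
        raise PermissionError(
            f"{len(leaks)} segment(s) leaked from other sessions into {sid}")
    return True

segments = [
    {"text": "system prompt", "prov": None},      # session-independent
    {"text": "session summary", "prov": "s-42"},  # belongs to this session
]
assert validate_context_isolation(segments, "s-42")
```

Running this check at compile time, after assembly but before inference, catches leaks introduced by any upstream component, not just the session store.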
11.3.3 Session Summarization for Long-Running Interactions#
As sessions extend beyond dozens of turns, the raw turn log exceeds the session memory budget $C_S$. Summarization compresses the history while preserving decision-critical information.
Hierarchical Summarization Strategy#
The session history is summarized at multiple granularities:
- Turn-level: Each turn is compressed to its essential content (user intent, agent action, outcome).
- Segment-level: Groups of related turns (e.g., a sub-task completion) are merged into a segment summary.
- Session-level: The entire session is represented by a running summary that is updated incrementally.
The running summary after turn $n$ is computed as:

$$\Sigma_n = g\left(\Sigma_{n-1} \oplus \text{turn}_n\right)$$

where $\oplus$ denotes concatenation and $g$ is a compression function that preserves:
- User corrections: Any explicit correction by the user is retained verbatim or near-verbatim, as these represent high-value signal for avoiding repeated errors.
- Decision points: Key decisions made during the session, including alternatives considered and reasons for selection.
- Unresolved items: Open questions, pending actions, or ambiguities that affect future turns.
- Accumulated constraints: Constraints, preferences, or filters established during the session.
Information-Theoretic Quality Metric#
Define the summarization quality as the mutual information between the full turn log $\mathcal{T}_n$ and the summary, normalized by the summary length:

$$Q(\Sigma_n) = \frac{I(\mathcal{T}_n; \Sigma_n)}{|\Sigma_n|}$$

The summarization procedure maximizes $Q(\Sigma_n)$ subject to the constraint $|\Sigma_n| \le C_S$. In practice, this is approximated by training or prompting the summarizer to prioritize high-information-density items (corrections, decisions, constraints) over low-density items (pleasantries, acknowledgments, repeated context).
Pseudo-Algorithm 11.4: Incremental Session Summarization
PROCEDURE UpdateSessionSummary(session, new_turn):
session.turn_log.append(new_turn)
session.turn_count ← session.turn_count + 1
// Check if summarization is needed
projected_size ← TokenCount(session.summary) + TokenCount(new_turn)
IF projected_size ≤ session.C_S:
session.summary ← session.summary ⊕ CompressTurn(new_turn)
RETURN
// Hierarchical re-summarization
// Step 1: Identify segment boundaries in recent turns
segments ← IdentifySegmentBoundaries(
session.turn_log,
since=session.last_summarization_turn
)
// Step 2: Summarize each segment
segment_summaries ← []
FOR EACH segment IN segments:
seg_summary ← SummarizeSegment(segment,
preserve=["user_corrections", "decisions", "constraints", "open_items"])
segment_summaries.append(seg_summary)
// Step 3: Merge with existing session summary
candidate ← MergeSummaries(session.summary, segment_summaries,
budget=session.C_S,
priority_order=["corrections", "constraints", "decisions",
"outcomes", "context"])
// Step 4: Validate summary quality
IF SummaryQualityCheck(candidate, session.turn_log):
session.summary ← candidate
session.last_summarization_turn ← session.turn_count
ELSE:
// Fallback: aggressive compression of oldest segments
session.summary ← AggressiveCompress(session.summary,
target_ratio=0.5)
session.summary ← session.summary ⊕ segment_summaries
Truncate(session.summary, session.C_S)

11.4 Episodic Memory: Validated Records of Past Agent Experiences#
Episodic memory ($M_E$) stores structured records of completed agent experiences—resolved tasks, encountered errors, successful strategies, and observed outcomes. Unlike session memory (which is scoped to a single interaction and eventually archived), episodic memory is a persistent, indexed, queryable store that enables the agent to learn from its own history.
11.4.1 Episode Schema: Trigger, Context, Action, Outcome, Evaluation, Timestamp#
Every episode is stored as a typed record conforming to a strict schema:

$$e = \langle \text{eid}, \text{trigger}, \text{context}, \text{actions}, \text{outcome}, \text{evaluation}, \text{metadata} \rangle$$
Field Definitions#
| Field | Type | Description |
|---|---|---|
| eid | UUID | Globally unique episode identifier. |
| trigger | TriggerRecord | The event or query that initiated the episode (user request, system event, scheduled task). |
| context | ContextSnapshot | The relevant state at the time of the episode: active task, retrieved knowledge, session summary, environmental conditions. |
| actions | List[ActionRecord] | Ordered sequence of actions taken: tool invocations, sub-agent delegations, reasoning steps, with inputs/outputs for each. |
| outcome | OutcomeRecord | The final result: success/failure status, output artifacts, side effects, and any error information. |
| evaluation | EvaluationRecord | Post-hoc assessment: correctness score, efficiency metrics, human feedback (if available), automated eval results. |
| metadata | EpisodeMetadata | Timestamp, duration, agent version, model version, token cost, session ID (if applicable), provenance chain. |
Sub-Record: EvaluationRecord#
The evaluation record is critical for episodic recall ranking:

$$\text{evaluation} = \langle \text{correctness} \in [0,1],\ \text{efficiency},\ \text{human\_rating},\ \text{failure\_class} \in \mathcal{F} \cup \{\bot\} \rangle$$

where $\mathcal{F}$ is a taxonomy of failure classes (hallucination, tool error, timeout, policy violation, etc.).
Embedding Representation#
Each episode is embedded into a dense vector space for similarity-based retrieval:

$$\mathbf{v}_e = \text{Embed}(\text{trigger} \oplus \text{context} \oplus \text{outcome})$$

The embedding captures the semantic signature of the episode—what was asked, under what conditions, and what happened—enabling retrieval of episodes relevant to a current task.
11.4.2 Episodic Recall: Similarity-Based, Recency-Weighted, Outcome-Filtered#
Episodic recall is the process of retrieving relevant past episodes to inform current reasoning. The recall function combines multiple ranking signals.
Composite Recall Score#
Given a current task query $q$, the recall score for episode $e$ is:

$$R(e, q) = w_{\text{sim}} \cdot \text{sim}(q, e) + w_{\text{rec}} \cdot e^{-\lambda (t_{\text{now}} - t_e)} + w_{\text{out}} \cdot \text{correctness}(e) \cdot \left(1 + \text{human\_rating}(e)\right) + w_{\text{freq}} \cdot \text{freq}(e)$$

where:
- $\text{sim}(q, e)$: cosine similarity between the query embedding and the episode embedding.
- $e^{-\lambda (t_{\text{now}} - t_e)}$: exponential decay based on the age of the episode, with decay constant $\lambda$.
- $\text{correctness}(e) \cdot (1 + \text{human\_rating}(e))$: episodes with positive outcomes and human endorsement rank higher.
- $\text{freq}(e)$: normalized count of how often the episode has been recalled successfully in prior tasks (reinforcing proven utility).

The weights $w_{\text{sim}}, w_{\text{rec}}, w_{\text{out}}, w_{\text{freq}}$ are tunable per task class and sum to 1.
Outcome Filtering#
Before ranking, episodes may be pre-filtered based on outcome class:
- Success-biased recall: For task execution, prefer episodes where $\text{outcome.status} = \text{SUCCESS}$ to replicate successful strategies.
- Failure-biased recall: For error diagnosis or risk assessment, prefer episodes where $\text{outcome.status} = \text{FAILURE}$ to identify pitfalls and avoid repetition.
- Correction-biased recall: For self-improvement, prefer episodes containing human corrections to internalize feedback.
Pseudo-Algorithm 11.5: Episodic Memory Recall
PROCEDURE RecallEpisodes(query, mode, top_k, budget_tokens):
// Step 1: Embed the query
q_vec ← Embed(query)
// Step 2: Candidate retrieval (ANN search with pre-filter)
allowed_classes ← GetAllowedOutcomeClasses(mode)
candidates ← EpisodicIndex.search(
vector=q_vec,
filter={"outcome_class": allowed_classes},
top_n=top_k * 3, // over-retrieve for re-ranking
min_similarity=θ_min
)
// Step 3: Compute composite recall scores
FOR EACH episode e IN candidates:
e.recall_score ← (
w_sim * CosineSimilarity(q_vec, e.embedding)
+ w_rec * exp(-λ * (now() - e.timestamp))
+ w_out * e.evaluation.correctness * (1 + e.evaluation.human_rating)
+ w_freq * NormalizedAccessFrequency(e)
)
// Step 4: Rank and select top-k within token budget
SORT candidates BY recall_score DESCENDING
selected ← []
tokens_used ← 0
FOR EACH episode e IN candidates:
episode_tokens ← TokenCount(FormatEpisodeForContext(e))
IF tokens_used + episode_tokens > budget_tokens:
CONTINUE
selected.append(e)
tokens_used ← tokens_used + episode_tokens
IF |selected| ≥ top_k:
BREAK
// Step 5: Record access for frequency tracking
FOR EACH episode e IN selected:
EpisodicIndex.record_access(e.eid, query_context=query)
RETURN selected

11.4.3 Episodic Consolidation: Merging, Generalizing, and Forgetting#
Over time, episodic memory grows without bound unless consolidation processes merge, generalize, and forget episodes. Consolidation is analogous to memory consolidation in cognitive science: repeated patterns are abstracted into general knowledge, redundant episodes are merged, and low-value episodes are forgotten.
Merging#
When multiple episodes share similar triggers, contexts, and outcomes, they can be merged into a single consolidated episode:

$$e_{\text{merged}} = \text{Merge}\left(\{e_1, e_2, \ldots, e_n\}\right)$$
The merged episode retains:
- The most common trigger pattern.
- A generalized context (intersection of context features).
- A composite action sequence (the most successful action path).
- Aggregated evaluation metrics.
- Provenance links to all source episodes.
Generalization (Promotion to Semantic Memory)#
When a cluster of episodes reveals a consistent pattern, the pattern can be extracted and promoted to semantic memory as a general rule:

$$\{e_1, \ldots, e_n\} \xrightarrow{\ \text{ExtractPattern}\ } k \in M_K$$

For example, if 15 episodes show that "API X returns a 429 error when called more than 100 times per minute," this is promoted to semantic memory as a documented rate limit.
Forgetting#
Episodes are candidates for forgetting when they satisfy all of the following:
- $\text{AccessFreq}(e) < \theta_{\text{freq}}$ (rarely recalled).
- $\text{Recency}(e) < \theta_{\text{rec}}$ (sufficiently old).
- $\text{OutcomeQuality}(e) < \theta_{\text{qual}}$ (low-value outcome).
- The episode has been subsumed by a merged or generalized record.

The forgetting score is:

$$f(e) = \left(1 - \text{AccessFreq}(e)\right) \cdot \left(1 - \text{Recency}(e)\right) \cdot \left(1 - \text{OutcomeQuality}(e)\right)$$

Episodes with $f(e) > \theta_{\text{forget}}$ are moved to cold storage or permanently deleted, depending on retention policy.
Pseudo-Algorithm 11.6: Episodic Consolidation
PROCEDURE ConsolidateEpisodicMemory(M_E, schedule):
// Triggered on schedule (e.g., daily) or when |M_E| exceeds capacity threshold
// Phase 1: Identify merge candidates via clustering
clusters ← ClusterEpisodes(M_E, similarity_threshold=θ_merge, min_cluster_size=3)
FOR EACH cluster C IN clusters:
IF |C| ≥ 3:
merged ← MergeEpisodes(C)
M_E.insert(merged)
FOR EACH e IN C:
e.status ← SUBSUMED
e.subsumed_by ← merged.eid
// Phase 2: Generalization — extract patterns from large clusters
FOR EACH cluster C IN clusters:
IF |C| ≥ pattern_threshold:
pattern ← ExtractPattern(C)
IF PatternIsNovel(pattern, SemanticMemory):
PromoteToSemanticMemory(pattern, provenance=C)
// Phase 3: Forgetting — evict low-value, subsumed episodes
FOR EACH episode e IN M_E:
IF e.status = SUBSUMED:
f_score ← (1 - AccessFreq(e)) * (1 - Recency(e)) * (1 - OutcomeQuality(e))
IF f_score > θ_forget:
IF RetentionPolicy.allows_deletion(e):
ArchiveOrDelete(e)
ConsolidationLog.record(e, action="forgotten")
// Phase 4: Re-index remaining episodes
EpisodicIndex.rebuild(M_E.active_episodes())

11.5 Semantic Memory: Canonical Organizational and Domain Knowledge#
Semantic memory ($M_K$) stores authoritative, curated, version-controlled knowledge about the domain, the organization, and the world. It is the agent's structured understanding of entities, relationships, rules, and facts—independent of any specific episode or session. Semantic memory is the highest-authority knowledge layer: when conflicts arise between agent-learned information and semantic memory, semantic memory prevails unless explicitly overridden by a validated correction.
11.5.1 Knowledge Graph Integration: Entity-Relation-Attribute Triples#
Semantic memory is optimally represented as a knowledge graph augmented with dense embeddings, enabling both structured traversal and semantic search.
Triple Model#
The knowledge graph consists of:
- Vertices $V$: entities (persons, systems, concepts, documents, APIs, etc.), each with a typed entity class.
- Edges $E \subseteq V \times R \times V$: directed relations between entities, where $R$ is the relation type set.
- Attributes $A$: typed key-value attribute sets on entities and relations.
Each triple is stored as:

$$k = \langle s, r, o, A, p, v, t_{\text{valid}}, t_{\text{expire}} \rangle$$

where:
- $s$: subject entity.
- $r$: relation type.
- $o$: object entity (or literal value for attribute triples).
- $A$: auxiliary attributes (confidence, scope, etc.).
- $p$: provenance — the source of this knowledge (document URI, episode ID, human curator, etc.).
- $v$: version number.
- $t_{\text{valid}}$: timestamp from which this triple is valid.
- $t_{\text{expire}}$: timestamp at which this triple expires (or $\infty$ for permanent knowledge).
Retrieval over the Knowledge Graph#
Agent queries against semantic memory employ a hybrid strategy:
- Structured query: SPARQL-like traversal for known entity relationships (e.g., "What are the dependencies of ServiceX?").
- Semantic search: Dense-vector similarity search over entity/triple embeddings for fuzzy or exploratory queries (e.g., "What services are affected by database latency?").
- Graph walk: Multi-hop traversal from a seed entity to discover contextually relevant knowledge within $h$ hops.
The retrieval score for a knowledge item $k$ given query $q$ integrates all three signals:

$$S(k, q) = w_{\text{struct}} \cdot \text{match}(k, q) + w_{\text{sem}} \cdot \text{sim}(q, k) + w_{\text{walk}} \cdot \text{prox}(k, q) + w_{\text{prov}} \cdot \text{auth}(k)$$

where $\text{auth}(k)$ reflects the provenance quality (human-curated $>$ agent-learned $>$ inferred).
11.5.2 Ontology Management and Taxonomy Versioning#
The schema governing semantic memory—the ontology—defines the permitted entity types, relation types, and attribute schemas. Ontology management is essential for preventing semantic drift and ensuring interoperability across agent versions.
Ontology Version Control#
The ontology is versioned using semantic versioning:

$$v_{\text{ont}} = \text{MAJOR}.\text{MINOR}.\text{PATCH}$$

Version transitions follow strict migration rules:
- Backward-compatible changes (adding optional attributes, new entity types): minor version increment.
- Breaking changes (removing entity types, changing attribute types, altering relation semantics): major version increment with mandatory migration script.
- All knowledge items carry the ontology version $v_{\text{ont}}$ under which they were written.
Taxonomy Hierarchies#
Entity types and relation types are organized in inheritance hierarchies:

$$t_{\text{child}} \sqsubseteq t_{\text{parent}} \quad \text{(e.g., } \text{Service} \sqsubseteq \text{System} \sqsubseteq \text{Entity}\text{)}$$

Queries against a parent type automatically include instances of all child types, enabling generalized reasoning without explicit enumeration.
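Parent-type query expansion reduces to computing the descendant closure of the taxonomy. The type names in this sketch are illustrative, not a prescribed ontology.

```python
# Parent -> children edges of a toy type taxonomy (names are examples only)
TAXONOMY = {
    "Entity": ["System", "Document"],
    "System": ["Service", "Database"],
}

def expand_type(t, taxonomy=TAXONOMY):
    """All concrete types matched by a query against type t: t plus descendants."""
    result = {t}
    for child in taxonomy.get(t, []):
        result |= expand_type(child, taxonomy)
    return result

# A query against "System" transparently covers Service and Database instances
assert expand_type("System") == {"System", "Service", "Database"}
```

A production store would precompute this closure (or encode it in the index) rather than recurse per query, but the semantics are the same.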
11.5.3 Conflict Resolution Between Agent-Learned and Authoritative Knowledge#
Conflicts arise when an agent's episodic experience suggests a fact that contradicts the authoritative knowledge graph. The conflict resolution protocol operates as follows:
Conflict Detection#
During any memory write or promotion, the system checks for contradictions:

$$\text{Conflict}(k_{\text{new}}, k_{\text{existing}}) \iff s_{\text{new}} = s_{\text{existing}} \wedge r_{\text{new}} = r_{\text{existing}} \wedge o_{\text{new}} \neq o_{\text{existing}}$$
Resolution Hierarchy#
When a conflict is detected, resolution follows a strict authority hierarchy:
- Human-curated knowledge (authority level 4): always prevails unless a human explicitly revises it.
- Verified automated knowledge (authority level 3): knowledge derived from automated processes with validation (e.g., CI/CD-verified API schemas).
- Agent-learned knowledge (authority level 2): knowledge promoted from episodic memory through consolidation.
- Inferred knowledge (authority level 1): knowledge inferred by the agent during reasoning without direct verification.
Pseudo-Algorithm 11.7: Semantic Memory Conflict Resolution
PROCEDURE ResolveConflict(k_new, k_existing):
// Compare authority levels
IF Authority(k_new) > Authority(k_existing):
// New knowledge is more authoritative
IF k_existing.provenance.type = HUMAN_CURATED:
// Never auto-override human-curated knowledge
ConflictQueue.enqueue(k_new, k_existing, action=HUMAN_REVIEW_REQUIRED)
RETURN DEFERRED
ELSE:
Archive(k_existing, reason="superseded", superseded_by=k_new)
RETURN ACCEPT_NEW
ELSE IF Authority(k_new) = Authority(k_existing):
// Same authority: prefer fresher knowledge
IF k_new.timestamp > k_existing.timestamp AND k_new.confidence > θ_confidence:
Archive(k_existing, reason="superseded_by_newer")
RETURN ACCEPT_NEW
ELSE:
ConflictQueue.enqueue(k_new, k_existing, action=MANUAL_RESOLUTION)
RETURN DEFERRED
ELSE:
// New knowledge is less authoritative — reject or queue
IF k_new.confidence > 0.95 AND k_new.evidence_count > evidence_threshold:
// High-confidence agent knowledge contradicts lower-grade existing
ConflictQueue.enqueue(k_new, k_existing, action=REVIEW_SUGGESTED)
RETURN DEFERRED
ELSE:
ConflictLog.record(k_new, reason="rejected_lower_authority")
RETURN REJECT
11.6 Procedural Memory: Learned Action Sequences, Tool Usage Patterns, and Workflow Templates#
Procedural memory (M_P) stores reusable, validated, versioned action sequences that the agent has learned to execute. It is the agent's equivalent of "muscle memory": compiled skills that can be invoked reliably without re-deriving them from first principles on every execution.
11.6.1 Procedure Extraction from Successful Execution Traces#
Procedures are not authored manually (though manual authoring is permitted as a bootstrap mechanism). Instead, they are extracted from successful execution traces through a systematic extraction pipeline.
Extraction Pipeline#
Successful Execution Traces
│
▼
┌────────────────────┐
│ Trace Normalization │ ← Canonicalize tool names, parameter formats, error codes
└────────┬───────────┘
▼
┌────────────────────┐
│ Action Abstraction │ ← Replace concrete values with typed parameters
└────────┬───────────┘
▼
┌────────────────────┐
│ Pattern Detection │ ← Identify recurring action subsequences across traces
└────────┬───────────┘
▼
┌────────────────────┐
│ Procedure Synthesis │ ← Compile detected patterns into typed procedure templates
└────────┬───────────┘
▼
┌────────────────────┐
│ Validation & Test │ ← Execute procedure against test cases; verify correctness
└────────┬───────────┘
▼
┌────────────────────┐
│ Registration │ ← Register in procedural memory with version, schema, tests
└────────────────────┘
Procedure Schema#
| Field | Type | Description |
|---|---|---|
pid | UUID | Globally unique procedure identifier. |
name | string | Human-readable procedure name. |
version | SemVer | Semantic version of this procedure. |
preconditions | List[Predicate] | Conditions that must hold before execution (tool availability, permissions, state requirements). |
steps | List[ProcedureStep] | Ordered action steps with typed inputs, outputs, branching logic, and retry policies. |
postconditions | List[Predicate] | Conditions that must hold after successful execution (verification checks). |
error_handlers | Map[ErrorClass, RecoveryAction] | Error-class-specific recovery strategies. |
test_suite | List[TestCase] | Automated tests that validate procedure correctness. |
metadata | ProcedureMetadata | Extraction provenance, success rate, average latency, token cost, usage count. |
Each ProcedureStep is a typed record with declared inputs, outputs, and control behavior.
Pseudo-Algorithm 11.8: Procedure Extraction from Traces
PROCEDURE ExtractProcedures(trace_store, min_occurrences, min_success_rate):
// Step 1: Collect successful traces
traces ← trace_store.query(
filter={"outcome": "SUCCESS", "evaluation.correctness": ≥ 0.9},
limit=10000
)
// Step 2: Normalize traces
normalized ← []
FOR EACH trace IN traces:
norm_trace ← NormalizeTrace(trace)
// Canonicalize tool names, abstract concrete parameter values
norm_trace.steps ← [AbstractStep(s) FOR s IN trace.steps]
normalized.append(norm_trace)
// Step 3: Detect recurring patterns (subsequence mining)
patterns ← FrequentSubsequenceMining(
sequences=[t.steps FOR t IN normalized],
min_support=min_occurrences,
min_length=2,
max_gap=2 // allow up to 2 intervening steps
)
// Step 4: Synthesize procedure templates
procedures ← []
FOR EACH pattern p IN patterns:
// Compute success rate of traces containing this pattern
containing_traces ← [t FOR t IN normalized IF Contains(t, p)]
success_rate ← Mean([t.evaluation.correctness FOR t IN containing_traces])
IF success_rate < min_success_rate:
CONTINUE
procedure ← SynthesizeProcedure(
pattern=p,
example_traces=containing_traces,
extract_preconditions=TRUE,
extract_postconditions=TRUE,
extract_error_handlers=TRUE
)
procedure.metadata.success_rate ← success_rate
procedure.metadata.source_trace_count ← |containing_traces|
procedures.append(procedure)
// Step 5: Validate and register
FOR EACH procedure IN procedures:
test_results ← RunProcedureTests(procedure, test_environment)
IF test_results.all_passed:
procedure.status ← CANDIDATE
ProcedureRegistry.register(procedure)
ELSE:
ProcedureLog.record(procedure, status="validation_failed",
failures=test_results.failures)
RETURN procedures
11.6.2 Procedure Versioning, Testing, and Promotion#
Procedures follow a lifecycle modeled after software release management:
Lifecycle Stages#
| Stage | Description | Usage |
|---|---|---|
| DRAFT | Newly extracted, not yet validated. | Not available for agent use. |
| CANDIDATE | Passed basic validation; awaiting comprehensive testing. | Available for shadow-mode execution (run but don't trust). |
| STAGED | Passed test suite; awaiting human review or canary deployment. | Available for low-risk tasks. |
| ACTIVE | Fully promoted; trusted for production use. | Primary procedure library. |
| DEPRECATED | Superseded by a newer version; still functional. | Fallback only. |
| ARCHIVED | Removed from active use; retained for audit. | Not executable. |
Version Compatibility#
When a procedure is updated, the new version must demonstrate:
- Backward compatibility: All test cases from the prior version pass (unless explicitly documented as breaking changes).
- Improvement: The new version achieves equal or better success rate on a held-out evaluation set.
- No regression: No previously passing test case fails in the new version.
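These three checks can be sketched as a single promotion gate; the test-result and success-rate representations are assumptions:

```python
# Sketch: gate a new procedure version on backward compatibility,
# improvement, and no-regression. Representations are illustrative.

def may_promote(prior_passed: set[str],
                new_results: dict[str, bool],
                prior_success_rate: float,
                new_success_rate: float,
                documented_breaking: set[str] = frozenset()) -> bool:
    """Allow promotion only if the new version is backward compatible,
    at least as successful on held-out evaluation, and regression-free."""
    # No regression: every test the prior version passed must still pass,
    # unless explicitly documented as a breaking change.
    for test in prior_passed - documented_breaking:
        if not new_results.get(test, False):
            return False
    # Improvement: equal or better success rate on the held-out set.
    return new_success_rate >= prior_success_rate
```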
11.6.3 Procedural Memory as Compiled Agent Skills#
Active procedures serve as pre-compiled skills that the agent can invoke without re-deriving the action sequence. This is analogous to compiled code vs. interpreted code: procedural memory trades flexibility for speed, reliability, and token efficiency.
Invocation Model#
When the agent encounters a task that matches a known procedure's preconditions, the orchestrator can:
- Direct invocation: Execute the procedure step-by-step, skipping the planning phase entirely.
- Guided planning: Use the procedure as a template, allowing the agent to adapt individual steps while following the overall structure.
- Verification-only: Let the agent plan freely but verify the generated plan against known procedures for consistency.
The selection among these modes depends on the confidence score of the match: high-confidence matches permit direct invocation, moderate confidence favors guided planning, and low confidence restricts the procedure to verification-only use.
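A minimal mode selector over the three modes above; the thresholds (0.9 and 0.6) are illustrative, not prescribed by the text:

```python
# Sketch: map procedure-match confidence to an invocation mode.
# Thresholds are assumptions.

def select_invocation_mode(confidence: float) -> str:
    if confidence >= 0.9:
        return "direct"        # execute the procedure, skip planning entirely
    if confidence >= 0.6:
        return "guided"        # use the procedure as a planning template
    return "verification"      # plan freely, verify against the procedure
```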
Token Efficiency of Procedural Memory#
Procedural invocation yields significant token savings: instead of including the full reasoning chain in working memory, the agent references a compact procedure identifier and its parameterized inputs. Empirically, procedural invocation substantially reduces per-task token consumption for well-characterized tasks, translating directly into cost and latency improvements.
11.7 Cross-Layer Memory Promotion Policies#
Memory promotion is the controlled movement of information from a lower-durability layer to a higher-durability layer. Promotion is the only authorized mechanism for information to persist beyond its originating layer's lifecycle. Uncontrolled promotion leads to memory bloat, knowledge corruption, and semantic drift; therefore, promotion must be governed by explicit, auditable policies.
11.7.1 Promotion Criteria: Non-Obviousness, Correctness Improvement, Reusability#
Not all information deserves promotion. The promotion policy admits only items that satisfy all of the following criteria:
Non-Obviousness#
The information must not be trivially derivable from existing semantic memory or from the agent's base model knowledge. An item is non-obvious if its maximum similarity to any existing knowledge item falls below a novelty threshold. This prevents the memory system from filling with redundant restatements of known facts.
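A minimal sketch of this novelty check, assuming embedding similarity and an illustrative threshold θ_novel:

```python
# Sketch: an item is non-obvious only if its best similarity against all
# existing knowledge stays below a novelty threshold. theta_novel is an
# assumed value.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def is_non_obvious(item_vec: list[float],
                   known_vecs: list[list[float]],
                   theta_novel: float = 0.85) -> bool:
    if not known_vecs:
        return True  # nothing known yet: trivially novel
    return max(cosine(item_vec, k) for k in known_vecs) < theta_novel
```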
Correctness Improvement#
The information must demonstrably improve the agent's expected correctness on future tasks.
In practice, this is estimated by evaluating the item's impact on a held-out task set or by tracking correctness improvements in tasks where the item was retrieved vs. not retrieved.
Reusability#
The information must be applicable across multiple future tasks, not merely specific to the originating session. Items with low reusability are retained in episodic memory (where they serve as specific case studies) rather than promoted to semantic or procedural memory.
Composite Promotion Score#
Promotion is authorized when the composite score meets the promotion threshold, i.e., ComputePromotionScore(item) ≥ θ_promote (see Pseudo-Algorithm 11.9).
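One plausible form of the composite score is a weighted combination of the three criteria, gated by θ_promote; the weights and threshold values here are assumptions:

```python
# Sketch: composite promotion score from the three criteria above.
# Weights and theta_promote are illustrative assumptions.

def promotion_score(non_obviousness: float,
                    correctness_gain: float,
                    reusability: float,
                    weights: tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    w1, w2, w3 = weights
    return w1 * non_obviousness + w2 * correctness_gain + w3 * reusability

def authorize(item_scores: tuple[float, float, float],
              theta_promote: float = 0.6) -> bool:
    """Promotion is authorized only when the composite score clears the gate."""
    return promotion_score(*item_scores) >= theta_promote
```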
11.7.2 Write Validation: Deduplication, Conflict Detection, Provenance Capture#
Every promoted write must pass through a validation pipeline before admission to the target memory layer.
Pseudo-Algorithm 11.9: Memory Promotion Pipeline
PROCEDURE PromoteMemoryItem(item, source_layer, target_layer):
// Step 1: Compute promotion score
score ← ComputePromotionScore(item)
IF score < θ_promote:
PromotionLog.record(item, action="rejected", reason="below_threshold",
score=score)
RETURN REJECTED
// Step 2: Deduplication check
duplicates ← target_layer.find_similar(
embedding=Embed(item),
threshold=θ_dedup,
limit=5
)
IF |duplicates| > 0:
best_match ← duplicates[0]
IF SemanticEquivalence(item, best_match) > θ_equiv:
// Exact or near-exact duplicate — merge metadata
MergeProvenance(best_match, item)
PromotionLog.record(item, action="deduplicated",
merged_with=best_match.id)
RETURN DEDUPLICATED
// Step 3: Conflict detection
conflicts ← DetectConflicts(item, target_layer)
IF |conflicts| > 0:
resolution ← ResolveConflicts(item, conflicts) // See Pseudo-Alg 11.7
IF resolution = REJECT:
RETURN REJECTED
ELSE IF resolution = DEFERRED:
ConflictQueue.enqueue(item, conflicts)
RETURN DEFERRED
// Step 4: Provenance capture
item.provenance ← ProvenanceRecord(
source_layer=source_layer,
source_id=item.source_id,
promotion_timestamp=now(),
promotion_score=score,
promoting_agent=CurrentAgent.id,
evidence_chain=item.evidence_references
)
// Step 5: Schema validation
IF NOT target_layer.schema.validate(item):
PromotionLog.record(item, action="rejected", reason="schema_violation")
RETURN REJECTED
// Step 6: Write with TTL and version
item.version ← 1
item.ttl ← target_layer.default_ttl
item.created_at ← now()
target_layer.write(item)
PromotionLog.record(item, action="promoted", target=target_layer.name)
RETURN PROMOTED
11.7.3 Expiry Policies: TTL, Access-Frequency Decay, Relevance Recalculation#
Every memory item in every persistent layer has an expiry policy that governs its lifecycle. The expiry system prevents unbounded memory growth and ensures that stale information is eventually removed or refreshed.
TTL Assignment#
Each layer has a default TTL, which may be overridden at the item level based on the item's characteristics:
| Layer | Default TTL | Override Basis |
|---|---|---|
| Session (M_S) | Session duration | User-configured session timeout. |
| Episodic (M_E) | 90 days | Adjusted by outcome quality and access frequency. |
| Semantic (M_K) | ∞ (permanent) | Explicit expiry for time-bound knowledge (e.g., API versions). |
| Procedural (M_P) | ∞ (permanent) | Deprecated when superseded. |
Access-Frequency Decay#
Items that are never accessed become candidates for eviction even before their TTL expires. The effective TTL is reduced by a decay factor based on access frequency:
TTL_effective = TTL_base × (1 − η · exp(−access_count / μ))
where η is the maximum decay fraction and μ is the expected access count. Items with zero accesses experience the maximum TTL reduction.
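The decay rule can be transcribed directly from the update in Pseudo-Algorithm 11.10; the η and μ values below are illustrative:

```python
# Effective-TTL decay, as in Pseudo-Algorithm 11.10:
#   ttl_effective = ttl_base * (1 - eta * exp(-access_count / mu))
# eta (max decay fraction) and mu (expected access count) are illustrative.
import math

def effective_ttl(ttl_base: float, access_count: int,
                  eta: float = 0.5, mu: float = 10.0) -> float:
    return ttl_base * (1 - eta * math.exp(-access_count / mu))

# Zero accesses yields the maximum reduction (here, half the base TTL):
print(effective_ttl(90.0, 0))   # → 45.0
```

As access_count grows, the exponential term vanishes and the effective TTL approaches the full base TTL.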
Periodic Relevance Recalculation#
A background process periodically re-evaluates the relevance of every persistent memory item, computed as the mean embedding similarity between the item and a representative sample of recent task queries. Items whose relevance drops below θ_stale are flagged for review; items that remain flagged across multiple consecutive review cycles are demoted to cold storage.
Pseudo-Algorithm 11.10: Memory Expiry Sweep
PROCEDURE RunExpirySweep(memory_layer, recent_queries):
FOR EACH item m IN memory_layer.all_items():
// Check hard TTL
IF now() > m.created_at + m.ttl_effective:
Expire(m, reason="ttl_exceeded")
CONTINUE
// Check access-frequency decay
m.ttl_effective ← m.ttl_base * (1 - η * exp(-m.access_count / μ))
IF now() > m.created_at + m.ttl_effective:
Expire(m, reason="access_decay_expiry")
CONTINUE
// Check relevance staleness
relevance ← MeanSimilarity(Embed(m), [Embed(q) FOR q IN recent_queries])
IF relevance < θ_stale:
m.staleness_flags ← m.staleness_flags + 1
IF m.staleness_flags > max_staleness_flags:
DemoteToColdStorage(m, reason="relevance_decay")
ELSE:
m.staleness_flags ← 0 // Reset if still relevant
ExpiryLog.record_sweep(layer=memory_layer.name,
expired_count=expired,
demoted_count=demoted,
active_count=active)
11.8 Memory Wall Enforcement: Isolation Mechanisms Between Agent Instances and Layers#
The memory wall is not merely a design guideline; it must be mechanically enforced at multiple levels of the system architecture. Violations of memory isolation—whether through shared mutable state, context leakage, or unauthorized cross-layer reads—are treated as system invariant violations, equivalent to memory corruption in systems programming.
11.8.1 Enforcement Architecture#
┌──────────────────────────────────────────────────────────────┐
│ AGENT RUNTIME │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │ Agent D │ │
│ │ (sid=1) │ │ (sid=2) │ │ (sid=1) │ │ (sid=3) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ ┌────▼──────────────▼──────────────▼──────────────▼─────┐ │
│ │ MEMORY ACCESS LAYER (MAL) │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ Policy Enforcement Point (PEP) │ │ │
│ │ │ • Caller identity verification │ │ │
│ │ │ • Session scope validation │ │ │
│ │ │ • Layer access authorization │ │ │
│ │ │ • Read/write budget enforcement │ │ │
│ │ │ • Provenance injection │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └───────┬──────────┬──────────┬──────────┬───────────────┘ │
│ │ │ │ │ │
│ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────▼───┐ ┌────────┐ │
│ │ M_W │ │ M_S │ │ M_E │ │ M_K │ │ M_P │ │
│ │(local) │ │(scoped)│ │(shared)│ │(shared)│ │(shared)│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
└──────────────────────────────────────────────────────────────┘
11.8.2 Isolation Rules#
The Memory Access Layer (MAL) enforces the following invariants:
Rule 1: Working Memory Isolation
Working memory is strictly process-local. No agent can read or write another agent's working memory. This is enforced by in-process memory isolation (separate heap allocations or namespaces).
Rule 2: Session Memory Scope Binding
All session memory reads and writes are gated on session-identifier matching. The MAL rejects any request where the caller's session ID does not match the target session partition.
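Rule 2 reduces to a single comparison in the MAL's policy enforcement point; the exception and function names are illustrative:

```python
# Sketch of the MAL session-scope gate (Rule 2): any session memory access
# whose caller session ID does not match the target partition is rejected.
# Names are illustrative.

class SessionScopeViolation(Exception):
    """Raised when a caller attempts to cross a session boundary."""

def check_session_access(caller_sid: str, target_partition_sid: str) -> None:
    if caller_sid != target_partition_sid:
        raise SessionScopeViolation(
            f"caller sid={caller_sid} may not access partition "
            f"sid={target_partition_sid}")
```

In the architecture diagram above, agents A and C (both sid=1) would pass this check against the sid=1 partition, while agent B (sid=2) would be rejected.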
Rule 3: Episodic Memory Read Authorization
Episodic memory is readable by agents of the same class (or authorized classes), but write access requires validated promotion.
Rule 4: Semantic Memory Write Gate
No agent can directly write to semantic memory. All writes must pass through the promotion pipeline (Pseudo-Algorithm 11.9).
Rule 5: Procedural Memory Execution Gate
Procedures are invocable only according to their lifecycle stage (ACTIVE for production use, STAGED for low-risk tasks, CANDIDATE in shadow mode); writes to procedural memory occur solely through the extraction, validation, and registration pipeline.
11.8.3 Cross-Layer Contamination Detection#
The MAL includes a contamination detector that monitors for anomalous access patterns indicative of isolation violations:
- Session cross-read detection: Alert if an agent attempts to read session memory with a mismatched session ID.
- Working memory persistence detection: Alert if working memory items appear in subsequent sessions or agent instances (indicating they were improperly persisted).
- Unpromoted semantic writes: Alert if knowledge items appear in semantic memory without corresponding promotion pipeline records.
- Provenance chain validation: Periodically verify that all items in durable layers have complete provenance chains tracing back to their originating source.
Pseudo-Algorithm 11.11: Contamination Detection Sweep
PROCEDURE DetectContamination():
alerts ← []
// Check 1: Session memory partition integrity
FOR EACH session_partition IN SessionStore.partitions():
items ← session_partition.all_items()
FOR EACH item IN items:
IF item.provenance.sid ≠ session_partition.sid:
alerts.append(Alert(
type="CROSS_SESSION_CONTAMINATION",
severity=CRITICAL,
details={partition: session_partition.sid,
item_sid: item.provenance.sid}
))
// Check 2: Orphaned semantic memory (no provenance)
FOR EACH item IN SemanticMemory.all_items():
IF item.provenance IS NULL OR NOT ProvenanceChainValid(item):
alerts.append(Alert(
type="ORPHANED_KNOWLEDGE",
severity=HIGH,
details={item_id: item.id, missing="provenance_chain"}
))
// Check 3: Working memory leakage into persistent stores
recent_wm_hashes ← WorkingMemoryAuditLog.recent_content_hashes(window=24h)
FOR EACH layer IN [EpisodicMemory, SemanticMemory]:
FOR EACH item IN layer.recently_written(window=24h):
IF ContentHash(item) IN recent_wm_hashes:
IF NOT PromotionLog.has_record(item.id):
alerts.append(Alert(
type="UNPROMOTED_WM_LEAKAGE",
severity=CRITICAL,
details={item_id: item.id, layer: layer.name}
))
RETURN alerts
11.9 Memory Observability: Usage Analytics, Hit Rates, Staleness Metrics, and Audit Logs#
A memory system that cannot be observed cannot be optimized, debugged, or trusted. Memory observability provides the instrumentation necessary to understand how memory layers are performing, where bottlenecks exist, and whether the system's knowledge is accurate and current.
11.9.1 Core Metrics#
The observability layer tracks the following metrics per memory layer, emitted as structured telemetry:
Capacity and Utilization#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.utilization | Fraction of the layer's capacity (token budget) currently in use | Ratio |
memory.{layer}.item_count | Number of items in layer | Count |
memory.{layer}.token_count | Total tokens stored | Tokens |
memory.{layer}.growth_rate | Net change in item count per hour | Items/hour |
Access Patterns#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.read_count | Total reads per time window | Count/min |
memory.{layer}.write_count | Total writes per time window | Count/min |
memory.{layer}.hit_rate | Fraction of retrieval queries that return at least one useful item | Ratio |
memory.{layer}.miss_rate | Fraction of retrieval queries that return no useful item (1 − hit rate) | Ratio |
memory.{layer}.read_latency_p50 | Median read latency | ms |
memory.{layer}.read_latency_p99 | 99th percentile read latency | ms |
Quality and Freshness#
| Metric | Definition | Unit |
|---|---|---|
memory.{layer}.staleness_ratio | Fraction of items flagged as stale by the relevance sweep | Ratio |
memory.{layer}.avg_item_age | Mean age of items | Hours |
memory.{layer}.relevance_score | Mean relevance of retrieved items to recent queries | [0, 1] |
memory.{layer}.recall_utilization | Fraction of retrieved items actually used in generation | Ratio |
Promotion and Eviction#
| Metric | Definition | Unit |
|---|---|---|
memory.promotion.{src}_{dst}.count | Promotions from source to destination layer | Count/hour |
memory.promotion.{src}_{dst}.rejection_rate | Fraction of promotion attempts rejected | Ratio |
memory.eviction.{layer}.count | Items evicted per time window | Count/hour |
memory.eviction.{layer}.reason_distribution | Breakdown by eviction reason (TTL, decay, pruning, manual) | Distribution |
11.9.2 Derived Diagnostics#
From the core metrics, the following diagnostic signals are computed:
Memory Efficiency Ratio#
The memory efficiency ratio relates recall utilization and hit rate to storage utilization, measuring how efficiently the memory system converts stored information into useful context. High utilization combined with low recall utilization indicates bloat; a low hit rate indicates retrieval quality issues.
Staleness Risk Score#
A high staleness risk score indicates that the memory layer is filling with stale, irrelevant content, a condition that degrades retrieval quality and wastes token budget.
Context Pollution Index#
CPI measures the fraction of retrieved memory items that did not contribute to the task outcome. A CPI approaching 1.0 indicates severe context pollution.
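CPI as described reduces to a set-difference ratio over retrieved vs. used item identifiers:

```python
# Sketch: context pollution index, the fraction of retrieved memory items
# that did not contribute to the task outcome.

def context_pollution_index(retrieved_ids: set[str], used_ids: set[str]) -> float:
    if not retrieved_ids:
        return 0.0  # nothing retrieved: no pollution
    unused = retrieved_ids - used_ids
    return len(unused) / len(retrieved_ids)

# Four items retrieved, one actually used in generation:
print(context_pollution_index({"a", "b", "c", "d"}, {"a"}))   # → 0.75
```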
11.9.3 Audit Logging#
Every memory operation is recorded in an append-only audit log. Each record captures the operation type (read, write, promote, evict, merge, expire), the caller's agent and session identity, the target layer and item, a timestamp, and the outcome.
Audit logs serve three critical functions:
- Compliance: Demonstrate that memory operations adhere to data governance policies (retention, PII handling, access control).
- Debugging: Trace the provenance of any knowledge item back to its origin, through all promotions, merges, and modifications.
- Optimization: Analyze access patterns to tune retrieval strategies, eviction policies, and token budget allocations.
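A minimal append-only audit record consistent with these functions might look like the following (field names are assumptions):

```python
# Illustrative append-only audit record. Records are serialized on write and
# never mutated in place; field names are assumptions.
import json
import time
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AuditRecord:
    op: str            # read | write | promote | evict | merge | expire
    agent_id: str
    session_id: str
    layer: str
    item_id: str
    timestamp: float
    outcome: str

def append_audit(log: list[str], record: AuditRecord) -> None:
    """Append-only write: serialize the immutable record and append."""
    log.append(json.dumps(asdict(record)))

log: list[str] = []
append_audit(log, AuditRecord("promote", "agent-7", "sid-1",
                              "semantic", "k-42", time.time(), "promoted"))
```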
11.9.4 Observability Dashboard Structure#
┌─────────────────────────────────────────────────────────────┐
│ MEMORY OBSERVABILITY DASHBOARD │
├─────────────┬─────────────┬─────────────┬──────────┬────────┤
│ Working │ Session │ Episodic │ Semantic │ Proced │
│ Memory │ Memory │ Memory │ Memory │ Memory │
├─────────────┼─────────────┼─────────────┼──────────┼────────┤
│ Util: 62% │ Active: 47 │ Items: 12K │ Triples: │ Procs: │
│ Overflow: 3 │ Avg Turns: │ Hit Rate: │ 245K │ 89 │
│ GC/min: 12 │ 18.4 │ 0.73 │ Hit: 0.91│ Active:│
│ Carry: 22% │ Ckpts: 142 │ Staleness: │ Stale: │ 67 │
│ │ Contam: 0 │ 0.12 │ 0.04 │ SRate: │
│ │ │ CPI: 0.18 │ Confl: 3 │ 0.94 │
├─────────────┴─────────────┴─────────────┴──────────┴────────┤
│ PROMOTION FLOW │
│ M_W ──► M_S: 8.2/hr M_S ──► M_E: 1.4/hr │
│ M_E ──► M_K: 0.3/hr M_E ──► M_P: 0.1/hr │
│ Rejection Rate: 34% Conflict Rate: 2.1% │
├──────────────────────────────────────────────────────────────┤
│ ALERTS │
│ ⚠ Episodic Memory staleness rising (0.12 → 0.18, 7d trend)│
│ ⚠ Working memory overflow rate above threshold (3 > 2/min) │
│ ✓ No cross-session contamination detected │
│ ✓ All provenance chains validated │
└──────────────────────────────────────────────────────────────┘
11.9.5 Automated Alerting and Self-Healing#
The observability system triggers automated responses based on metric thresholds:
Pseudo-Algorithm 11.12: Memory Health Monitor
PROCEDURE MonitorMemoryHealth():
LOOP every monitoring_interval:
metrics ← CollectAllMemoryMetrics()
// Alert 1: Memory layer approaching capacity
FOR EACH layer IN MemoryLayers:
IF metrics[layer].utilization > 0.85:
TriggerAlert("CAPACITY_WARNING", layer,
message=f"{layer} at {metrics[layer].utilization*100}% capacity")
IF layer.auto_eviction_enabled:
TriggerEvictionSweep(layer, target_utilization=0.70)
// Alert 2: Hit rate degradation
FOR EACH layer IN [Episodic, Semantic]:
IF metrics[layer].hit_rate < θ_min_hit_rate:
TriggerAlert("HIT_RATE_DEGRADATION", layer)
ScheduleReindexing(layer)
// Alert 3: Staleness accumulation
FOR EACH layer IN [Episodic, Semantic]:
IF metrics[layer].staleness_ratio > θ_max_staleness:
TriggerAlert("STALENESS_ACCUMULATION", layer)
ScheduleExpirySweep(layer)
// Alert 4: Context pollution
IF metrics.global.CPI > θ_max_CPI:
TriggerAlert("CONTEXT_POLLUTION", severity=HIGH)
ScheduleRetrievalTuning()
// Alert 5: Promotion pipeline backup
IF ConflictQueue.size() > θ_max_conflict_queue:
TriggerAlert("PROMOTION_PIPELINE_BACKUP", severity=MEDIUM)
NotifyHumanReviewers(ConflictQueue.pending())
// Alert 6: Cross-session contamination
contamination_alerts ← DetectContamination()
IF |contamination_alerts| > 0:
FOR EACH alert IN contamination_alerts:
TriggerAlert(alert.type, severity=CRITICAL)
IF alert.type = "CROSS_SESSION_CONTAMINATION":
QuarantineAffectedSessions(alert.details)
EmitHealthReport(metrics)
11.9.6 Operational Implications#
The observability infrastructure enables the following operational capabilities:
- Capacity planning: Trend analysis on growth rates and utilization enables proactive scaling of memory backends before capacity exhaustion.
- Retrieval quality tuning: Hit rate, recall utilization, and CPI metrics directly inform adjustments to embedding models, chunking strategies, and ranking weights.
- Eviction policy calibration: Access-frequency distributions and staleness trends guide TTL and decay parameter optimization.
- Cost attribution: Token-level tracking of memory retrieval costs enables per-task and per-layer cost attribution, supporting budget governance.
- Compliance auditing: Complete audit trails with provenance chains support regulatory and organizational compliance requirements for data handling, retention, and access control.
Summary: The Memory Hierarchy as a Production System#
The five-layer memory hierarchy—working, session, episodic, semantic, and procedural—constitutes a complete, typed, mechanically enforced knowledge management subsystem for agentic AI. The architecture is governed by the following invariants:
| Invariant | Enforcement Mechanism |
|---|---|
| Layer isolation | Memory Access Layer with typed access policies |
| Promotion validity | Validated pipeline with deduplication, conflict resolution, provenance |
| Capacity bounds | Token budgets, overflow handlers, eviction sweeps |
| Knowledge freshness | TTL, access-frequency decay, relevance recalculation |
| Observability | Structured metrics, audit logs, automated alerting |
| Session privacy | Namespace partitioning, contamination detection |
| Procedural reliability | Version-controlled lifecycle with test gates |
The memory wall is not a suggestion; it is an architectural boundary enforced through typed contracts, storage-level partitioning, runtime validation, and continuous monitoring. Agents that operate within this discipline achieve predictable, auditable, and continuously improving performance. Agents that violate the memory wall degrade unpredictably and fail silently—the worst operational outcome in a production system.