Agentic Notes Library

Chapter 17: Team Coordination — World-Class Agent Team Dynamics

March 20, 2026

Preamble

Multi-agent coordination transcends the orchestration of isolated tool-calling loops. When agents operate as a team, the system acquires emergent properties—collective reasoning capacity, fault tolerance through redundancy, specialization-driven throughput—that no single-agent architecture can replicate. However, these gains materialize only when the coordination substrate enforces explicit role contracts, bounded communication protocols, verified handoffs, shared memory discipline, and measurable team-level quality gates. This chapter formalizes agent team dynamics as an engineering discipline: typed organizational structures, mathematically grounded consensus and conflict resolution, provenance-tracked shared memory, adaptive composition under runtime evolution, and production-grade reliability patterns drawn from High-Reliability Organizations (HROs). Every mechanism is specified at protocol level with pseudo-algorithms, formal objective functions, and architecture trade-off analysis suitable for enterprise-scale deployment.


17.1 Agent Teams as Organizational Units: Roles, Responsibilities, and Accountability

17.1.1 Foundational Abstractions

An agent team $\mathcal{T}$ is a bounded set of $n$ agents operating under a shared task mandate with explicit role assignments, communication topology, and an accountability ledger:

$$\mathcal{T} = \langle \mathcal{A}, \mathcal{R}, \Phi, \mathcal{G}, \Gamma, \mathcal{L} \rangle$$

where:

  • $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$ — the agent pool
  • $\mathcal{R} = \{r_1, r_2, \dots, r_k\}$ — the role taxonomy
  • $\Phi : \mathcal{A} \to 2^{\mathcal{R}}$ — the role assignment function (an agent may hold multiple roles)
  • $\mathcal{G}$ — the shared goal specification (a typed task DAG or objective tree)
  • $\Gamma \subseteq \mathcal{A} \times \mathcal{A}$ — the communication topology (directed graph of permitted message channels)
  • $\mathcal{L}$ — the accountability ledger (append-only trace of decisions, outputs, and responsibility attributions)
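The tuple above can be carried as a typed structure. A minimal sketch, with illustrative class and field names (not from any specific framework):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Role:
    name: str

@dataclass
class Team:
    agents: set[str]                            # A: the agent pool
    roles: set[Role]                            # R: the role taxonomy
    assignment: dict[str, set[Role]]            # Phi: agent -> roles (possibly several)
    goal: dict                                  # G: task DAG / objective tree
    topology: set[tuple[str, str]]              # Gamma: permitted (sender, receiver) channels
    ledger: list = field(default_factory=list)  # L: append-only accountability trace

    def may_send(self, sender: str, receiver: str) -> bool:
        """Messages are legal only along edges of the communication topology."""
        return (sender, receiver) in self.topology
```

Keeping $\Gamma$ explicit as a set of directed edges makes "who may talk to whom" a checkable property rather than an emergent behavior.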

17.1.2 Role Taxonomy

Roles are not informal labels. Each role $r_i \in \mathcal{R}$ is a typed contract:

$$r_i = \langle \text{name}, \; \text{capabilities}: \mathcal{C}_i, \; \text{authority}: \mathcal{P}_i, \; \text{obligations}: \mathcal{O}_i, \; \text{constraints}: \mathcal{K}_i \rangle$$

| Role Dimension | Specification |
| --- | --- |
| Capabilities $\mathcal{C}_i$ | Set of tools, model variants, retrieval indices, and output modalities the role may invoke |
| Authority $\mathcal{P}_i$ | Mutation scope: which state domains, artifacts, and external systems this role may modify |
| Obligations $\mathcal{O}_i$ | Mandatory outputs, verification checks, and reporting duties per execution cycle |
| Constraints $\mathcal{K}_i$ | Token budgets, latency ceilings, recursion depth bounds, and approval gates |

Canonical role archetypes in production agentic teams:

  1. Planner / Orchestrator — decomposes goals into sub-tasks, assigns work, manages DAG state
  2. Implementer / Executor — performs domain-specific generation, transformation, or computation
  3. Retriever / Analyst — executes hybrid retrieval, ranks evidence, surfaces provenance
  4. Verifier / Critic — validates outputs against specifications, detects hallucinations, runs test harnesses
  5. Documenter / Synthesizer — aggregates partial results, produces coherent deliverables
  6. Monitor / Sentinel — observes system health, enforces rate limits, triggers escalation

17.1.3 Accountability Ledger

Every action within $\mathcal{T}$ is recorded in the accountability ledger $\mathcal{L}$ as a structured entry:

$$\ell = \langle \text{timestamp}, \; \text{agent\_id}, \; \text{role}, \; \text{action\_type}, \; \text{input\_hash}, \; \text{output\_hash}, \; \text{decision\_rationale}, \; \text{verification\_status} \rangle$$

The ledger is:

  • Append-only — no retroactive mutation, ensuring auditability
  • Content-addressed — input and output hashes enable deterministic replay
  • Causally ordered — Lamport timestamps or vector clocks maintain happens-before relations across concurrent agents
  • Queryable — supports provenance tracing: "Which agent produced artifact $x$? Under what evidence?"
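The append and provenance-query paths can be sketched as follows; hashing canonical JSON and using a single in-process Lamport counter are simplifying assumptions:

```python
import hashlib
import json
from dataclasses import dataclass

def content_hash(obj) -> str:
    """Deterministic content address, enabling replay and provenance queries."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

@dataclass(frozen=True)
class LedgerEntry:
    lamport_ts: int
    agent_id: str
    role: str
    action_type: str
    input_hash: str
    output_hash: str
    decision_rationale: str
    verification_status: str

class Ledger:
    def __init__(self):
        self._entries: list[LedgerEntry] = []
        self._clock = 0  # Lamport counter; each append is an event

    def append(self, agent_id, role, action_type, inputs, outputs,
               rationale, status="pending") -> LedgerEntry:
        self._clock += 1
        entry = LedgerEntry(self._clock, agent_id, role, action_type,
                            content_hash(inputs), content_hash(outputs),
                            rationale, status)
        self._entries.append(entry)  # append-only: no mutation API exists
        return entry

    def who_produced(self, artifact) -> list[str]:
        """Provenance query: which agents emitted an output with this content?"""
        h = content_hash(artifact)
        return [e.agent_id for e in self._entries if e.output_hash == h]
```

In a distributed deployment the counter would be replaced by per-agent vector clocks, but the content-addressing discipline is unchanged.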

17.1.4 Responsibility Attribution Model

When a team output is incorrect, the system must attribute responsibility to enable targeted repair:

$$\text{Responsibility}(a_i, \text{failure}) = \frac{\text{causal\_contribution}(a_i, \text{failure})}{\sum_{j=1}^{n} \text{causal\_contribution}(a_j, \text{failure})}$$

Causal contribution is computed via the ledger by tracing the dependency chain from the failure artifact back through all contributing actions, weighted by each agent's decision authority at the relevant branching points. This drives:

  • Targeted re-execution — only the causally responsible subtree is re-planned
  • Capability scoring — persistent per-agent reliability metrics inform future role assignment
  • Escalation triggers — repeated attribution to a single agent triggers model swap, parameter adjustment, or human escalation
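The normalization step itself is simple; a sketch, assuming the per-agent causal contributions have already been traced out of the ledger:

```python
def responsibility_shares(contributions: dict[str, float]) -> dict[str, float]:
    """Normalize per-agent causal contributions into responsibility shares.

    `contributions` is assumed to come from tracing the ledger's dependency
    chain back from the failure artifact (that tracing is not shown here).
    """
    total = sum(contributions.values())
    if total == 0:
        # No traced cause: fall back to uniform responsibility.
        n = len(contributions)
        return {agent: 1.0 / n for agent in contributions}
    return {agent: c / total for agent, c in contributions.items()}
```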

17.2 Team Formation Strategies: Static Assignment, Dynamic Assembly, and Capability-Based Matching

17.2.1 Formation Strategy Taxonomy

| Strategy | When to Use | Trade-offs |
| --- | --- | --- |
| Static Assignment | Stable, well-understood task domains; compliance-sensitive environments | Low overhead, deterministic behavior; inflexible to novel task types |
| Dynamic Assembly | Heterogeneous, evolving workloads; multi-domain requests | Adaptive, optimal specialization; higher formation latency, coordination cost |
| Capability-Based Matching | Large agent pools with diverse specializations; marketplace architectures | Precise skill-task alignment; requires maintained capability registry, scoring infrastructure |

17.2.2 Static Assignment

The team structure is defined at design time. A configuration manifest specifies:

$$\text{TeamManifest} = \left\{ (a_i, r_j, \text{model\_id}, \text{tool\_set}, \text{resource\_quota}) \;\middle|\; i \in [1,n], \; j \in [1,k] \right\}$$

Static teams are versioned artifacts deployed through CI/CD. Changes require manifest updates, regression testing, and rollout procedures. This is the preferred strategy for regulated domains (medical, financial, legal) where audit determinism is paramount.
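A manifest entry might look like the following, expressed as versioned data; the model ids, tool names, and quotas are purely illustrative:

```python
# A static team manifest as a versioned artifact checked into source control.
TEAM_MANIFEST = {
    "version": "2026.03.1",
    "members": [
        {"agent": "a1", "role": "orchestrator", "model_id": "planner-large",
         "tool_set": ["task_graph", "scheduler"],
         "resource_quota": {"tokens": 200_000}},
        {"agent": "a2", "role": "verifier", "model_id": "critic-medium",
         "tool_set": ["test_harness"],
         "resource_quota": {"tokens": 80_000}},
    ],
}
```

Because the manifest is plain data, regression tests and rollout gates can validate it mechanically before deployment.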

17.2.3 Dynamic Assembly

At task ingestion time, a formation controller selects agents from a pool based on task analysis:

Pseudo-Algorithm 17.1 — Dynamic Team Assembly

PROCEDURE AssembleTeam(task, agent_pool, formation_policy):
    // Phase 1: Task Analysis
    task_graph ← DecomposeTask(task)
    required_capabilities ← ExtractCapabilityRequirements(task_graph)
    required_roles ← MapCapabilitiesToRoles(required_capabilities)
 
    // Phase 2: Candidate Selection
    candidates ← {}
    FOR EACH role IN required_roles:
        eligible ← FILTER agent_pool WHERE:
            agent.capabilities ⊇ role.capabilities AND
            agent.current_load < agent.capacity_limit AND
            agent.reliability_score ≥ formation_policy.min_reliability
        ranked ← SORT eligible BY CapabilityMatchScore(agent, role) DESC
        candidates[role] ← ranked[0 : formation_policy.candidates_per_role]
 
    // Phase 3: Team Optimization
    team ← SolveAssignment(candidates, required_roles, formation_policy.constraints)
    // Constraints include: budget ceiling, latency target, diversity requirements,
    // anti-affinity rules (e.g., verifier ≠ implementer model family)
 
    // Phase 4: Topology Construction
    topology ← BuildCommunicationGraph(team, task_graph)
    
    // Phase 5: Initialization
    FOR EACH (agent, role) IN team:
        InitializeAgentContext(agent, role, task_graph, topology)
    
    RETURN TeamInstance(team, topology, task_graph)

17.2.4 Capability-Based Matching: Formal Model

Each agent $a_i$ publishes a capability vector $\mathbf{c}_i \in \mathbb{R}^d$ spanning $d$ skill dimensions (e.g., code generation quality, retrieval precision, mathematical reasoning, domain expertise scores). Each role $r_j$ defines a requirement vector $\mathbf{q}_j \in \mathbb{R}^d$ with minimum thresholds $\boldsymbol{\tau}_j \in \mathbb{R}^d$.

Match score:

$$\text{Match}(a_i, r_j) = \begin{cases} \displaystyle\sum_{k=1}^{d} w_k \cdot \left( c_{i,k} - \tau_{j,k} \right) & \text{if } \mathbf{c}_i \geq \boldsymbol{\tau}_j \text{ (element-wise)} \\ -\infty & \text{otherwise} \end{cases}$$

where $w_k$ are importance weights derived from task analysis.

Optimal Assignment is modeled as a constrained optimization:

$$\max_{\mathbf{X} \in \{0,1\}^{n \times k}} \sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij} \cdot \text{Match}(a_i, r_j)$$

subject to:

$$\sum_{j=1}^{k} X_{ij} \leq \text{max\_roles}(a_i) \quad \forall i \quad \text{(agent capacity)}$$

$$\sum_{i=1}^{n} X_{ij} \geq \text{min\_agents}(r_j) \quad \forall j \quad \text{(role coverage)}$$

$$\sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij} \cdot \text{cost}(a_i) \leq B \quad \text{(budget constraint)}$$

$$\text{anti\_affinity}(a_i, a_{i'}) \implies \neg (X_{ij} = 1 \wedge X_{i'j} = 1) \quad \forall j \quad \text{(diversity constraints)}$$

This is an instance of a generalized assignment problem, solvable via ILP for small teams or greedy heuristics with bounded approximation ratios for large pools.
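Under these definitions, the match score and a greedy best-first assignment can be sketched as follows; the sketch deliberately ignores the budget and anti-affinity constraints, which an ILP solver would enforce:

```python
from math import inf

def match(c: list[float], tau: list[float], w: list[float]) -> float:
    """Weighted surplus over role thresholds; -inf if any dimension falls short."""
    if any(ci < ti for ci, ti in zip(c, tau)):
        return -inf
    return sum(wk * (ci - ti) for wk, ci, ti in zip(w, c, tau))

def greedy_assign(agents: dict[str, list[float]],
                  roles: dict[str, tuple[list[float], list[float]]]) -> dict[str, str]:
    """Greedy one-agent-per-role heuristic for the assignment problem.

    `roles` maps role name -> (threshold vector tau, weight vector w).
    """
    # Score every (agent, role) pair, then take feasible pairs best-first.
    pairs = sorted(((match(c, tau, w), a, r)
                    for a, c in agents.items()
                    for r, (tau, w) in roles.items()),
                   reverse=True)
    assignment, used = {}, set()
    for score, a, r in pairs:
        if score == -inf:
            break  # remaining pairs are all infeasible
        if a not in used and r not in assignment:
            assignment[r] = a
            used.add(a)
    return assignment
```

Greedy assignment gives no optimality guarantee in general, but for a generalized assignment problem it is a standard bounded-approximation fallback when the pool is too large for exact ILP.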

17.2.5 Capability Registry

The capability registry is a typed, versioned data store:

$$\text{CapabilityRegistry} : \text{AgentID} \to \left\{ \text{capability\_vector}, \; \text{benchmark\_scores}, \; \text{reliability\_history}, \; \text{cost\_profile}, \; \text{latency\_profile}, \; \text{version} \right\}$$

Capability vectors are updated through:

  • Benchmark evaluation — periodic execution against standardized task suites
  • Production telemetry — rolling accuracy, latency, and failure rates from live traces
  • Peer assessment — verifier agents' acceptance rates for each implementer's outputs

17.3 Shared Mental Models: Establishing Common Context, Goals, and Constraints Across Agents

17.3.1 Definition and Motivation

A shared mental model (SMM) in multi-agent systems is a synchronized, bounded representation of the team's collective understanding of:

  1. Task state — current progress, pending sub-tasks, completed artifacts, blocking dependencies
  2. Environment state — relevant external system states, data freshness, resource availability
  3. Team state — who is doing what, agent health, capacity utilization
  4. Constraint state — active policies, budget remaining, deadline proximity, quality gates

Without explicit SMM construction, agents operate on divergent assumptions, producing inconsistent outputs, redundant work, or conflicting mutations.

17.3.2 Formal Representation

Define the shared mental model at time $t$ as:

$$\mathcal{M}(t) = \langle \mathcal{S}_{\text{task}}(t), \; \mathcal{S}_{\text{env}}(t), \; \mathcal{S}_{\text{team}}(t), \; \mathcal{S}_{\text{constraint}}(t) \rangle$$

Each component is a typed, versioned, causally consistent data structure with monotonically increasing version counters:

$$\mathcal{S}_{\text{task}}(t) = \left\{ (\text{subtask\_id}, \; \text{status}, \; \text{assignee}, \; \text{dependencies}, \; \text{artifacts}, \; \text{version}) \right\}$$

17.3.3 SMM Construction Pipeline

Pseudo-Algorithm 17.2 — Shared Mental Model Construction

PROCEDURE ConstructSMM(team, task_graph, environment_state):
    // Phase 1: Goal Alignment
    goal_specification ← ExtractGoalTree(task_graph)
    success_criteria ← ExtractMeasurableCriteria(goal_specification)
    FOR EACH agent IN team:
        InjectGoalContext(agent, goal_specification, success_criteria)
    
    // Phase 2: Task State Synchronization
    task_state ← InitializeTaskDAG(task_graph)
    PublishToSharedState("task_state", task_state, version=0)
    
    // Phase 3: Constraint Broadcasting
    constraints ← {
        token_budget_remaining: ComputeRemainingBudget(team),
        deadline: task.deadline,
        quality_gates: LoadQualityGates(task.domain),
        authority_matrix: LoadAuthorityMatrix(team),
        escalation_policy: LoadEscalationPolicy(team)
    }
    PublishToSharedState("constraints", constraints, version=0)
    
    // Phase 4: Environment Snapshot
    env_snapshot ← CaptureEnvironmentState(environment_state)
    PublishToSharedState("environment", env_snapshot, version=0)
    
    // Phase 5: Team Roster
    roster ← {}
    FOR EACH (agent, role) IN team:
        roster[agent.id] ← {
            role: role, 
            capabilities: agent.capabilities,
            status: IDLE, 
            capacity: agent.remaining_capacity
        }
    PublishToSharedState("team_roster", roster, version=0)
    
    RETURN SMM(task_state, env_snapshot, constraints, roster)

17.3.4 SMM Synchronization Protocol

Maintaining consistency across agents requires a synchronization discipline:

  1. State Channel — A shared, versioned key-value store (e.g., a lightweight coordination service) accessible by all team members through typed read/write interfaces
  2. Optimistic Concurrency — Writes carry version numbers; the state channel rejects stale writes (compare-and-swap semantics)
  3. Event Propagation — State changes emit typed events; agents subscribe to relevant channels

$$\text{Write}(\text{key}, \text{value}, v_{\text{expected}}) \to \begin{cases} \text{Success}(v_{\text{new}}) & \text{if } v_{\text{current}} = v_{\text{expected}} \\ \text{ConflictError}(v_{\text{current}}) & \text{otherwise} \end{cases}$$

  4. Bounded Staleness — Agents may operate on slightly stale state within a tolerance window $\delta_{\text{stale}}$; critical decisions require fresh reads

$$\text{Consistency Requirement:} \; |t_{\text{read}} - t_{\text{last\_write}}| \leq \delta_{\text{stale}}$$
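The write rule maps directly onto a compare-and-swap store. A minimal in-process sketch (a production state channel would be a networked service, but the semantics are the same):

```python
class StateChannel:
    """Versioned key-value store with compare-and-swap writes."""

    def __init__(self):
        # key -> (value, version); absent keys read as version 0
        self._store: dict[str, tuple[object, int]] = {}

    def read(self, key: str):
        return self._store.get(key, (None, 0))

    def write(self, key: str, value, expected_version: int):
        _, current = self._store.get(key, (None, 0))
        if current != expected_version:
            return ("ConflictError", current)  # stale write rejected
        self._store[key] = (value, current + 1)
        return ("Success", current + 1)
```

An agent that receives `ConflictError` must re-read, reconcile against the newer value, and retry its write with the fresh version number.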

17.3.5 Context Injection for Model-Based Agents

Since LLM-based agents consume context through token windows, the SMM must be compiled into an efficient context payload:

$$\text{SMM\_Payload}(a_i, t) = \text{Compile}\left( \text{Role}(a_i), \; \text{TaskSlice}(a_i, t), \; \text{TeamSummary}(t), \; \text{ConstraintSnapshot}(t) \right)$$

where $\text{TaskSlice}$ includes only the sub-tasks relevant to agent $a_i$'s current assignment, compressed to stay within token budget $B_{a_i}^{\text{context}}$:

$$|\text{SMM\_Payload}(a_i, t)| \leq B_{a_i}^{\text{context}} - B_{a_i}^{\text{tools}} - B_{a_i}^{\text{retrieval}} - B_{a_i}^{\text{generation}}$$

17.4 Handoff Protocols: Clean State Transfer, Context Summarization, and Responsibility Chain

17.4.1 The Handoff Problem

A handoff occurs when responsibility for a work unit transfers from agent $a_i$ (sender) to agent $a_j$ (receiver). Failed handoffs are the single largest source of coordination errors in multi-agent systems: dropped context, duplicated work, inconsistent state, and untracked responsibility gaps.

17.4.2 Handoff Packet Specification

Every handoff transmits a typed handoff packet $\mathcal{H}$:

$$\mathcal{H} = \langle \text{task\_id}, \; \text{sender}, \; \text{receiver}, \; \text{artifacts}, \; \text{context\_summary}, \; \text{open\_issues}, \; \text{constraints\_remaining}, \; \text{provenance\_chain}, \; \text{handoff\_timestamp}, \; \text{ack\_deadline} \rangle$$

| Field | Description |
| --- | --- |
| artifacts | Versioned, content-addressed outputs produced by sender |
| context_summary | Compressed representation of sender's working context, decisions made, and rationale |
| open_issues | Unresolved questions, known risks, deferred decisions requiring receiver attention |
| constraints_remaining | Residual budget (tokens, time, cost), quality gates not yet satisfied |
| provenance_chain | Ordered list of all prior agents and actions that contributed to current state |

17.4.3 Context Summarization for Handoffs

The sender must compress its working context into a summary that preserves decision-critical information while fitting within the receiver's token budget. This is a lossy compression problem with a fidelity objective:

$$\text{summary}^* = \arg\min_{s \in \Sigma} \; \mathcal{L}_{\text{info}}(\text{full\_context}, s) \quad \text{s.t.} \quad |s| \leq B_{\text{summary}}$$

where $\mathcal{L}_{\text{info}}$ is an information loss function (approximated by evaluating whether downstream tasks succeed with $s$ versus full context on held-out examples).

Pseudo-Algorithm 17.3 — Handoff Context Summarization

PROCEDURE SummarizeForHandoff(working_context, task_spec, budget):
    // Step 1: Identify decision-critical elements
    decisions ← ExtractDecisions(working_context)
    constraints ← ExtractActiveConstraints(working_context)
    unresolved ← ExtractOpenQuestions(working_context)
    artifacts ← ExtractOutputArtifacts(working_context)
    
    // Step 2: Rank by downstream utility
    elements ← decisions ∪ constraints ∪ unresolved ∪ ArtifactSummaries(artifacts)
    FOR EACH element IN elements:
        element.priority ← ScoreDownstreamUtility(element, task_spec)
    
    ranked ← SORT elements BY priority DESC
    
    // Step 3: Greedy packing under budget
    summary ← []
    token_count ← 0
    FOR EACH element IN ranked:
        element_tokens ← CountTokens(Serialize(element))
        IF token_count + element_tokens ≤ budget:
            summary.APPEND(element)
            token_count ← token_count + element_tokens
    
    // Step 4: Structural formatting
    RETURN FormatHandoffSummary(summary, task_spec)

17.4.4 Handoff Protocol State Machine

The handoff follows a strict three-phase commit:

$$\text{Sender: PREPARE} \xrightarrow{\mathcal{H}} \text{Receiver: VALIDATE} \xrightarrow{\text{ACK/NACK}} \text{Sender: RELEASE/RETRY}$$

Phase 1 — PREPARE: Sender constructs $\mathcal{H}$, locks the work unit, publishes handoff intent to the coordination service.

Phase 2 — VALIDATE: Receiver inspects $\mathcal{H}$, verifies:

  • Artifact integrity (hash verification)
  • Context sufficiency (receiver can identify its next action)
  • Constraint feasibility (remaining budget and deadline are achievable)

Phase 3 — COMMIT or RETRY:

  • On ACK: Sender releases the lock; receiver assumes ownership; accountability ledger records transfer
  • On NACK (with reason): Sender supplements missing context, re-summarizes, or escalates
$$\text{Handoff Reliability} = \frac{|\text{successful handoffs}|}{|\text{attempted handoffs}|} \geq 1 - \epsilon_{\text{handoff}}$$

where $\epsilon_{\text{handoff}}$ is a system-level SLO (typically $\leq 0.01$).
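The three phases can be sketched as a single driver function; the receiver's validation logic is passed in as a callback, and artifact hash verification is elided as an assumption:

```python
def run_handoff(packet: dict, receiver_validate, ledger: list) -> dict:
    """Drive one handoff through PREPARE -> VALIDATE -> RELEASE/RETRY.

    `receiver_validate(packet)` returns (ok, reason) and stands in for the
    integrity / sufficiency / feasibility checks of the VALIDATE phase.
    """
    # Phase 1: PREPARE -- sender locks the work unit and publishes intent.
    packet = {**packet, "state": "PREPARED", "locked": True}

    # Phase 2: VALIDATE -- receiver inspects the packet.
    ok, reason = receiver_validate(packet)

    # Phase 3: COMMIT or RETRY.
    if ok:
        # ACK: ownership transfers and the ledger records it.
        packet.update(state="RELEASED", locked=False, owner=packet["receiver"])
        ledger.append(("transfer", packet["sender"], packet["receiver"]))
    else:
        # NACK: sender keeps the lock, supplements context, or escalates.
        packet.update(state="RETRY", nack_reason=reason)
    return packet
```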

17.4.5 Responsibility Chain

The responsibility chain is the ordered sequence of agents that have held ownership of a work unit:

$$\text{ResponsibilityChain}(\text{task\_id}) = [(a_{i_1}, t_1, t_2), \; (a_{i_2}, t_2, t_3), \; \dots, \; (a_{i_m}, t_m, t_{\text{completion}})]$$

This chain is immutable, stored in $\mathcal{L}$, and serves three purposes:

  1. Failure forensics — trace output errors to the responsible ownership interval
  2. Latency attribution — identify bottleneck agents in the chain
  3. Compliance audit — verify that only authorized agents handled sensitive data

17.5 Consensus Mechanisms: Majority Voting, Weighted Voting, Debate, and Arbitration

17.5.1 When Consensus Is Required

Consensus mechanisms activate when:

  • Multiple agents produce conflicting outputs for the same task
  • A critical decision requires collective judgment (e.g., plan selection, risk assessment)
  • Verification results are ambiguous or contradictory
  • The task specification admits multiple valid solutions and one must be committed

17.5.2 Majority Voting

Given $n$ agents producing candidate outputs $\{o_1, o_2, \dots, o_n\}$ for a task, majority voting selects the output with the most support:

$$o^* = \arg\max_{o \in \mathcal{O}} \sum_{i=1}^{n} \mathbb{1}[\text{equiv}(o_i, o)]$$

where $\text{equiv}(\cdot, \cdot)$ is a semantic equivalence function (exact match for structured outputs, embedding-space clustering for natural language).

Properties:

  • Requires $n \geq 3$ participants (odd $n$ preferred to avoid ties)
  • Correct when the majority of agents are individually correct: $P(\text{consensus correct}) > P(\text{individual correct})$ when $P(\text{individual correct}) > 0.5$ (Condorcet Jury Theorem)
  • Ineffective when agents share failure modes (correlated errors from the same model family)

Condorcet amplification — for $n$ independent agents each with accuracy $p > 0.5$:

$$P(\text{majority correct}) = \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k} p^k (1-p)^{n-k}$$

This converges to 1 as $n \to \infty$, but the independence assumption rarely holds in practice when agents share model weights.
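The amplification formula is easy to evaluate directly; for example, five independent agents at 70% individual accuracy yield roughly 84% majority accuracy:

```python
from math import ceil, comb

def p_majority_correct(n: int, p: float) -> float:
    """P(majority correct) for n independent voters, each correct with prob p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(ceil(n / 2), n + 1))

print(round(p_majority_correct(5, 0.7), 5))  # → 0.83692
```

Adding voters helps only while they are independent; in practice, drawing voters from different model families is what keeps the amplification from collapsing.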

17.5.3 Weighted Voting

Assigns differential credibility to agents based on expertise, past performance, or role authority:

$$o^* = \arg\max_{o \in \mathcal{O}} \sum_{i=1}^{n} w_i \cdot \mathbb{1}[\text{equiv}(o_i, o)]$$

where weights $w_i$ satisfy $\sum_{i} w_i = 1, \; w_i \geq 0$.

Weight computation:

$$w_i = \frac{\text{reliability}(a_i, \text{task\_domain}) \cdot \text{recency\_factor}(a_i)}{\sum_{j=1}^{n} \text{reliability}(a_j, \text{task\_domain}) \cdot \text{recency\_factor}(a_j)}$$

where $\text{reliability}(a_i, d)$ is the historical accuracy of agent $a_i$ on domain $d$, and $\text{recency\_factor}$ decays older performance observations.
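A direct implementation of the weighted vote, using exact match as a stand-in for the semantic equivalence function:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, str]],
                  reliability: dict[str, float]) -> str:
    """Select the output with the largest total normalized weight.

    `votes` pairs (agent_id, output); equiv() is exact string match here,
    which suffices for structured outputs.
    """
    total = sum(reliability[agent] for agent, _ in votes)
    tally: dict[str, float] = defaultdict(float)
    for agent, output in votes:
        tally[output] += reliability[agent] / total  # normalized weight w_i
    return max(tally, key=tally.get)
```

Note that a single highly reliable agent can legitimately outvote several unreliable ones, which is exactly the behavior plain majority voting cannot express.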

17.5.4 Structured Debate

When outputs are complex or the decision requires justification, debate replaces simple voting:

Pseudo-Algorithm 17.4 — Multi-Agent Structured Debate

PROCEDURE StructuredDebate(proposals, agents, moderator, max_rounds):
    // proposals: list of (agent, proposed_output, rationale)
    debate_log ← []
    
    FOR round ← 1 TO max_rounds:
        // Phase 1: Challenge
        FOR EACH (agent_i, proposal_i, rationale_i) IN proposals:
            challenges ← []
            FOR EACH (agent_j, proposal_j, _) IN proposals WHERE j ≠ i:
                challenge ← agent_j.Critique(
                    target_proposal = proposal_i,
                    target_rationale = rationale_i,
                    own_proposal = proposal_j,
                    debate_history = debate_log
                )
                challenges.APPEND((agent_j, challenge))
            debate_log.APPEND((round, "challenges", agent_i, challenges))
        
        // Phase 2: Defend / Revise
        updated_proposals ← []
        FOR EACH (agent_i, proposal_i, rationale_i) IN proposals:
            relevant_challenges ← GetChallengesFor(agent_i, debate_log, round)
            response ← agent_i.DefendOrRevise(
                own_proposal = proposal_i,
                challenges = relevant_challenges,
                debate_history = debate_log
            )
            updated_proposals.APPEND((agent_i, response.proposal, response.rationale))
            debate_log.APPEND((round, "defense", agent_i, response))
        
        proposals ← updated_proposals
        
        // Phase 3: Convergence Check
        IF AllProposalsEquivalent(proposals):
            RETURN ConsensusResult(proposals[0], debate_log, "converged")
        
        // Phase 4: Early Termination
        IF moderator.JudgeConvergenceLikelihood(debate_log) < threshold:
            BREAK
    
    // Fallback: Moderator Arbitration
    RETURN moderator.Arbitrate(proposals, debate_log)

17.5.5 Arbitration

When debate does not converge, a designated arbiter agent (or human escalation target) resolves the dispute:

$$o^* = \text{Arbiter.Decide}\left( \{(o_i, \text{rationale}_i, \text{evidence}_i)\}_{i=1}^{n}, \; \text{debate\_log}, \; \text{task\_spec} \right)$$

The arbiter has elevated authority $\mathcal{P}_{\text{arbiter}} \supseteq \bigcup_i \mathcal{P}_i$ and access to the full debate log. Arbitration decisions are flagged in the accountability ledger with decision_method: ARBITRATION for downstream audit.

17.5.6 Consensus Mechanism Selection Matrix

| Criterion | Majority Vote | Weighted Vote | Debate | Arbitration |
| --- | --- | --- | --- | --- |
| Decision latency | Low | Low | High | Medium |
| Justification depth | None | None | Deep | Medium |
| Accuracy (uncorrelated errors) | High | Higher | Highest | Varies |
| Accuracy (correlated errors) | Low | Medium | Higher | High |
| Token cost | $O(n)$ | $O(n)$ | $O(n^2 \cdot R)$ | $O(n)$ |
| Suitable for structured outputs | Yes | Yes | Less so | Yes |
| Suitable for open-ended reasoning | No | No | Yes | Yes |

where $R$ is the number of debate rounds.


17.6 Conflict Resolution: Priority Hierarchies, Evidence-Based Arbitration, and Escalation

17.6.1 Conflict Taxonomy

Conflicts in agent teams arise from:

| Conflict Type | Description | Example |
| --- | --- | --- |
| Output Conflict | Agents produce mutually incompatible outputs for the same task | Two implementers generate contradictory code patches |
| Resource Conflict | Multiple agents claim the same resource simultaneously | Concurrent writes to the same artifact or tool endpoint |
| Priority Conflict | Agents disagree on task ordering or urgency | Planner schedules task $A$ first; verifier demands task $B$ first |
| Authority Conflict | Role boundaries overlap or are ambiguous | Two agents both believe they have write authority over a document |
| Semantic Conflict | Agents hold contradictory beliefs about facts or requirements | Inconsistent interpretations of an ambiguous specification |

17.6.2 Priority Hierarchy

A total ordering over authority resolves unambiguous conflicts mechanically:

$$\text{Priority}: \mathcal{R} \to \mathbb{N} \quad \text{where} \quad r_i \succ r_j \iff \text{Priority}(r_i) > \text{Priority}(r_j)$$

Typical ordering:

$$\text{Human} \succ \text{Orchestrator} \succ \text{Verifier} \succ \text{Implementer} \succ \text{Retriever} \succ \text{Documenter}$$

When agents of different roles conflict, the higher-priority role's output prevails by default.
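Mechanical resolution by role priority reduces to a lookup over a priority map; the numeric values below are illustrative, only their ordering matters:

```python
# Role priorities; a higher number wins an unambiguous conflict by default.
PRIORITY = {"human": 6, "orchestrator": 5, "verifier": 4,
            "implementer": 3, "retriever": 2, "documenter": 1}

def resolve_by_priority(conflicts: list[tuple[str, object]]) -> tuple[str, object]:
    """Return the (role, output) pair of the highest-priority role
    among conflicting claims."""
    return max(conflicts, key=lambda claim: PRIORITY[claim[0]])
```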

17.6.3 Evidence-Based Arbitration

For substantive disagreements (semantic or output conflicts), resolution is not authority-based but evidence-based:

Pseudo-Algorithm 17.5 — Evidence-Based Conflict Resolution

PROCEDURE ResolveConflict(conflicting_outputs, agents, evidence_store, escalation_policy):
    // Step 1: Evidence Collection
    evidence_sets ← {}
    FOR EACH (agent_i, output_i) IN conflicting_outputs:
        evidence_i ← agent_i.ProduceEvidence(output_i, evidence_store)
        // Evidence includes: source documents, test results, retrieved facts,
        // formal proofs, historical precedents
        evidence_sets[agent_i] ← evidence_i
    
    // Step 2: Evidence Quality Scoring
    FOR EACH (agent_i, evidence_i) IN evidence_sets:
        evidence_i.score ← EvidenceQuality(evidence_i)
        // Quality = f(provenance_strength, recency, source_authority,
        //             internal_consistency, corroboration_count)
    
    // Step 3: Automated Resolution Attempt
    best_supported ← argmax over (agent_i, output_i) of evidence_sets[agent_i].score
    confidence ← evidence_sets[best_supported.agent].score / 
                  SUM(all evidence scores)
    
    IF confidence ≥ escalation_policy.auto_resolve_threshold:
        RETURN Resolution(
            selected = best_supported.output,
            method = "evidence_based_auto",
            confidence = confidence,
            evidence_summary = evidence_sets
        )
    
    // Step 4: Escalation
    IF escalation_policy.allow_human_escalation:
        RETURN EscalateToHuman(conflicting_outputs, evidence_sets)
    ELSE:
        RETURN Resolution(
            selected = best_supported.output,
            method = "evidence_based_low_confidence",
            confidence = confidence,
            flag = "requires_review"
        )

17.6.4 Evidence Quality Function

$$\text{EvidenceQuality}(E) = \sum_{e \in E} \alpha_{\text{prov}} \cdot \text{Provenance}(e) + \alpha_{\text{fresh}} \cdot \text{Freshness}(e) + \alpha_{\text{auth}} \cdot \text{Authority}(e) + \alpha_{\text{corr}} \cdot \text{Corroboration}(e)$$

where:

  • $\text{Provenance}(e) \in [0,1]$ — traceability to a verified source
  • $\text{Freshness}(e) = \exp(-\lambda \cdot (t_{\text{now}} - t_{\text{source}}))$ — exponential decay by source age
  • $\text{Authority}(e) \in [0,1]$ — source tier ranking (official documentation > community wiki > generated text)
  • $\text{Corroboration}(e)$ — number of independent sources confirming the same claim
  • $\alpha$ weights are domain-configurable
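A sketch of the quality function; the evidence field names and the alpha weights used in the test are illustrative, not a fixed schema:

```python
from math import exp

def evidence_quality(evidence: list[dict], alphas: dict[str, float],
                     lam: float = 0.01, t_now: float = 0.0) -> float:
    """Sum of weighted per-item scores: provenance, freshness (exponential
    decay in source age), source authority, and corroboration count."""
    score = 0.0
    for e in evidence:
        score += (alphas["prov"] * e["provenance"]
                  + alphas["fresh"] * exp(-lam * (t_now - e["t_source"]))
                  + alphas["auth"] * e["authority"]
                  + alphas["corr"] * e["corroboration"])
    return score
```

Because the resolution in Pseudo-Algorithm 17.5 compares competing evidence sets by this score, the alpha weights effectively encode the domain's standard of proof.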

17.6.5 Escalation Ladder

Escalation proceeds through a defined chain when automated resolution fails:

$$\text{Auto-Resolution} \xrightarrow{\text{low confidence}} \text{Senior Agent Arbitration} \xrightarrow{\text{still unresolved}} \text{Human Expert Review} \xrightarrow{\text{policy ambiguity}} \text{Policy Update}$$

Each escalation level has:

  • SLA: maximum response time
  • Cost ceiling: budget allocated for the escalation step
  • Outcome: binding decision plus ledger entry recording the resolution path

17.7 Team Memory: Shared Session State, Collective Episodic Memory, and Team Knowledge Base

17.7.1 Memory Architecture Overview

Team memory is stratified into four tiers with distinct lifecycle, access patterns, and write policies:

$$\mathcal{M}_{\text{team}} = \langle \mathcal{M}_{\text{working}}, \; \mathcal{M}_{\text{session}}, \; \mathcal{M}_{\text{episodic}}, \; \mathcal{M}_{\text{knowledge}} \rangle$$

| Tier | Scope | Lifetime | Write Policy | Access Pattern |
| --- | --- | --- | --- | --- |
| Working Memory $\mathcal{M}_{\text{working}}$ | Per-agent, per-step | Single execution step | Agent-local, no coordination | Private read/write |
| Session Memory $\mathcal{M}_{\text{session}}$ | Per-team, per-task | Task duration | Optimistic concurrency via state channel | Shared read/write |
| Episodic Memory $\mathcal{M}_{\text{episodic}}$ | Per-team, cross-task | Configurable TTL (hours to months) | Validated write-back after task completion | Shared read; gated write |
| Knowledge Base $\mathcal{M}_{\text{knowledge}}$ | Organization-wide | Permanent (until explicit revocation) | Human-approved promotion from episodic tier | Read-only for agents; write via promotion pipeline |

17.7.2 Shared Session State#

The shared session state Msession\mathcal{M}_{\text{session}} is the team's real-time coordination substrate:

Msession={(k,v,vversion,twrite,awriter)  |  kKeySpace}\mathcal{M}_{\text{session}} = \left\{ (k, v, v_{\text{version}}, t_{\text{write}}, a_{\text{writer}}) \;\middle|\; k \in \text{KeySpace} \right\}

Operations:

  • READ(key) → (value, version) — returns latest committed value
  • WRITE(key, value, expected_version) → Success(new_version) | ConflictError — compare-and-swap
  • SUBSCRIBE(key_pattern, callback) — event-driven notification on matching writes
  • SCAN(prefix, limit) → [(key, value, version)] — bounded enumeration

Namespacing prevents collision:

session/{team_id}/task_state/{subtask_id}
session/{team_id}/artifacts/{artifact_id}
session/{team_id}/roster/{agent_id}/status
session/{team_id}/consensus/{decision_id}
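
The four operations can be sketched as an in-memory store with compare-and-swap write semantics. The class and method names are illustrative; a production substrate would back this with a replicated store (e.g. etcd or Redis) rather than a process-local dictionary:

```python
import fnmatch
import threading

class SessionStore:
    """Shared session state with versioned compare-and-swap writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}   # key -> (value, version)
        self._subs = []   # (pattern, callback)

    def read(self, key):
        """Return (value, version) of the latest committed write."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        """Compare-and-swap: returns new version, or None on conflict."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return None  # another writer committed first
            new_version = version + 1
            self._data[key] = (value, new_version)
            matched = [cb for pat, cb in self._subs if fnmatch.fnmatch(key, pat)]
        for cb in matched:           # notify outside the lock
            cb(key, value, new_version)
        return new_version

    def subscribe(self, key_pattern, callback):
        """Event-driven notification on writes matching a glob pattern."""
        with self._lock:
            self._subs.append((key_pattern, callback))

    def scan(self, prefix, limit):
        """Bounded enumeration of keys under a namespace prefix."""
        with self._lock:
            return [(k, v, ver) for k, (v, ver) in sorted(self._data.items())
                    if k.startswith(prefix)][:limit]
```

A stale `expected_version` yields a conflict, forcing the writer to re-read and retry, which is what prevents lost updates between concurrent agents.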

17.7.3 Collective Episodic Memory#

After task completion, the team's session state is processed into validated episodic memories:

Pseudo-Algorithm 17.6 — Episodic Memory Write-Back

PROCEDURE WriteBackEpisodicMemory(session_state, task_result, team, policy):
    // Step 1: Extract candidate memories
    candidates ← []
    
    // Non-obvious corrections (e.g., "approach X failed; approach Y succeeded")
    corrections ← ExtractCorrections(session_state.accountability_ledger)
    candidates.EXTEND(corrections)
    
    // Effective strategies (e.g., "for task type T, agent role R was critical")
    strategies ← ExtractSuccessfulStrategies(session_state, task_result)
    candidates.EXTEND(strategies)
    
    // Discovered constraints (e.g., "API X has undocumented rate limit of 100/min")
    constraints ← ExtractDiscoveredConstraints(session_state)
    candidates.EXTEND(constraints)
    
    // Step 2: Filter for novelty and non-obviousness
    FOR EACH candidate IN candidates:
        IF IsDuplicate(candidate, existing_episodic_memory):
            SKIP
        IF IsObvious(candidate, knowledge_base):
            SKIP  // Don't store what's already in canonical knowledge
        candidate.utility_score ← EstimateUtility(candidate, policy.utility_model)
    
    // Step 3: Validate and store
    validated ← FILTER candidates WHERE utility_score ≥ policy.min_utility
    FOR EACH memory IN validated:
        memory.provenance ← BuildProvenance(memory, session_state, team)
        memory.expiry ← ComputeExpiry(memory, policy.ttl_model)
        memory.embedding ← Embed(memory.content)
        EpisodicStore.Write(memory)
    
    RETURN validated

17.7.4 Team Knowledge Base#

The knowledge base Mknowledge\mathcal{M}_{\text{knowledge}} contains organization-level facts, policies, and validated procedures that transcend individual teams:

KnowledgeItem=content,  type{fact,policy,procedure,constraint},  domain,  authority,  version,  provenance\text{KnowledgeItem} = \langle \text{content}, \; \text{type} \in \{\text{fact}, \text{policy}, \text{procedure}, \text{constraint}\}, \; \text{domain}, \; \text{authority}, \; \text{version}, \; \text{provenance} \rangle

Promotion pipeline (episodic → knowledge):

  1. Frequency threshold — episodic memories referenced k\geq k times across distinct teams
  2. Validation — confirmed by human reviewer or automated test suite
  3. Deduplication — merged with existing knowledge items if overlapping
  4. Versioning — supersedes prior versions with explicit deprecation markers

17.7.5 Memory Garbage Collection#

Stale memories degrade retrieval precision. A periodic cleanup agent enforces:

Retain(m)    (tnowtlast_access(m)<TTL(m))(utility(m)>τmin)¬superseded(m)\text{Retain}(m) \iff \left( t_{\text{now}} - t_{\text{last\_access}}(m) < \text{TTL}(m) \right) \wedge \left( \text{utility}(m) > \tau_{\text{min}} \right) \wedge \neg \text{superseded}(m)

Expired items are soft-deleted (moved to archive), not hard-deleted, to preserve audit trails.
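
The retention predicate translates directly into a sweep routine. The dictionary field names and the two-way retained/archived partition below are assumptions of this sketch:

```python
import time

def retain(memory, tau_min, now=None):
    """Apply Retain(m): recently accessed, still useful, and not superseded."""
    now = time.time() if now is None else now
    fresh = (now - memory["last_access"]) < memory["ttl"]
    useful = memory["utility"] > tau_min
    return fresh and useful and not memory["superseded"]

def sweep(store, tau_min, now=None):
    """Soft-delete pass: partition into retained and archived items.

    Archived items are kept (not destroyed) to preserve audit trails.
    """
    retained = [m for m in store if retain(m, tau_min, now)]
    archived = [m for m in store if not retain(m, tau_min, now)]
    return retained, archived
```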


17.8 Load Balancing Across Team Members: Work Distribution, Capacity Monitoring, and Rebalancing#

17.8.1 Load Model#

Each agent aia_i has a capacity model:

Capacity(ai)=(Cithroughput,  Ciconcurrent,  Citoken_budget,  Cilatency_ceiling)\text{Capacity}(a_i) = \left( C_i^{\text{throughput}}, \; C_i^{\text{concurrent}}, \; C_i^{\text{token\_budget}}, \; C_i^{\text{latency\_ceiling}} \right)

and a current load vector:

Load(ai,t)=(Liactive_tasks,  Litokens_consumed,  Liqueue_depth,  Liavg_latency)\text{Load}(a_i, t) = \left( L_i^{\text{active\_tasks}}, \; L_i^{\text{tokens\_consumed}}, \; L_i^{\text{queue\_depth}}, \; L_i^{\text{avg\_latency}} \right)

The load ratio is:

ρi(t)=Load(ai,t)Capacity(ai)[0,1]\rho_i(t) = \frac{\|\text{Load}(a_i, t)\|}{\|\text{Capacity}(a_i)\|} \in [0, 1]

where the norm is a weighted combination reflecting the most constrained dimension:

ρi(t)=max(Liactive_tasksCiconcurrent,  Litokens_consumedCitoken_budget,  Liqueue_depthQmax)\rho_i(t) = \max\left( \frac{L_i^{\text{active\_tasks}}}{C_i^{\text{concurrent}}}, \; \frac{L_i^{\text{tokens\_consumed}}}{C_i^{\text{token\_budget}}}, \; \frac{L_i^{\text{queue\_depth}}}{Q_{\max}} \right)
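
A direct transcription of the max-normalized load ratio, assuming dictionary-shaped load and capacity records:

```python
def load_ratio(load, capacity, q_max):
    """Max over normalized dimensions: the most constrained one dominates."""
    return max(
        load["active_tasks"] / capacity["concurrent"],
        load["tokens_consumed"] / capacity["token_budget"],
        load["queue_depth"] / q_max,
    )
```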

17.8.2 Work Distribution Strategies#

Strategy 1: Round-Robin with Capability Filtering

Simple, fair, but ignores heterogeneous agent strengths:

next_agent=eligible[  (countermodeligible)  ]\text{next\_agent} = \text{eligible}[\; (\text{counter} \mod |\text{eligible}|) \;]

where eligible={aiai.capabilitiestask.requirementsρi<ρmax}\text{eligible} = \{ a_i \mid a_i.\text{capabilities} \supseteq \text{task.requirements} \wedge \rho_i < \rho_{\max} \}.

Strategy 2: Least-Loaded Assignment

a=argminaieligibleρi(t)a^* = \arg\min_{a_i \in \text{eligible}} \rho_i(t)

Balances load but may under-utilize specialized agents by routing work to generalists.

Strategy 3: Capability-Weighted Least-Loaded

a=argminaieligibleρi(t)Match(ai,task)a^* = \arg\min_{a_i \in \text{eligible}} \frac{\rho_i(t)}{\text{Match}(a_i, \text{task})}

This favors agents with both low load and high task affinity.

Strategy 4: Predictive Assignment

Uses estimated task completion time T^(ai,task)\hat{T}(a_i, \text{task}) to minimize makespan:

a=argminaieligible(queue_wait(ai)+T^(ai,task))a^* = \arg\min_{a_i \in \text{eligible}} \left( \text{queue\_wait}(a_i) + \hat{T}(a_i, \text{task}) \right)
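
Strategies 2 through 4 differ only in their ranking key, so a single dispatcher can capture all three. The agent record shape, the `match` and `eta` callables (task affinity and estimated completion time), and the 0.9 load cap are assumptions of this sketch:

```python
def assign(task, agents, strategy="capability_weighted"):
    """Pick an agent per Strategies 2-4; returns None if no agent is eligible."""
    # Eligibility: capability superset and headroom below the load ceiling.
    eligible = [a for a in agents
                if a["capabilities"] >= task["requirements"] and a["rho"] < 0.9]
    if not eligible:
        return None
    if strategy == "least_loaded":                 # Strategy 2
        key = lambda a: a["rho"]
    elif strategy == "capability_weighted":        # Strategy 3
        key = lambda a: a["rho"] / max(a["match"](task), 1e-9)
    else:                                          # Strategy 4: predictive
        key = lambda a: a["queue_wait"] + a["eta"](task)
    return min(eligible, key=key)
```

Note how the capability-weighted key lets a moderately loaded specialist beat a lightly loaded generalist when its task affinity is high.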

17.8.3 Capacity Monitoring#

Pseudo-Algorithm 17.7 — Capacity Monitor

PROCEDURE MonitorCapacity(team, interval, alert_thresholds):
    LOOP EVERY interval:
        FOR EACH agent IN team:
            load ← MeasureCurrentLoad(agent)
            capacity ← GetCapacity(agent)
            rho ← ComputeLoadRatio(load, capacity)
            
            PublishMetric("agent_load_ratio", agent.id, rho)
            
            IF rho ≥ alert_thresholds.overload:           // e.g., 0.9
                EmitAlert(OVERLOAD, agent, rho)
                TriggerRebalance(team, agent)
            ELSE IF rho ≥ alert_thresholds.high:           // e.g., 0.75
                EmitAlert(HIGH_LOAD, agent, rho)
            ELSE IF rho ≤ alert_thresholds.idle:           // e.g., 0.1
                EmitAlert(UNDERUTILIZED, agent, rho)
        
        // Team-level metrics
        rho_mean ← MEAN(all rho values)
        rho_stddev ← STDDEV(all rho values)
        imbalance ← rho_stddev / rho_mean    // Coefficient of variation
        
        PublishMetric("team_load_imbalance", team.id, imbalance)
        
        IF imbalance > alert_thresholds.max_imbalance:
            TriggerRebalance(team)

17.8.4 Rebalancing Protocol#

When load imbalance exceeds thresholds, the orchestrator redistributes work:

Pseudo-Algorithm 17.8 — Work Rebalancing

PROCEDURE Rebalance(team, overloaded_agents, orchestrator):
    // Step 1: Identify movable tasks
    movable_tasks ← []
    FOR EACH agent IN overloaded_agents:
        FOR EACH task IN agent.active_queue:
            IF task.status = QUEUED AND NOT task.pinned:
                movable_tasks.APPEND((agent, task))
    
    // Step 2: Find target agents
    FOR EACH (source, task) IN movable_tasks:
        targets ← FILTER team WHERE:
            agent.id ≠ source.id AND
            agent.capabilities ⊇ task.requirements AND
            LoadAfterAssignment(agent, task) < rho_max
        
        IF targets NOT EMPTY:
            best_target ← argmin over targets of LoadAfterAssignment(target, task)
            ExecuteHandoff(source, best_target, task)
            LogRebalance(source, best_target, task)
    
    // Step 3: Scale if rebalancing insufficient
    IF StillOverloaded(team):
        IF scaling_policy.allow_auto_scale:
            new_agent ← ProvisionAgent(required_capabilities, scaling_policy)
            team.ADD(new_agent)
            Rebalance(team, overloaded_agents, orchestrator)  // Recurse once

17.8.5 Backpressure#

When all agents are saturated and scaling is exhausted, the system applies backpressure:

  1. Queue depth limits — reject new tasks beyond queue capacity with explicit error codes
  2. Priority-based shedding — drop lowest-priority tasks, notifying callers
  3. Deadline-aware deferral — tasks with distant deadlines are deferred; urgent tasks are prioritized
  4. Client-facing latency signals — propagate expected wait times to upstream callers

\text{Effective throughput} = \min\left( \text{arrival\_rate}, \; \sum_{i=1}^{n} \frac{C_i^{\text{concurrent}}}{\bar{T}_i} \right)

where \bar{T}_i is agent i's mean task service time, so each summand is agent i's sustainable completion rate.

17.9 Team Performance Metrics: Throughput, Quality, Coordination Overhead, and Team Efficiency#

17.9.1 Metric Taxonomy#

Team performance must be measured at multiple granularities to enable diagnosis:

| Level | Metrics |
| --- | --- |
| Agent-level | Task accuracy, mean latency, token efficiency, error rate, handoff success rate |
| Team-level | End-to-end task throughput, collective quality score, coordination overhead, makespan |
| System-level | Cost per task, SLO compliance, human escalation rate, knowledge base growth rate |

17.9.2 Core Metric Definitions#

Throughput:

Θteam=tasks_completed(Δt)Δt\Theta_{\text{team}} = \frac{|\text{tasks\_completed}(\Delta t)|}{\Delta t}

Quality:

Qteam=1taskstaskQualityScore(task.output,task.ground_truth_or_rubric)Q_{\text{team}} = \frac{1}{|\text{tasks}|} \sum_{\text{task}} \text{QualityScore}(\text{task.output}, \text{task.ground\_truth\_or\_rubric})

where QualityScore\text{QualityScore} is domain-specific: exact match for structured outputs, rubric-based evaluation for generative tasks, test-pass rate for code.

Coordination Overhead:

Ω=TcoordinationTtotal=Thandoffs+Tconsensus+Tsync+Tconflict_resolutionTproductive+Tcoordination\Omega = \frac{T_{\text{coordination}}}{T_{\text{total}}} = \frac{T_{\text{handoffs}} + T_{\text{consensus}} + T_{\text{sync}} + T_{\text{conflict\_resolution}}}{T_{\text{productive}} + T_{\text{coordination}}}

A well-functioning team targets Ω<0.2\Omega < 0.2 (less than 20% of total time on coordination).

Team Efficiency:

ηteam=Output Value(team)i=1nOutput Value(agenti solo)\eta_{\text{team}} = \frac{\text{Output Value}(\text{team})}{\sum_{i=1}^{n} \text{Output Value}(\text{agent}_i \text{ solo})}

ηteam>1\eta_{\text{team}} > 1 indicates super-additive collaboration; ηteam<1\eta_{\text{team}} < 1 indicates coordination costs exceed collaboration benefits.

Makespan vs. Sum-of-Parts:

Speedup=TsequentialTparallel_team\text{Speedup} = \frac{T_{\text{sequential}}}{T_{\text{parallel\_team}}}

Theoretical maximum is nn (linear speedup); practical values are bounded by Amdahl's Law:

Speedup1(1f)+f/n\text{Speedup} \leq \frac{1}{(1 - f) + f/n}

where ff is the fraction of work that is parallelizable.
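
The bound is a one-liner, useful for sizing teams before provisioning:

```python
def amdahl_bound(f, n):
    """Upper bound on team speedup when a fraction f of the work parallelizes
    across n agents; the serial fraction (1 - f) caps all gains."""
    return 1.0 / ((1.0 - f) + f / n)
```

With 80% parallelizable work, even four agents cap out at 2.5x, which is why reducing the serial fraction (handoffs, consensus) often pays more than adding agents.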

17.9.3 Coordination Cost Model#

Total cost of a team task:

Costtotal=i=1nciTicomputecompute cost+i=1ntokensipitokentoken cost+ccoordmessagescoordination cost+chumanThumanhuman escalation cost\text{Cost}_{\text{total}} = \underbrace{\sum_{i=1}^{n} c_i \cdot T_i^{\text{compute}}}_{\text{compute cost}} + \underbrace{\sum_{i=1}^{n} \text{tokens}_i \cdot p_i^{\text{token}}}_{\text{token cost}} + \underbrace{c_{\text{coord}} \cdot |\text{messages}|}_{\text{coordination cost}} + \underbrace{c_{\text{human}} \cdot T_{\text{human}}}_{\text{human escalation cost}}

Optimization objective:

minteam configurationCosttotals.t.QteamQmin,  TmakespanTmax,  ΩΩmax\min_{\text{team configuration}} \text{Cost}_{\text{total}} \quad \text{s.t.} \quad Q_{\text{team}} \geq Q_{\min}, \; T_{\text{makespan}} \leq T_{\max}, \; \Omega \leq \Omega_{\max}
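
The four cost terms sum directly. The per-agent record shape and the default unit costs below are illustrative assumptions:

```python
def total_cost(agents, n_messages, t_human, c_coord=0.001, c_human=1.0):
    """Sum the four terms of the team cost model.

    Each agent record carries its compute rate c, compute time, token
    usage, and per-token price; coordination and human costs are priced
    per message and per unit of human time respectively.
    """
    compute = sum(a["c"] * a["t_compute"] for a in agents)
    tokens = sum(a["tokens"] * a["p_token"] for a in agents)
    return compute + tokens + c_coord * n_messages + c_human * t_human
```

The constrained minimization over team configurations would evaluate this cost for each candidate composition and discard any that violates the quality, makespan, or overhead constraints.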

17.9.4 Performance Dashboard Schema#

A production team performance dashboard exposes:

TeamPerformanceDashboard:
    real_time:
        active_tasks_count: gauge
        agent_load_ratios: histogram
        queue_depths: per_agent gauge
        handoff_success_rate: rolling_window counter
    
    periodic (per task completion):
        task_latency: histogram (p50, p90, p99)
        task_quality_score: histogram
        tokens_consumed: counter
        consensus_rounds_required: histogram
        conflict_resolution_count: counter
    
    aggregate (hourly/daily):
        throughput: rate
        coordination_overhead_ratio: gauge
        team_efficiency: gauge
        cost_per_task: gauge
        SLO_compliance_rate: percentage
        human_escalation_rate: percentage

17.9.5 Diagnostic Analysis: Coordination Anti-Patterns#

| Anti-Pattern | Symptom | Metric Signal | Remediation |
| --- | --- | --- | --- |
| Bottleneck Agent | One agent's queue grows while others are idle | High load variance, $\rho_{\max} / \rho_{\min} > 3$ | Rebalance, add specialists, decompose tasks |
| Consensus Thrashing | Debate rounds consistently hit max_rounds | High $T_{\text{consensus}}$, low convergence rate | Tighten task specs, increase verifier authority |
| Handoff Ping-Pong | Tasks bounce between agents repeatedly | Handoff count per task > 3 | Clarify role boundaries, improve context summaries |
| Redundant Work | Multiple agents unknowingly solve the same sub-task | Token cost anomaly, duplicate artifact detection | Improve task locking, shared state visibility |
| Escalation Cascade | Most conflicts escalate to human review | Escalation rate > 15% | Improve evidence quality, lower auto-resolve threshold |

17.10 Adaptive Team Composition: Runtime Role Reassignment Based on Task Evolution#

17.10.1 Motivation#

Tasks evolve during execution. A planning-heavy initial phase may give way to implementation-intensive work, then shift to verification-dominant finalization. Static role assignments waste capacity by keeping planning agents idle during implementation and vice versa.

17.10.2 Task Phase Model#

Model the task lifecycle as a sequence of phases with distinct capability demands:

TaskLifecycle=[ϕ1,ϕ2,,ϕm]\text{TaskLifecycle} = [\phi_1, \phi_2, \dots, \phi_m]

Each phase ϕj\phi_j has a capability demand profile:

djRR\mathbf{d}_j \in \mathbb{R}^{|\mathcal{R}|}

where dj,kd_{j,k} represents the intensity of demand for role rkr_k during phase ϕj\phi_j.

17.10.3 Phase Detection#

The orchestrator continuously monitors task state to detect phase transitions:

Pseudo-Algorithm 17.9 — Phase Detection and Role Reassignment

PROCEDURE AdaptiveComposition(team, task_state, phase_model, reassignment_policy):
    current_phase ← DetectCurrentPhase(task_state, phase_model)
    // Detection uses: subtask completion ratios, artifact types being produced,
    // queue composition, time elapsed vs. estimated timeline
    
    IF current_phase ≠ last_detected_phase:
        // Phase transition detected
        demand_profile ← phase_model.GetDemand(current_phase)
        current_allocation ← GetCurrentRoleAllocation(team)
        
        // Compute reallocation
        deficit ← {}
        surplus ← {}
        FOR EACH role IN role_taxonomy:
            delta ← demand_profile[role] - current_allocation[role]
            IF delta > 0:
                deficit[role] ← delta
            ELSE IF delta < 0:
                surplus[role] ← |delta|
        
        // Reassign surplus agents to deficit roles
        FOR EACH (surplus_role, count) IN surplus:
            reassignable ← GetAgentsWithRole(team, surplus_role)
            reassignable ← FILTER reassignable WHERE:
                agent.capabilities SUPPORTS any deficit_role AND
                agent.active_tasks = 0 OR agent.active_tasks are pausable
            
            FOR EACH agent IN reassignable[0:count]:
                target_role ← SelectBestDeficitRole(agent, deficit)
                IF target_role AND reassignment_policy.AllowReassignment(agent, target_role):
                    ExecuteRoleReassignment(agent, surplus_role, target_role)
                    UpdateSMM(team, agent, target_role)
                    deficit[target_role] ← deficit[target_role] - 1
        
        // Scale if deficit persists
        FOR EACH (role, remaining) IN deficit WHERE remaining > 0:
            IF reassignment_policy.allow_scaling:
                FOR i ← 1 TO remaining:
                    new_agent ← ProvisionAgent(role)
                    team.ADD(new_agent)
        
        last_detected_phase ← current_phase

17.10.4 Role Reassignment Cost Function#

Reassignment is not free. The cost includes context reconstruction, warm-up latency, and potential errors during transition:

ReassignmentCost(ai,rold,rnew)=Ccontext_rebuild(ai,rnew)+Cwarmup(ai,rnew)+E[transition_errors(ai)]\text{ReassignmentCost}(a_i, r_{\text{old}}, r_{\text{new}}) = C_{\text{context\_rebuild}}(a_i, r_{\text{new}}) + C_{\text{warmup}}(a_i, r_{\text{new}}) + \mathbb{E}[\text{transition\_errors}(a_i)]

Reassignment is justified only when:

Benefit(reassignment)=ΔΘteamVthroughput+ΔQteamVquality>ReassignmentCost\text{Benefit}(\text{reassignment}) = \Delta \Theta_{\text{team}} \cdot V_{\text{throughput}} + \Delta Q_{\text{team}} \cdot V_{\text{quality}} > \text{ReassignmentCost}

where VthroughputV_{\text{throughput}} and VqualityV_{\text{quality}} are value multipliers converting metric improvements to cost equivalents.
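
The reassignment gate compares valued metric gains against transition cost. All parameter names here are stand-ins for estimates the orchestrator must supply:

```python
def should_reassign(delta_throughput, delta_quality,
                    v_throughput, v_quality,
                    c_context_rebuild, c_warmup, expected_transition_errors):
    """Approve a role reassignment only when the valued throughput and
    quality gains exceed the full transition cost."""
    benefit = delta_throughput * v_throughput + delta_quality * v_quality
    cost = c_context_rebuild + c_warmup + expected_transition_errors
    return benefit > cost
```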

17.10.5 Agent Polymorphism#

Some agents are polymorphic: they can assume multiple roles with minimal context switching cost (e.g., a large frontier model with broad capabilities). Others are specialized: optimized for one role with high performance but unable to switch (e.g., a fine-tuned code generation model).

The team composition optimizer balances:

TeamComposition=argmax(Specialization BenefitFlexibility Cost)\text{TeamComposition}^* = \arg\max \left( \text{Specialization Benefit} - \text{Flexibility Cost} \right)

A practical heuristic: maintain a core of specialized agents for steady-state roles and a pool of polymorphic agents for adaptive reallocation.


17.11 Human-Agent Team Integration: Blended Teams with Human Experts and AI Agents#

17.11.1 Blended Team Model#

A blended team Tblend\mathcal{T}_{\text{blend}} extends the agent team model to include human participants:

Tblend=AhumanAagent,  R,  Φ,  G,  Γblend,  L\mathcal{T}_{\text{blend}} = \langle \mathcal{A}_{\text{human}} \cup \mathcal{A}_{\text{agent}}, \; \mathcal{R}, \; \Phi, \; \mathcal{G}, \; \Gamma_{\text{blend}}, \; \mathcal{L} \rangle

Human participants differ from agent participants in:

| Dimension | Human | Agent |
| --- | --- | --- |
| Latency | Minutes to hours | Milliseconds to seconds |
| Availability | Scheduled, asynchronous | Always-on, synchronous |
| Judgment | Nuanced, contextual, value-laden | Consistent, scalable, policy-bound |
| Authority | Ultimate decision authority | Delegated authority within policy bounds |
| Error profile | Fatigue, attention, bias | Hallucination, specification misinterpretation |
| Communication | Natural language, high bandwidth | Structured protocols, token-bounded |

17.11.2 Human-Agent Interaction Protocols#

Protocol 1: Human-in-the-Loop (HITL) — Approval Gate

Agents produce candidate outputs; humans approve, modify, or reject before commitment:

Output Pipeline:  AgentcandidateHuman Reviewapproved/modifiedCommit\text{Output Pipeline}: \; \text{Agent} \xrightarrow{\text{candidate}} \text{Human Review} \xrightarrow{\text{approved/modified}} \text{Commit}

Use when: mutation risk is high, regulatory requirements mandate human oversight, or agent confidence is below threshold.

Protocol 2: Human-on-the-Loop (HOTL) — Supervisory Monitoring

Agents operate autonomously with human monitoring. Human intervenes only on alerts:

\text{Normal}: \; \text{Agent} \xrightarrow{\text{auto-commit}} \text{Output}

\text{Alert}: \; \text{Agent} \xrightarrow{\text{flag}} \text{Human} \xrightarrow{\text{override/confirm}} \text{Output}

Use when: task volume is high, agent reliability is established, and cost of occasional errors is bounded.

Protocol 3: Human-as-Tool

Agents invoke human expertise as a structured tool call:

AgentHumanQuery(question,context,deadline)HumanHumanResponse(answer,confidence)Agent\text{Agent} \xrightarrow{\text{HumanQuery}(question, context, deadline)} \text{Human} \xrightarrow{\text{HumanResponse}(answer, confidence)} \text{Agent}

The tool interface exposes:

  • schema: typed question format with required context fields
  • timeout: maximum wait time before fallback
  • fallback: default action if human does not respond within deadline

17.11.3 Asymmetric Communication Design#

Human attention is the scarcest resource. Communication from agents to humans must be:

  1. Summarized — compress full context into decision-ready briefings
  2. Actionable — present clear options with trade-off analysis, not raw data
  3. Prioritized — sort by urgency and impact; batch low-priority items
  4. Minimal — eliminate unnecessary interruptions; escalate only when policy requires
Human Interruption Budget:  Imax=f(task criticality,  human availability,  agent confidence)\text{Human Interruption Budget}: \; I_{\text{max}} = f(\text{task criticality}, \; \text{human availability}, \; \text{agent confidence})

Pseudo-Algorithm 17.10 — Human Escalation Manager

PROCEDURE ManageHumanEscalation(escalation_request, human_state, policy):
    // Step 1: Assess necessity
    urgency ← AssessUrgency(escalation_request)
    impact ← AssessImpact(escalation_request)
    agent_confidence ← escalation_request.confidence
    
    // Step 2: Check if agent can self-resolve with lower threshold
    IF agent_confidence > policy.soft_threshold AND impact < policy.impact_ceiling:
        RETURN AutoResolve(escalation_request, flag_for_async_review=TRUE)
    
    // Step 3: Batch non-urgent escalations
    IF urgency < policy.urgency_threshold:
        AddToBatch(escalation_request, human_state.pending_batch)
        IF human_state.pending_batch.size ≥ policy.batch_size OR
           human_state.pending_batch.age ≥ policy.max_batch_age:
            FormatBatchBriefing(human_state.pending_batch)
            NotifyHuman(human_state, "batch_review_ready")
        RETURN DEFERRED
    
    // Step 4: Urgent escalation
    briefing ← FormatUrgentBriefing(
        question = escalation_request.question,
        context_summary = CompressContext(escalation_request.context, policy.briefing_budget),
        options = escalation_request.options,
        recommendation = escalation_request.agent_recommendation,
        evidence = escalation_request.evidence_summary,
        deadline = escalation_request.deadline
    )
    NotifyHuman(human_state, briefing, priority=HIGH)
    
    // Step 5: Wait with timeout
    response ← WaitForHumanResponse(timeout=escalation_request.deadline)
    IF response = TIMEOUT:
        RETURN FallbackAction(escalation_request, policy.timeout_fallback)
    
    // Step 6: Record and apply
    RecordHumanDecision(response, escalation_request, accountability_ledger)
    RETURN response

17.11.4 Trust Calibration#

The team's delegation policy must adapt based on observed agent reliability:

Delegation Level(ai,task_class)=f(historical_accuracy(ai,task_class),  task_risk,  organizational_policy)\text{Delegation Level}(a_i, \text{task\_class}) = f\left( \text{historical\_accuracy}(a_i, \text{task\_class}), \; \text{task\_risk}, \; \text{organizational\_policy} \right)

A practical trust model uses a Beta distribution:

Trust(ai)Beta(αi,βi)\text{Trust}(a_i) \sim \text{Beta}(\alpha_i, \beta_i)

where αi\alpha_i counts successful autonomous completions and βi\beta_i counts failures requiring human correction. The delegation threshold is:

Delegate autonomously if  αiαi+βiθtrust  and  αi+βiNmin\text{Delegate autonomously if} \; \frac{\alpha_i}{\alpha_i + \beta_i} \geq \theta_{\text{trust}} \; \text{and} \; \alpha_i + \beta_i \geq N_{\min}

This ensures both high estimated reliability and sufficient evidence (sample size).
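
A minimal sketch of the Beta-counter gate, with illustrative defaults for the trust threshold and minimum sample size:

```python
def delegate_autonomously(successes, failures, theta_trust=0.9, n_min=20):
    """Delegation gate: the posterior mean alpha/(alpha+beta) must clear the
    trust threshold AND be backed by at least n_min observations."""
    n = successes + failures
    if n < n_min:
        return False  # insufficient evidence, regardless of ratio
    return successes / n >= theta_trust

def record_outcome(trust, success):
    """Update the Beta counts after an autonomous completion attempt."""
    key = "alpha" if success else "beta"
    return {**trust, key: trust[key] + 1}
```

The sample-size guard matters: an agent with 3 successes and 0 failures has a perfect ratio but too little evidence to earn autonomy.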


17.12 Inspiration from High-Reliability Organizations (HROs): Crew Resource Management for Agent Teams#

17.12.1 HRO Principles Applied to Agent Teams#

High-Reliability Organizations (HROs)—nuclear power plants, aircraft carrier operations, air traffic control, surgical teams—achieve extremely low failure rates in high-complexity, high-consequence environments. Five core HRO principles map directly to agent team design:

| HRO Principle | Definition | Agent Team Application |
| --- | --- | --- |
| Preoccupation with Failure | Treat near-misses as full failures; actively seek failure signals | Monitor all agent outputs for hallucination markers, even when outputs appear correct; log near-misses (low-confidence correct answers) |
| Reluctance to Simplify | Resist reductive interpretations; maintain nuanced situation awareness | Require agents to preserve uncertainty and alternative interpretations in handoff context; forbid premature commitment |
| Sensitivity to Operations | Maintain real-time situational awareness of frontline work | Orchestrator continuously monitors agent-level execution, not just aggregate metrics; agents report anomalies unprompted |
| Commitment to Resilience | Design for graceful degradation, not just failure prevention | Bounded retry, compensating actions, fallback models, degraded-but-functional operation modes |
| Deference to Expertise | Decision authority flows to the most knowledgeable agent, not the highest-ranked | Override priority hierarchies when a specialist agent has domain-specific evidence that contradicts a generalist orchestrator |

17.12.2 Crew Resource Management (CRM) for Agent Teams#

CRM, originally developed for aviation cockpit teams, provides formalized protocols for:

Briefing and Debriefing:

  • Pre-Task Briefing — Before execution, the orchestrator distributes the shared mental model, confirms role understanding, identifies known risks, and establishes communication protocols
  • Post-Task Debrief — After completion, the team reviews outcomes, identifies coordination failures, extracts lessons, and writes them to episodic memory

Pseudo-Algorithm 17.11 — CRM-Inspired Pre-Task Briefing

PROCEDURE PreTaskBriefing(team, task, orchestrator):
    // Step 1: Situation Assessment
    situation ← orchestrator.AssessSituation(task, environment_state)
    risks ← orchestrator.IdentifyRisks(task, team.capabilities)
    
    // Step 2: Plan Communication
    plan ← orchestrator.CreatePlan(task)
    FOR EACH agent IN team:
        agent_brief ← {
            role: agent.assigned_role,
            objectives: ExtractRoleObjectives(plan, agent.assigned_role),
            risks: FilterRoleRelevantRisks(risks, agent.assigned_role),
            communication_protocols: {
                report_to: GetSupervisor(agent, team),
                escalation_trigger: GetEscalationCriteria(agent.assigned_role),
                status_interval: GetStatusReportInterval(task.urgency)
            },
            authority_boundaries: GetAuthorityBounds(agent.assigned_role),
            challenge_protocol: "If you observe information contradicting the plan, "
                              + "you are OBLIGATED to voice concern to orchestrator "
                              + "with evidence before proceeding."
        }
        DeliverBriefing(agent, agent_brief)
    
    // Step 3: Confirmation
    FOR EACH agent IN team:
        confirmation ← agent.ConfirmBriefing(agent_brief)
        IF NOT confirmation.understood:
            ClarifyAndRebrief(agent, confirmation.questions)
    
    // Step 4: Establish Monitoring
    SetupMonitoringChannels(team, task)
    
    RETURN BriefingRecord(situation, plan, risks, confirmations)

17.12.3 Challenge-and-Response Protocol#

A critical CRM mechanism is the challenge protocol: any team member who observes an anomaly is obligated to raise it, regardless of role hierarchy. In agent teams:

Challenge(aiaj)=observation,  evidence,  concern,  recommended_action\text{Challenge}(a_i \to a_j) = \langle \text{observation}, \; \text{evidence}, \; \text{concern}, \; \text{recommended\_action} \rangle

The challenged agent must respond with one of:

  1. Acknowledge and Correct — accept the challenge, modify output
  2. Acknowledge and Justify — explain why the observation does not invalidate the output
  3. Escalate — neither agent can resolve; escalate to orchestrator or human

Challenges are logged in the accountability ledger. An agent that ignores a challenge without justification triggers an automatic escalation.
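
The challenge tuple and its three admissible responses can be sketched as a typed message with auto-escalation on a missing response; the type and field names are assumptions of this sketch:

```python
from dataclasses import dataclass
from enum import Enum

class ChallengeResponse(Enum):
    ACKNOWLEDGE_AND_CORRECT = "correct"
    ACKNOWLEDGE_AND_JUSTIFY = "justify"
    ESCALATE = "escalate"

@dataclass
class Challenge:
    challenger: str
    challenged: str
    observation: str
    evidence: list
    concern: str
    recommended_action: str

def handle_challenge(challenge, response, ledger):
    """Record the challenge in the accountability ledger; a challenge with
    no justified response is escalated automatically."""
    if response is None:
        response = ChallengeResponse.ESCALATE
    ledger.append((challenge, response))
    return response
```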

17.12.4 Assertive Communication Hierarchy#

Adapted from aviation's assertiveness ladder:

| Level | Action | Agent Equivalent |
| --- | --- | --- |
| 1. Hint | Subtle suggestion | Append low-confidence note to output |
| 2. Preference | Express opinion | Include alternative approach in rationale |
| 3. Query | Direct question | Formal challenge message to responsible agent |
| 4. Statement | Declarative concern | Flag output as potentially incorrect in shared state |
| 5. Command | Direct override (authority required) | Orchestrator vetoes output; triggers re-execution |

Agent teams should be configured to operate at level 3–4 by default: agents should actively challenge rather than passively suggest. This is implemented by including challenge obligations in the role contract's Oi\mathcal{O}_i field.

17.12.5 Structured Debriefing Protocol#

Pseudo-Algorithm 17.12 — Post-Task Structured Debrief

PROCEDURE PostTaskDebrief(team, task_result, execution_trace, orchestrator):
    // Step 1: Outcome Assessment
    success ← EvaluateOutcome(task_result, task.success_criteria)
    quality_score ← ComputeQualityScore(task_result)
    
    // Step 2: Timeline Reconstruction
    timeline ← ReconstructTimeline(execution_trace)
    critical_path ← IdentifyCriticalPath(timeline)
    bottlenecks ← IdentifyBottlenecks(timeline)
    
    // Step 3: Anomaly Review
    anomalies ← []
    FOR EACH event IN execution_trace:
        IF event.type IN {CONFLICT, ESCALATION, RETRY, CHALLENGE, ERROR}:
            anomalies.APPEND(event)
    
    // Step 4: Causal Analysis (for failures or near-misses)
    IF NOT success OR quality_score < threshold:
        root_causes ← RootCauseAnalysis(anomalies, timeline, task_result)
        corrective_actions ← GenerateCorrectiveActions(root_causes)
    ELSE:
        // Even for successes, analyze near-misses
        near_misses ← FILTER anomalies WHERE resolved_without_failure = TRUE
        IF near_misses NOT EMPTY:
            preventive_actions ← AnalyzeNearMisses(near_misses)
    
    // Step 5: Lessons Extraction
    lessons ← ExtractLessons(
        anomalies, bottlenecks, 
        successful_strategies = IdentifyEffectivePatterns(timeline),
        failed_strategies = IdentifyFailedPatterns(timeline)
    )
    
    // Step 6: Memory Write-Back
    WriteBackEpisodicMemory(lessons, task_result, team, memory_policy)
    
    // Step 7: Metric Update
    UpdateAgentReliabilityScores(team, task_result, timeline)
    UpdateTeamPerformanceMetrics(team, task_result, timeline)
    
    // Step 8: Policy Refinement (if warranted)
    IF corrective_actions CONTAINS policy_changes:
        ProposePolicyUpdates(corrective_actions, orchestrator)
    
    RETURN DebriefReport(success, quality_score, anomalies, lessons, corrective_actions)

17.12.6 Swiss Cheese Model for Agent Failure Defense#

Borrowing from James Reason's accident causation model, agent team reliability is achieved through multiple independent defense layers, each imperfect but collectively robust:

P(\text{system failure}) = \prod_{l=1}^{L} P(\text{layer } l \text{ fails})

| Defense Layer | Mechanism | Failure Mode Addressed |
|---|---|---|
| 1. Input Validation | Schema enforcement, constraint checking | Malformed or adversarial inputs |
| 2. Agent Self-Check | Chain-of-thought verification, confidence scoring | Hallucination, reasoning errors |
| 3. Peer Review | Verifier agent cross-checks implementer output | Systematic model bias |
| 4. Consensus | Multi-agent voting or debate | Individual agent failure |
| 5. Automated Testing | Test harness execution against known cases | Functional correctness |
| 6. Human Oversight | HITL/HOTL review for high-risk decisions | Novel failure modes, value alignment |
| 7. Post-Deployment Monitoring | Production regression detection, anomaly alerting | Drift, environmental changes |

If each layer independently catches 90% of errors passing through it:

P(\text{undetected error}) = (0.1)^7 = 10^{-7}

In practice, layers are not perfectly independent, but even partial independence provides substantial reliability amplification.
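The effect of imperfect independence can be made concrete. The sketch below, an illustrative model rather than a calibrated dependence structure, interpolates linearly between the fully independent product and the worst case where all layers share a single blind spot:

```python
def residual_failure_probability(layer_miss_rates, correlation=0.0):
    """Probability an error slips past every defense layer.

    correlation=0.0 reproduces the independent-layers product;
    correlation=1.0 collapses the stack to its single weakest layer.
    The linear blend in between is a crude illustration, not a
    calibrated model of inter-layer dependence.
    """
    independent = 1.0
    for p in layer_miss_rates:
        independent *= p
    worst_case = max(layer_miss_rates)
    return (1 - correlation) * independent + correlation * worst_case

seven_layers = [0.1] * 7  # each layer misses 10% of errors
assert abs(residual_failure_probability(seven_layers) - 1e-7) < 1e-12
# Even modest correlation dominates the independent term:
print(residual_failure_probability(seven_layers, correlation=0.2))  # ~0.02
```

The lesson matches the Swiss Cheese intuition: adding layers is far less valuable than ensuring the layers fail for *different* reasons, because shared failure modes make the stack only as strong as its weakest slice.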

17.12.7 Operational Readiness Levels#

Inspired by NASA's Technology Readiness Levels (TRLs), define Team Operational Readiness Levels (TORLs):

| TORL | Description | Criteria |
|---|---|---|
| 1 | Concept | Team roles and protocols defined on paper |
| 2 | Component Testing | Individual agents validated in isolation |
| 3 | Integration Testing | Handoffs, consensus, and conflict resolution tested with synthetic tasks |
| 4 | Simulated Operations | Full team operates on realistic workloads in a staging environment |
| 5 | Supervised Production | Team operates on production tasks with mandatory human review (HITL) |
| 6 | Monitored Production | Team operates autonomously with human monitoring (HOTL) and automatic escalation |
| 7 | Full Autonomy | Team operates within well-defined bounds without routine human intervention; human escalation only for edge cases |

Teams advance through TORLs based on measured performance against quality gates at each level, never by fiat or optimism.
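A gate-checked advancement rule can be sketched as follows. The metric names and thresholds here are hypothetical placeholders, not values prescribed by the chapter; the point is the structure: promotion requires every gate for the next level, with missing measurements treated as failures.

```python
# Hypothetical quality gates keyed by the TORL being entered.
# Thresholds are illustrative only.
TORL_GATES = {
    2: {"unit_pass_rate": 0.95},
    3: {"handoff_success_rate": 0.98, "consensus_convergence_rate": 0.95},
    4: {"staging_quality_score": 0.90},
    5: {"hitl_approval_rate": 0.95},
    6: {"escalation_precision": 0.90},
    7: {"autonomous_quality_score": 0.97},
}

def may_advance(current_level: int, metrics: dict) -> bool:
    """Advance one TORL only if every gate for the *next* level is met.
    An unmeasured metric defaults to 0.0, i.e. never passes by optimism."""
    gates = TORL_GATES.get(current_level + 1, {})
    return all(metrics.get(name, 0.0) >= threshold
               for name, threshold in gates.items())

assert may_advance(2, {"handoff_success_rate": 0.99,
                       "consensus_convergence_rate": 0.96})
assert not may_advance(2, {"handoff_success_rate": 0.99})  # missing metric
```

Defaulting absent metrics to zero is the code-level expression of "never by fiat": a team cannot be promoted past a level whose gate was simply not measured.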


Chapter Summary#

Agent team coordination is a systems engineering discipline, not a prompt engineering exercise. This chapter has formalized:

  1. Organizational structure — teams as typed tuples with explicit roles, authority, obligations, and accountability ledgers
  2. Formation — static, dynamic, and capability-matched team assembly with formal optimization models
  3. Shared mental models — synchronized, version-controlled, token-efficient context sharing
  4. Handoffs — three-phase commit protocols with context summarization and responsibility chains
  5. Consensus — majority voting, weighted voting, structured debate, and arbitration with selection criteria
  6. Conflict resolution — evidence-based arbitration with formal quality scoring and escalation ladders
  7. Team memory — four-tier architecture with validated write-back, garbage collection, and promotion pipelines
  8. Load balancing — capacity models, assignment strategies, rebalancing protocols, and backpressure mechanisms
  9. Performance metrics — throughput, quality, coordination overhead, and efficiency with diagnostic anti-pattern detection
  10. Adaptive composition — runtime phase detection and cost-justified role reassignment
  11. Human integration — HITL/HOTL/human-as-tool protocols with trust calibration via Beta distributions
  12. HRO principles — CRM briefing/debriefing, challenge protocols, Swiss Cheese defense layers, and operational readiness levels

The unifying principle: coordination reliability is not emergent; it is engineered. Every communication channel is typed. Every handoff is verified. Every decision is traceable. Every failure is attributable. Every lesson is captured. The team that executes reliably at scale is the team whose coordination substrate was designed with the same rigor as its individual agents.