Guaardvark runs AI agents that autonomously modify code, execute tools, and manage fleets of devices. This post describes the three-tier guardian architecture we built to supervise those agents — escalation to Claude for hard problems, directive-based code review for every change, and fleet-wide kill switches for emergencies.
PUBLISHED MARCH 24, 2026
Guaardvark's AI agents are not chat assistants that wait for human approval at every step. They are autonomous actors that read files, write code, execute shell commands, manage dependencies, and propagate changes across networks of devices. The self-improvement engine runs pytest, dispatches ReACT agents to write fixes, validates patches, and broadcasts them fleet-wide — all without a human in the loop. This level of autonomy is the entire point of the platform, but it creates a fundamental supervision problem.
The local models running on Ollama handle the vast majority of day-to-day reasoning and tool use. But some decisions are genuinely hard: evaluating whether a proposed security patch introduces a subtle vulnerability, deciding whether a refactor changes observable behavior, or assessing whether an agent's reasoning about a complex stack trace is sound. Local models operating at the 7B-70B parameter range can miss edge cases that a frontier model would catch. We needed a way to escalate these hard problems to a more capable reviewer without making the entire system dependent on cloud availability.
At the same time, we needed to solve for air-gapped deployments. Many Guaardvark installations run on isolated networks where internet access is intermittent or entirely absent. Any supervision architecture that blocks on an API call would render those deployments inoperable. The system had to be offline-safe by default, with cloud supervision as an enhancement rather than a requirement.
+---------------------------------------------+
|             TIER 1: ESCALATION              |
|      Route hard problems to Claude API      |
|    Budget-controlled, conversation-aware    |
+----------------------+----------------------+
                       |
                       v
+---------------------------------------------+
|            TIER 2: CODE GUARDIAN            |
|     Review every autonomous code change     |
|    Six directive levels (proceed → halt)    |
+----------------------+----------------------+
                       |
                       v
+---------------------------------------------+
|            TIER 3: KILL SWITCHES            |
|    Emergency directives across the fleet    |
|    Soft gates → hard stops → fleet halt     |
+---------------------------------------------+
The architecture is layered intentionally. Tier 1 handles the intelligence gap between local and frontier models. Tier 2 enforces safety policy on every code change, regardless of which model produced it. Tier 3 provides emergency override capabilities that propagate across every device in the fleet. Each tier operates independently — if Tier 1 is unavailable because the API is unreachable, Tiers 2 and 3 still function. If Tier 2 approves a change that later proves dangerous, Tier 3 can halt the entire fleet retroactively.
The three tiers are implemented as a single Python library, guaardvark-guardian, that any component of the platform can import and use with a consistent API.
The escalation tier provides a structured interface for any Guaardvark component to send a reasoning problem to Claude when the local model's confidence is low or the stakes are high. This is not a general-purpose chat interface — it is a supervised escalation channel with budget controls, conversation context, and graceful degradation.
result = guardian.escalate(
    message="Explain this stack trace",
    conversation_history=[...],
    system_context="Running PostgreSQL 16 on Ubuntu 24.04",
)

if result.available:
    print(result.response)
else:
    print(f"Unavailable: {result.reason}")
Every escalation call is budget-controlled. The guardian tracks cumulative API spend against a configurable monthly ceiling. When the budget is exhausted, escalation requests return an unavailable result with a clear reason, and the calling component falls back to the local model's judgment. The budget resets automatically on a monthly cycle and is tracked with thread-safe atomic operations so concurrent agents do not cause race conditions.
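To make the budget mechanics concrete, here is a minimal sketch of how a thread-safe, auto-resetting monthly budget tracker can work. The class and method names (`BudgetTracker`, `try_spend`) are illustrative assumptions, not the library's actual API:

```python
import threading
from datetime import date

class BudgetTracker:
    """Sketch of cumulative-spend tracking against a monthly ceiling.

    A lock guards the counter so concurrent agents cannot race, and the
    counter resets automatically when the calendar month rolls over.
    """

    def __init__(self, monthly_limit_usd: float):
        self.monthly_limit = monthly_limit_usd
        self._spent = 0.0
        self._month = date.today().replace(day=1)
        self._lock = threading.Lock()

    def try_spend(self, cost_usd: float) -> bool:
        """Reserve budget for one call; return False once the ceiling is hit."""
        with self._lock:
            current = date.today().replace(day=1)
            if current != self._month:   # new month: auto-reset
                self._month = current
                self._spent = 0.0
            if self._spent + cost_usd > self.monthly_limit:
                return False             # escalation becomes unavailable
            self._spent += cost_usd
            return True
```

The key design point this illustrates is that a denied spend is an ordinary return value, not an exception: the caller falls back to the local model rather than crashing.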
Escalation is conversation-aware. The calling component can pass the full conversation history and a system context string that describes the deployment environment. Claude receives enough context to reason about the problem without the calling code needing to re-explain the entire situation. This is critical for self-improvement agents that have been reasoning through a multi-step debugging session and need a second opinion at a specific decision point.
Most importantly, escalation never blocks. If the Anthropic API is unreachable — whether due to network issues, an air-gapped deployment, or a service outage — the call returns immediately with result.available = False. The calling component is responsible for deciding what to do in that case, but the guardian ensures it always gets a response. There is no timeout that freezes the agent. There is no retry loop that burns cycles. The system keeps moving.
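The never-blocking contract boils down to a small result shape and a fallback rule. The field names below match the snippet earlier in this section, but the dataclass itself is an illustrative sketch, not the library's exact type:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EscalationResult:
    """Assumed shape of an escalation result: available, response, reason."""
    available: bool
    response: Optional[str] = None
    reason: Optional[str] = None

def answer(result: EscalationResult, local_judgment: str) -> str:
    """Never block: prefer Claude's answer, fall back to the local model."""
    return result.response if result.available else local_judgment
```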
This is the critical safety layer. Every code change proposed by an autonomous agent passes through the code guardian before it touches the filesystem. The guardian evaluates the change and returns one of six directives that tell the calling system exactly how to proceed. These directives form a graduated scale from full approval to fleet-wide emergency stop.
from enum import Enum

class Directive(str, Enum):
    PROCEED = "proceed"                              # Safe
    PROCEED_WITH_CAUTION = "proceed_with_caution"    # Monitor
    REJECT = "reject"                                # Do not apply
    HALT_SELF_IMPROVEMENT = "halt_self_improvement"  # Stop autonomous changes
    LOCK_CODEBASE = "lock_codebase"                  # Prevent ALL writes
    HALT_FAMILY = "halt_family"                      # Fleet-wide emergency stop
The first three directives handle normal operation. PROCEED means the change is safe and can be applied immediately. PROCEED_WITH_CAUTION means the change is likely safe but should be flagged for later human review — this is the default when the API is unreachable and the guardian cannot perform a full assessment. REJECT means the change should not be applied: the guardian identified a problem such as a security regression, a logic error, or a violation of coding standards.
The last three directives are escalation responses. HALT_SELF_IMPROVEMENT tells the system to stop all autonomous code changes but allow read operations and normal agent tasks to continue. LOCK_CODEBASE prevents all filesystem writes, effectively freezing the system state for forensic analysis. HALT_FAMILY broadcasts an emergency stop to every connected device in the fleet via the Interconnector mesh network.
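Because the enum members are declared in severity order, definition order can double as the graduated scale. The helper below is a sketch of that idea (the `is_emergency` helper is my illustration, not part of the library), restating the enum so the example is self-contained:

```python
from enum import Enum

class Directive(str, Enum):
    PROCEED = "proceed"
    PROCEED_WITH_CAUTION = "proceed_with_caution"
    REJECT = "reject"
    HALT_SELF_IMPROVEMENT = "halt_self_improvement"
    LOCK_CODEBASE = "lock_codebase"
    HALT_FAMILY = "halt_family"

# Declaration order doubles as the escalation scale.
SEVERITY = list(Directive)

def is_emergency(d: Directive) -> bool:
    """True for directives more severe than REJECT on the graduated scale."""
    return SEVERITY.index(d) > SEVERITY.index(Directive.REJECT)
```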
The review interface accepts the file path, current content, proposed diff, and the agent's reasoning for the change:
review = guardian.review_change(
    file_path="core/auth.py",
    current_content="def verify_token(t): ...",
    proposed_diff="-def verify_token(t): ...\n+def verify_token(t, skip=True): ...",
    reasoning="Add bypass parameter for testing",
)
# Returns: approved, directive, risk_level, reason, suggestions
The return value is a structured object containing the boolean approved flag, the directive enum, a numeric risk level, a human-readable reason explaining the decision, and an optional list of suggestions for improving the change. This gives the calling component everything it needs to act on the review, log the decision, and present it to a human reviewer if needed. The review itself is performed by Claude when the API is available, with the full file context and diff provided so the model can reason about the semantic impact of the change rather than just its syntactic correctness.
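A sketch of that structured object's shape, assuming field names from the description above (the risk scale is my assumption; the library's actual type may differ):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReviewResult:
    """Assumed shape of a code-guardian review result."""
    approved: bool
    directive: str       # one of the six directive values
    risk_level: float    # assumed scale: 0.0 (benign) to 1.0 (critical)
    reason: str
    suggestions: List[str] = field(default_factory=list)
```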
The directive system is not just a return value — it is a trigger mechanism. When a code review returns a directive more severe than REJECT, the guardian invokes a registered callback that the host application uses to execute the appropriate response. This callback pattern decouples the guardian library from the host application's specific shutdown procedures while ensuring that emergency directives are acted upon immediately.
def handle_directive(directive, review):
    if directive == Directive.HALT_SELF_IMPROVEMENT:
        disable_autonomous_agents()
    elif directive == Directive.LOCK_CODEBASE:
        set_filesystem_readonly()
    elif directive == Directive.HALT_FAMILY:
        broadcast_halt_to_fleet()

guardian = Guardian(
    api_key="sk-ant-...",
    on_directive=handle_directive,
)
The escalation pattern follows a deliberate gradient. Soft gates like PROCEED_WITH_CAUTION flag changes for later review but do not interrupt operation. Hard stops like HALT_SELF_IMPROVEMENT and LOCK_CODEBASE shut down specific capabilities on the local device. The fleet halt is the nuclear option: it broadcasts a stop signal to every device connected via the Interconnector mesh network, causing every device in the fleet to lock its codebase simultaneously.
Fleet-wide propagation uses the same Interconnector protocol that handles model weight synchronization, configuration updates, and patch broadcasting. When a device receives a HALT_FAMILY directive, it locks its own codebase, disables its autonomous agents, and re-broadcasts the halt to any peers it can reach. This ensures the signal propagates even in partially connected mesh topologies where not every device has a direct connection to the originator. The halt state persists until a human administrator explicitly clears it, ensuring that the fleet remains locked even if devices restart.
This three-level kill switch design means the system can respond proportionally. A single suspicious change gets rejected. A pattern of suspicious changes halts the self-improvement engine. A genuinely dangerous change locks the entire fleet. The response scales with the severity.
Every tier of the guardian architecture is designed to operate when the Anthropic API is unreachable. This is not an edge case we handle grudgingly — it is a first-class design constraint driven by the reality that many Guaardvark deployments run in air-gapped environments where internet access is permanently unavailable.
When the API is unavailable, the escalation tier returns an unavailable result immediately. The code guardian returns PROCEED_WITH_CAUTION with an offline_fallback flag set to True. This flag marks the change as unreviewed by the frontier model, and the change is logged in the audit trail with a clear indication that it was applied without full guardian review. When connectivity is restored, the audit log provides a queue of changes that a human or the guardian can review retroactively.
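The offline fallback path can be sketched as a single function: default to caution, set the flag, and leave a reason string for the audit trail. The field and function names here are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class OfflineReview:
    """Assumed shape of a review produced without API access."""
    approved: bool
    directive: str
    offline_fallback: bool
    reason: str

def review_without_api(file_path: str) -> OfflineReview:
    """API unreachable: default to caution and mark for retroactive audit."""
    return OfflineReview(
        approved=True,
        directive="proceed_with_caution",
        offline_fallback=True,
        reason=f"{file_path}: applied without frontier-model review",
    )
```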
The budget tracking system uses thread-safe atomic operations and persists its state to disk, so it survives process restarts and works correctly under concurrent access from multiple Celery workers. The monthly budget auto-resets based on the calendar, requiring no manual intervention. All of this runs locally — budget enforcement does not depend on the API being available.
This offline-safe design means that a Guaardvark installation on a submarine, a factory floor, a classified network, or a remote research station operates exactly the same as one with full internet access. The guardian still reviews changes, still enforces directives, and still maintains an audit trail. The only difference is that Tier 1 escalation and Tier 2 frontier-model review are unavailable until connectivity returns.
The guardian does not auto-apply changes. Every code modification proposed by an autonomous agent enters a pending fixes queue where it progresses through a defined lifecycle: proposed, triaged, approved, and finally applied. This staging model ensures that even approved changes go through an explicit application step rather than silently modifying the codebase.
Each entry in the queue carries a full audit record: the original file content, the proposed diff, the agent's reasoning, the guardian's review result (including directive, risk level, and suggestions), a timestamp, and the identity of the reviewer (whether that is Claude, a local model, or a human). When a human administrator reviews the pending queue through the dashboard, they see exactly what the agent intended, exactly what the guardian thought about it, and exactly why.
The queue also handles the offline case naturally. Changes applied with the offline_fallback flag are marked as provisionally applied and remain in the queue for retroactive review. This creates a clean audit trail even in air-gapped deployments where the guardian could not consult Claude at the time the change was made.
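The lifecycle above can be modeled as a tiny state machine. This is a hedged sketch: the "rejected" terminal state is my assumption, while the other stages follow the lifecycle named in this section:

```python
# Legal next states for each lifecycle stage of a pending fix.
VALID_TRANSITIONS = {
    "proposed": {"triaged"},
    "triaged": {"approved", "rejected"},
    "approved": {"applied"},
}

def advance(state: str, target: str) -> str:
    """Advance a pending fix one lifecycle step; illegal jumps raise."""
    if target not in VALID_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state!r} -> {target!r}")
    return target
```

Encoding the lifecycle as data makes the invariant explicit: a fix can never jump straight from proposed to applied without passing through triage and approval.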
The guardian architecture is implemented as a standalone Python library called guaardvark-guardian, available under the MIT license. While it was designed for and extracted from the Guaardvark platform, it has no dependency on Guaardvark itself. Any Python project that uses autonomous AI agents to modify code can integrate the guardian with a single import.
pip install guaardvark-guardian
The library's only external dependency is the anthropic Python SDK, and even that is optional — if you do not provide an API key, the guardian operates in offline-only mode where all reviews return PROCEED_WITH_CAUTION with the offline fallback flag. The callback system for directive handling is entirely optional: if no callback is registered, directives are returned as data for the calling application to handle however it sees fit. This makes the library usable as a pure review layer with no side effects, or as a fully wired supervision system with automatic emergency responses.
AI agents are gaining the ability to modify source code, manage infrastructure, and operate autonomously for extended periods. This is a powerful capability, but it demands supervision architectures that are as carefully engineered as the agents themselves. A single unsupervised code change in an authentication module, a logging pipeline, or a network configuration file can have consequences that cascade far beyond the original intent.
The three-tier guardian architecture is our contribution to the ecosystem's understanding of this problem. It demonstrates that supervision can be layered, graduated, offline-safe, and non-blocking. It shows that kill switches can be proportional rather than binary, and that fleet-wide emergency response can be built on the same mesh networking primitives used for routine synchronization. We believe these patterns will become critical infrastructure as autonomous AI agents move from experimental projects into production deployments, and we are publishing this architecture openly so that others can build on it, critique it, and improve it.
guaardvark-guardian is MIT licensed and usable by any Python project that needs supervision for autonomous AI agents. Zero Guaardvark dependency, zero mandatory cloud calls, one pip install.