PocketClaw · vol. 1 · 2026
guide #108

Self-hosted AI security playbook 2026 — the practical operator's guide

Editorial note: This article reports on a fast-moving space. Versions, install counts and timelines are accurate as of the “updated” date above. We re-verify against primary sources (CVE database, project repositories, vendor announcements) before each update. Send corrections to contact@pocketclaw.dev.

Problem
Self-hosted AI agent security guidance in 2026 is fragmented across vendor docs, Twitter threads and post-incident blog posts. There is no single practical operator's playbook covering the realistic threat model and concrete defences.

Solution
A 10-section playbook covering the modern threat model, sandbox setup, credential management, network isolation, audit logging, monitoring, update strategy, and incident response — with concrete commands and config snippets for Hermes Agent, OpenClaw 2026.4+, ZeroClaw and the rest.

Who this playbook is for

Anyone running a self-hosted AI agent in production — paid or hobbyist — in 2026. We assume you have access to your host (VPS or local hardware), can edit config files, and can run shell commands. We do not assume security expertise; we explain the why before the what where it matters.

We also assume you've read the basics of self-hosted AI agent architecture. If you haven't, [the landscape report](/guides/self-hosted-ai-landscape-2026) and [the OpenClaw crisis explainer](/guides/openclaw-security-crisis-2026) are good starting points.

This playbook is opinionated. We tell you what we'd do — not what's “the only way.”

Section 1 — The realistic threat model

The post-OpenClaw-crisis threat model for a self-hosted AI agent in 2026 includes:

  • Web-origin attacks. Malicious websites attempting to connect to the agent's local API or dashboard from a browser on the same machine or network.
  • Prompt injection. Adversarial input embedded in documents, web pages and other content the agent is asked to process.
  • Supply chain compromises. Plugins, MCP servers and pre-built tool definitions that ship with malicious or vulnerable code.
  • Credential exfiltration. API keys, SSH keys, browser cookies and other secrets readable from the agent host.
  • Tool execution privilege escalation. Tools that legitimately need shell or filesystem access being steered into doing more than intended.
  • Lateral movement. A compromised agent host being used as a pivot into the rest of your network.

What we are NOT defending against in this playbook:

  • Sophisticated state-level attackers with zero-day capabilities. If that's your threat model, you need more than a playbook.
  • Physical access to the host. If someone is at your keyboard, the game is already lost.
  • Complete model alignment failures. We assume the LLM behind your agent is not actively working against you; that failure mode needs defences beyond an operator playbook.

The playbook covers everything else.

Section 2 — Pick the right agent for your threat model

Before any configuration, the first security decision is which agent you run. Different agents have different default postures.

| Agent | Sandbox-on default | Auth-on default | Threat model published | Suitability |
| --- | --- | --- | --- | --- |
| OpenClaw 2026.4+ | yes | yes | yes | All workloads |
| Hermes Agent | yes | yes | yes | All workloads |
| Nanobot | no | no | no | Single-user only |
| NanoClaw | yes | yes | yes | macOS-only |
| IronClaw | yes (gVisor) | yes (RBAC) | yes (formal) | Regulated industries |
| ZeroClaw | yes | optional | yes | Privacy-mandated |

If your security baseline requires sandbox-on-by-default and a documented threat model, your viable choices in mid-2026 are: Hermes Agent (general), OpenClaw 2026.4+ (existing deployments), NanoClaw (macOS), IronClaw (regulated), ZeroClaw (privacy-mandated).

If you're choosing today and you don't have a specific reason to do otherwise, default to Hermes Agent.

Section 3 — Sandbox setup

Tool execution must run inside a sandbox. We don't do “sandboxes are nice to have” in 2026. The CVE-2026-25253 incident closed that conversation.

3.1 The sandbox tiers

In order of strength:

1. gVisor (used by IronClaw): syscall interposition in user space. Strongest practical option for adversarial workloads.
2. Apple containers (used by NanoClaw): macOS-native, kernel-level enforcement, very strong.
3. Docker with seccomp profile (used by Hermes, OpenClaw 2026.4+): default but solid for non-adversarial sandboxing.
4. Cloudflare Workers runtime (used by Moltworker): V8 isolates, genuinely good for what it does, but runtime constraints limit what tools can run.
5. Vanilla Docker without a tightened seccomp profile: better than nothing, weak.

For most pocket AI deployments, Hermes Agent's Docker + seccomp default is the realistic operating point. For high-stakes work, move to IronClaw or run Docker with a custom seccomp profile.
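If you take the custom-seccomp route, the hardening flags live on the `docker run` line. A minimal sketch, assuming a tightened profile already exists at an illustrative path (the profile path and image name below are placeholders, not anything an agent ships with):

```shell
# Run tool execution in a locked-down container:
# - custom seccomp profile (start from Docker's default, remove syscalls
#   your tools never need, e.g. ptrace, mount, keyctl)
# - no privilege escalation, no capabilities, no network
docker run --rm \
  --security-opt seccomp=/etc/agent/seccomp-tightened.json \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  --network none \
  -v /srv/agent/workspace:/workspace \
  agent-tools:latest
```

Dropping `--network none` in favour of a dedicated Docker network with firewall rules is the usual compromise when tools genuinely need egress.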

3.2 Sandbox config — the Hermes Agent example

Hermes ships with a default sandbox block that enforces:

  • Network egress: deny all by default; explicit allowlist required
  • Filesystem read: /workspace only by default
  • Filesystem write: nothing by default; explicit declaration required
  • Resource limits: 50% CPU quota, 256 MB memory by default

Per-tool overrides are possible but require explicit declaration in the tool's YAML. Sample:

```yaml
name: web-fetch
description: Fetch a URL and return the content.
command: curl -sL
args:
  - url
sandbox:
  network:
    allow:
      - "https://*.allowed-domain.com/*"
      - "https://api.openrouter.ai/*"
  filesystem:
    read: ["/workspace"]
    write: []
  resources:
    cpu_quota: 30
    memory_mb: 128
```

The default policy denies; the YAML declares specific exceptions.

3.3 What to NEVER do

  • Disable the sandbox “just to test something” in production. Tests belong on a throwaway host.
  • Allow filesystem write outside /workspace without a clear reason.
  • Allow unrestricted network egress (the “everything” allowlist). That defeats the point of the sandbox.
  • Set cpu_quota: 100 and memory_mb to half the host's RAM. A runaway tool will take the host down with it.

Section 4 — Credential storage

Credentials for LLM providers, MCP servers, external APIs and similar must NEVER live in plaintext on the agent host's filesystem.

4.1 The right place: OS keyring

Use the operating system's credential store:

  • macOS: Keychain
  • Linux desktop: GNOME Keyring or KWallet
  • Linux server: pass with GPG, or HashiCorp Vault for serious setups

Hermes Agent's vault feature uses the OS keyring on Linux desktops and falls back to an encrypted file with a master key on headless servers. The master key file should be mode 0400, owner-only readable.
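On a headless Linux server, `pass` is the lightest of those options. A sketch of the workflow (the GPG identity, secret path and agent binary name are all placeholders):

```shell
# One-time: initialise the store against your GPG key identity.
pass init "ops@example.com"

# Store an API key; pass prompts for the value, encrypts it with GPG,
# and nothing lands on disk in plaintext.
pass insert agents/hermes/anthropic-api-key

# Inject the secret at launch time so it never sits in a config file.
ANTHROPIC_API_KEY="$(pass show agents/hermes/anthropic-api-key)" ./hermes-agent
```

The environment-variable injection pattern works with any agent that reads its provider key from the environment; check your agent's docs for the exact variable name.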

4.2 The wrong place: plaintext config files

CVE-2026-25103 (OpenClaw 2026.2 plaintext credential storage) is the canonical bad pattern. Don't repeat it. If you use an agent that defaults to plaintext credentials, override it before any production use.

4.3 Rotation policy

Rotate API keys at least every 90 days. Rotate immediately on:

  • Suspected agent compromise
  • Migration between hosts
  • Departure of anyone with access to the agent

Rotation is annoying. Build it into your monthly maintenance window. The annoyance of rotating beats the annoyance of explaining to a finance team why the Anthropic bill is €4,000 this month because someone stole the API key.

Section 5 — Network isolation

The agent dashboard must NOT be accessible from the public internet. This is non-negotiable in 2026.

5.1 The right way: Tailscale

Tailscale provides identity-based mesh networking. Install Tailscale on the agent host, install on your laptop, access the dashboard over the Tailscale IP. Done.

The agent dashboard binds to 127.0.0.1. Tailscale forwards connections from authorised devices. Access from the public internet is structurally impossible.
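It's worth verifying the bind address rather than trusting the config. One way to check, using `ss` (the port number is illustrative):

```shell
# The dashboard should be listening on 127.0.0.1, never 0.0.0.0.
ss -tlnp | grep 8765

# Cron-able version: fail loudly if it's bound to all interfaces.
if ss -tln | grep -q '0.0.0.0:8765'; then
  echo "ALERT: dashboard exposed on all interfaces" >&2
  exit 1
fi
```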

5.2 The acceptable way: SSH tunnel

If you can't install Tailscale (corporate restrictions, etc.):

ssh -L 8765:localhost:8765 user@your-agent-host

Then access http://localhost:8765 on your laptop. Same security property: dashboard never touches the public internet.

5.3 The wrong way: public exposure with auth

Even with auth on, the dashboard publicly exposed is a constant attack surface. Origin-bypass attacks, credential-stuffing, zero-day auth bypasses — anything you can think of, someone is trying it. Don't do it.

5.4 Egress allowlist

Equally important: where the agent can REACH from inside the host.

Default policy: deny all. Allowlist specific destinations:

  • Your LLM provider (api.anthropic.com, api.openai.com, openrouter.ai)
  • Your authorised tool destinations
  • Software update servers if needed

ZeroClaw makes this trivial (egress denied at iptables level). With other agents, the sandbox network allowlist plus host-level firewall rules give you the same effect.
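With plain iptables, a default-deny egress policy looks roughly like the following sketch. The destination domain is an example, and note the caveat in the comments: pinning resolved IPs is brittle because providers rotate addresses.

```shell
# Default-deny egress, then re-open only what the agent needs.
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT   # DNS

# Resolve-and-pin: allow HTTPS only to the provider's current addresses.
# Caveat: providers rotate IPs, so re-run this periodically; a
# domain-aware egress proxy is the more robust long-term option.
for ip in $(dig +short api.anthropic.com); do
  iptables -A OUTPUT -p tcp -d "$ip" --dport 443 -j ACCEPT
done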

Section 6 — Audit logging

Every tool call, every credential access, every dashboard login — log it.

6.1 What to log

  • Tool name + arguments hash + caller (user or agent)
  • Timestamp (timezone-aware)
  • LLM source (which provider / model produced the call)
  • Approval flow result (approved by whom, when)
  • Result hash (did the tool succeed)

Don't log credential values. Don't log full tool outputs (they may contain sensitive data). Hash everything for later forensics without leaking content.
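Hashing arguments before logging can be as simple as piping through `sha256sum`. A sketch of a log-line builder (the field names are illustrative, not any agent's actual log schema):

```shell
log_tool_call() {
  # $1 = tool name, $2 = raw arguments (never logged verbatim)
  args_hash=$(printf '%s' "$2" | sha256sum | cut -d' ' -f1)
  printf '{"ts":"%s","tool":"%s","args_sha256":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$args_hash"
}

log_tool_call web-fetch "https://example.com/page"
```

During forensics you can confirm whether a suspected argument string was used by hashing the candidate and grepping the log, without the log ever having stored the argument itself.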

6.2 Where to log

Local rotated logs are the baseline (/var/log/ with logrotate). Tamper-evident logs are the goal: hash-chained, append-only. IronClaw ships this; for other agents, you implement it yourself or use a remote log shipper (Vector, Fluent Bit) to a separate host.
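A hash chain is cheap to implement yourself: each entry embeds the hash of the previous entry, so deleting or editing any line breaks the chain. A minimal sketch (the log path is illustrative):

```shell
CHAIN_LOG=/var/log/agent/audit.chain   # illustrative path

# Append an entry; each line is "<hash> <prev-hash> <timestamp> <payload>".
append_chained() {
  prev=$(tail -n1 "$CHAIN_LOG" 2>/dev/null | cut -d' ' -f1)
  entry="${prev:-genesis} $(date -u +%s) $1"
  hash=$(printf '%s' "$entry" | sha256sum | cut -d' ' -f1)
  echo "$hash $entry" >> "$CHAIN_LOG"
}

# Walk the chain; any edit or deletion makes verification fail.
verify_chain() {
  prev=genesis
  while read -r hash rest; do
    # Linkage: this entry must reference the previous entry's hash.
    [ "${rest%% *}" = "$prev" ] || return 1
    # Integrity: the stored hash must match the entry's content.
    [ "$(printf '%s' "$rest" | sha256sum | cut -d' ' -f1)" = "$hash" ] || return 1
    prev=$hash
  done < "$CHAIN_LOG"
}
```

This protects against quiet tampering on the host itself; combined with shipping to a second machine it covers the attacker-deletes-the-logs scenario too.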

6.3 What to do with logs

  • Review weekly, even during low-activity weeks. Look for unfamiliar tool names, odd argument patterns and destinations you don't recognise.
  • Set up alerts for specific events: any shell-tool call after hours, repeated failed dashboard logins, egress to a destination not on your allowlist.
  • Retain logs for at least 90 days. Some compliance regimes require longer.

Section 7 — Monitoring

You can't react to incidents you don't see.

7.1 The minimum viable stack

  • Process supervision: systemd or Docker with restart: always.
  • System metrics: Netdata is the easiest path to comprehensive host-level visibility.
  • Agent health: each agent we cover exposes a /health endpoint. Poll it.
  • Alerts on critical events: at minimum, alert on agent crash, on failed dashboard logins and on unexpected egress.
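A minimal cron-able health probe looks something like this. The port, endpoint path and alert address are illustrative; check your agent's docs for its actual health endpoint.

```shell
#!/bin/sh
# Poll the agent's health endpoint; alert if it's down or unhealthy.
if ! curl -sf --max-time 5 http://127.0.0.1:8765/health >/dev/null; then
  echo "agent health check failed at $(date -u)" \
    | mail -s "AGENT DOWN" ops@example.com
fi
```

Run it every minute from cron; `curl -f` treats any non-2xx response as failure, so a hung-but-listening agent is caught too.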

7.2 What to alert on

  • Agent process down for more than 60 seconds
  • CPU pegged at >90% for more than 5 minutes (legitimate heavy use is rarely sustained that long)
  • Memory pressure pushing the agent toward OOM
  • Disk usage above 80%
  • Failed authentication attempts on the dashboard
  • Tool calls outside business hours if your workflow is business-hours-bound
  • Egress traffic to destinations not on your allowlist

7.3 Where to send alerts

Email works. Pushover, Telegram, Slack channels work. Whatever you look at every day. Don't send to a channel you'll never check.

Section 8 — Update strategy

Patches matter. Most published CVEs in 2026 had patches available within 72 hours; the breaches happened to people who hadn't updated.

8.1 The realistic strategy

Two-tier:

  • Critical security patches: apply within 48 hours. Watchtower can automate this for Docker deployments; a manual pull works if you'll actually do it within the window.
  • Other updates: monthly maintenance window. Test, deploy, verify, and roll back if anything regresses.
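Checking the latest release against your installed version is easy to script with the GitHub API. The repository path and version command below are placeholders; substitute your agent's:

```shell
# Compare the installed version with the latest GitHub release tag.
repo="example-org/hermes-agent"                  # placeholder repo path
latest=$(curl -s "https://api.github.com/repos/$repo/releases/latest" \
  | grep -o '"tag_name": *"[^"]*"' | cut -d'"' -f4)
installed=$(hermes-agent --version 2>/dev/null)  # placeholder command

if [ "$latest" != "$installed" ]; then
  echo "update available: $installed -> $latest"
fi
```

Wired into the same alert channel as your monitoring, this turns "I'll check for updates sometime" into a nudge you can't miss.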

For Hermes Agent specifically: subscribe to the GitHub release feed and the security advisory feed. Watch for releases tagged “security” — these mean act fast.

8.2 The unrealistic strategy

“Auto-update everything always.” This breaks production when an update has a regression. We've watched it happen. Have a maintenance window.

8.3 The really unrealistic strategy

“I'll update when I have time.” Every CVE feed reader has this person on it. Don't be that person.

Section 9 — Incident response

Eventually something will go wrong. Have a plan.

9.1 The five things to do when you suspect compromise

1. Isolate. Pull the agent off the network. Tailscale revocation takes 5 seconds. iptables drop is faster.
2. Preserve. Take a snapshot of the host (filesystem, running processes, audit logs) before doing anything destructive.
3. Rotate. Every credential the agent had access to, even ones you're not sure about. Treat all as compromised.
4. Investigate. Audit logs first. Then process tree. Then network logs. Build a timeline of what the agent did between “normal” and “weird.”
5. Rebuild. When in doubt, nuke the host and rebuild from a trusted image. Restore data from backup. Don't try to clean a compromised host in place.
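Step 2, preserve, is the one people skip under pressure. A sketch of a quick evidence grab (paths are illustrative; volatile state goes first because it's gone after a reboot):

```shell
ts=$(date -u +%Y%m%dT%H%M%SZ)
mkdir -p "/root/ir-$ts"

# Volatile state first: processes, sockets, logged-in users.
ps auxf   > "/root/ir-$ts/ps.txt"
ss -tunap > "/root/ir-$ts/sockets.txt"
who -a    > "/root/ir-$ts/who.txt"

# Then the audit trail and agent state.
tar czf "/root/ir-$ts/agent-state.tgz" /var/log/agent /etc/agent 2>/dev/null

# Hash everything so you can later prove the evidence wasn't altered.
sha256sum "/root/ir-$ts"/* > "/root/ir-$ts/MANIFEST.sha256"
```

Copy the directory off the host before you start the rebuild; evidence on a compromised machine is evidence the attacker can reach.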

9.2 The 30-day post-incident

After the immediate response:

  • Write a post-mortem. Honest. What happened, why, what's changed.
  • Share within your team or — if you run something with public users — with those users as well.
  • Update your defences. Whatever path the attacker took, that path is now a known weakness; close it.
  • Subscribe to the relevant CVE feeds if you weren't already.

Section 10 — Backup and recovery

The often-forgotten security control.

10.1 What to back up

  • Agent config files
  • Conversation history (if you depend on it)
  • MCP server configurations
  • Any local-state databases (vector DB, SQLite, etc.)
  • Encrypted credential vault (yes, including the encrypted vault — you'll need it to restore anywhere else)

10.2 How

  • borgbackup or restic to a remote endpoint. Encrypted, deduplicated.
  • Frequency: nightly for live data, weekly for full system snapshots.
  • Off-site is non-negotiable. A backup on the same VPS dies with the VPS.
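With restic, for example, the whole pipeline is a few commands. The repository location and backed-up paths are illustrative; restic encrypts client-side, so the remote host never sees plaintext.

```shell
export RESTIC_REPOSITORY="sftp:backup@backup-host.example.com:/srv/backups/agent"
export RESTIC_PASSWORD_FILE=/root/.restic-pass   # mode 0400

# One-time initialisation of the encrypted repository.
restic init

# Nightly (from cron): config, state and the encrypted vault.
restic backup /etc/agent /var/lib/agent /var/log/agent

# Retention: keep 30 nightly and 8 weekly snapshots, prune the rest.
restic forget --keep-daily 30 --keep-weekly 8 --prune
```

Store the restic password file somewhere that survives the host dying (your own password manager, not the agent's vault) or the backup is unreadable exactly when you need it.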

10.3 Test the restore

Once a quarter, restore your backup to a fresh host and verify the agent comes up. The number of self-hosters who have backups they've never tested is depressing. If you don't test, you don't have a backup — you have a hope.
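In restic terms, the quarterly test is two commands plus actually starting the agent (the target path is illustrative):

```shell
# Restore the latest snapshot to a scratch location on the test host.
restic restore latest --target /srv/restore-test

# Spot-check that the config and state actually came back...
ls /srv/restore-test/etc/agent /srv/restore-test/var/lib/agent

# ...then point a throwaway agent instance at the restored state
# and confirm its health endpoint comes up. Listing files is not
# the test; a running agent is the test.
```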

Section 11 — The 12-step quick checklist

Every self-hosted AI agent in 2026 should have:

1. Sandbox-on by default for tool execution
2. Auth-on by default for the dashboard
3. Dashboard accessible only via Tailscale or SSH tunnel (never public)
4. Credentials stored in OS keyring or encrypted vault
5. Egress denied by default with explicit allowlist
6. Audit logging on, weekly review
7. Process supervision with auto-restart
8. System metrics collection (Netdata or equivalent)
9. Alerts for crashes, CPU saturation, suspicious egress
10. Critical security patches applied within 48 hours
11. Backups nightly, off-site, tested quarterly
12. Incident response playbook written before you need it

If you can't tick all 12 in your current setup, fix the gaps before adding features.

Section 12 — Closing notes

Self-hosted AI security in 2026 is more disciplined hygiene than exotic skill. The mistakes that matter (CVE-2026-25253, CVE-2026-25103 plaintext credentials, the chronic tendency to expose dashboards on public IPs) are mistakes we already know how to avoid in other software. The novelty of the agent makes it tempting to skip the basics. Don't.

The good news: every credible self-hosted agent in 2026 ships with better defaults than the equivalent project would have shipped 18 months ago. The post-OpenClaw-crisis ecosystem is meaningfully more secure by default. Your job as an operator is to not actively undo that.

Subscribe to [the newsletter](/newsletter) for security alerts when they happen, the [CVE tracker](/cves) for the live feed, and the [methodology page](/methodology) for our standard security audit checklist.

Related guides

  • [The complete OpenClaw timeline](/guides/openclaw-complete-history)
  • [OpenClaw security crisis 2026](/guides/openclaw-security-crisis-2026)
  • [5 best OpenClaw alternatives](/guides/openclaw-alternatives-2026)
  • [Pocket AI complete guide](/guides/pocket-ai-complete-guide)
  • [Edge AI hardware buyer's guide 2026](/guides/edge-ai-hardware-2026)