Self-hosted AI security playbook 2026 — the practical operator's guide

Who this playbook is for

Anyone running a self-hosted AI agent in production — paid or hobbyist — in 2026. We assume you have access to your host (VPS or local hardware), can edit config files, and can run shell commands. We do not assume security expertise; we explain the why before the what where it matters.

We also assume you've read the basics of self-hosted AI agent architecture. If you haven't, [the landscape report](/guides/self-hosted-ai-landscape-2026) and [the OpenClaw crisis explainer](/guides/openclaw-security-crisis-2026) are good starting points.

This playbook is opinionated. We tell you what we'd do — not what's “the only way.”

Section 1 — The realistic threat model

The post-OpenClaw-crisis threat model for a self-hosted AI agent in 2026 includes:

Web-origin attacks. Malicious websites attempting to connect to
Prompt injection. Adversarial input embedded in documents,
Supply chain compromises. Plugins, MCP servers and pre-built tool
Credential exfiltration. API keys, SSH keys, browser cookies and
Tool execution privilege escalation. Tools that legitimately need
Lateral movement. A compromised agent host being used as a

What we are NOT defending against in this playbook:

Sophisticated state-level attackers with zero-day capabilities. If
Physical access to the host. If someone is at your keyboard, the
Complete model alignment failures. We assume the LLM behind your

The playbook covers everything else.

Section 2 — Pick the right agent for your threat model

Before any configuration, the first security decision is which agent you run. Different agents have different default postures.

Agent	Sandbox-on default	Auth-on default	Threat model published	Suitability
OpenClaw 2026.4+	yes	yes	yes	All workloads
Hermes Agent	yes	yes	yes	All workloads
Nanobot	no	no	no	Single-user only
NanoClaw	yes	yes	yes	macOS-only
IronClaw	yes (gVisor)	yes (RBAC)	yes (formal)	Regulated industries
ZeroClaw	yes	optional	yes	Privacy-mandated

If your security baseline requires sandbox-on-by-default and a documented threat model, your viable choices in mid-2026 are: Hermes Agent (general), OpenClaw 2026.4+ (existing deployments), NanoClaw (macOS), IronClaw (regulated), ZeroClaw (privacy-mandated).

If you're choosing today and you don't have a specific reason to do otherwise, default to Hermes Agent.

Section 3 — Sandbox setup

Tool execution must run inside a sandbox. We don't do “sandboxes are nice to have” in 2026. The CVE-2026-25253 incident closed that conversation.

3.1 The sandbox tiers

In order of strength:

1. gVisor (used by IronClaw): syscall interposition in user space. Strongest practical option for adversarial workloads.
2. Apple containers (used by NanoClaw): macOS-native, kernel-level enforcement, very strong.
3. Docker with seccomp profile (used by Hermes, OpenClaw 2026.4+): default but solid for non-adversarial sandboxing.
4. Cloudflare Workers runtime (used by Moltworker): V8 isolates, genuinely good for what it does, runtime constraints limit what tools can run.
5. Vanilla Docker without a tightened seccomp profile: better than nothing, weak.

For most pocket AI deployments, Hermes Agent's Docker + seccomp default is the realistic operating point. For high-stakes work, move to IronClaw or run Docker with a custom seccomp profile.

3.2 Sandbox config — the Hermes Agent example

Hermes ships with a default sandbox block that enforces:
- Network egress: deny all by default; explicit allowlist required
- Filesystem read: /workspace only by default
- Filesystem write: nothing by default; explicit declaration required
- Resource limits: 50% CPU quota, 256 MB memory by default

Per-tool overrides are possible but require explicit declaration in the tool's YAML. Sample:

name: web-fetch
description: Fetch a URL and return the content.
command: curl -sL
args:
  - url
sandbox:
  network:
    allow:
      - "https://*.allowed-domain.com/*"
      - "https://api.openrouter.ai/*"
  filesystem:
    read: ["/workspace"]
    write: []
  resources:
    cpu_quota: 30
    memory_mb: 128

The default policy denies; the YAML declares specific exceptions.
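To make the deny-by-default idea concrete, here is a minimal sketch of a linter for a tool's sandbox block. It assumes the YAML has already been parsed into a plain dict with the same keys as the sample above; the function name lint_sandbox and the list of "too broad" patterns are our own illustration, not part of Hermes.

```python
import posixpath

# Defaults quoted in section 3.2; the dict layout mirrors the sample YAML.
DEFAULT_CPU_QUOTA = 50    # percent
DEFAULT_MEMORY_MB = 256
WORKSPACE = "/workspace"

# Patterns we treat as "allow everything" (an illustrative, non-exhaustive list).
TOO_BROAD = {"*", "*://*", "http://*", "https://*", "https://*/*"}


def lint_sandbox(sandbox: dict) -> list:
    """Return warnings for a parsed sandbox block (hypothetical helper)."""
    warnings = []

    for pattern in sandbox.get("network", {}).get("allow", []):
        # A bare wildcard re-opens all egress and defeats default-deny.
        if pattern.strip() in TOO_BROAD:
            warnings.append(f"network.allow entry {pattern!r} permits unrestricted egress")

    for path in sandbox.get("filesystem", {}).get("write", []):
        # Normalise first so "/workspace/../etc" is not mistaken for a workspace path.
        normalized = posixpath.normpath(path)
        if normalized != WORKSPACE and not normalized.startswith(WORKSPACE + "/"):
            warnings.append(f"filesystem.write path {path!r} is outside {WORKSPACE}")

    resources = sandbox.get("resources", {})
    if resources.get("cpu_quota", DEFAULT_CPU_QUOTA) > DEFAULT_CPU_QUOTA:
        warnings.append("cpu_quota is above the 50% default; review the override")
    if resources.get("memory_mb", DEFAULT_MEMORY_MB) > DEFAULT_MEMORY_MB:
        warnings.append("memory_mb is above the 256 MB default; review the override")

    return warnings
```

Run against the web-fetch sample, this returns an empty list. A block with a bare "*" in network.allow or a write path outside /workspace produces one warning per problem, which is a convenient thing to fail a pre-deploy check on.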

3.3 What to NEVER do

Disable the sandbox “just to test something” in
Allow filesystem write outside /workspace without a clear reason.
Allow unrestricted network egress (the “everything”
Set cpu_quota: 100 and memory_mb to half the host's RAM. A

Section 4 — Credential storage

Credentials for LLM providers, MCP servers, external APIs and similar must NEVER live in plaintext on the agent host's filesystem.

4.1 The right place: OS keyring

Use the operating system's credential store:
- macOS: Keychain
- Linux desktop: GNOME Keyring or KWallet
- Linux server: pass with GPG, or HashiCorp Vault for serious setups

Hermes Agent's vault feature uses the OS keyring on a Linux desktop and falls back to an encrypted file protected by a master key on a headless server. The master key file should be mode 0400: readable by its owner and by no one else.
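The 0400 requirement is easy to check from a maintenance script. The sketch below uses only the Python standard library on a Unix-like host; the function name key_file_problems is hypothetical, and the path you pass in is whatever your agent actually uses for its master key.

```python
import os
import stat


def key_file_problems(path: str) -> list:
    """Return a list of permission problems for a master key file (Unix only)."""
    info = os.stat(path)
    problems = []

    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o400:
        problems.append(f"mode is {mode:04o}, expected 0400")

    # The key should belong to the account the check (and the agent) runs as.
    if info.st_uid != os.geteuid():
        problems.append("file is not owned by the current user")

    return problems
```

An empty list means the file matches the recommendation. Anything else is worth an alert, since a group-readable or world-readable master key undoes the point of encrypting the vault.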

4.2 The wrong place: plaintext config files

CVE-2026-25103 (OpenClaw 2026.2 plaintext credential storage) is the canonical bad pattern. Don't repeat it. If you use an agent that defaults to plaintext credentials, override it before any production use.

4.3 Rotation policy

Rotate API keys at least every 90 days. Rotate immediately on:
- Suspected agent compromise
- Migration between hosts
- Departure of anyone with access to the agent

Rotation is annoying. Build it into your monthly maintenance window. The annoyance of rotating beats the annoyance of explaining to a finance team why the Anthropic bill is €4,000 this month because someone stole the API key.
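If you keep a record of when each key was last rotated, the 90-day rule reduces to a date comparison. This is a minimal sketch; the dict of key names to rotation dates is a hypothetical inventory you would maintain yourself.

```python
from datetime import date, timedelta

MAX_KEY_AGE = timedelta(days=90)


def keys_due_for_rotation(last_rotated: dict, today: date) -> list:
    """Return the names of keys whose last rotation is 90 or more days old."""
    return sorted(
        name
        for name, rotated_on in last_rotated.items()
        if today - rotated_on >= MAX_KEY_AGE
    )


if __name__ == "__main__":
    inventory = {
        "llm-provider": date(2026, 4, 1),   # hypothetical entries
        "mcp-server": date(2026, 6, 1),
    }
    print(keys_due_for_rotation(inventory, date(2026, 6, 30)))
```

Running this during the monthly maintenance window gives you the rotation list without having to remember dates. The event-driven triggers above (compromise, migration, departure) still need a human decision.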

Section 5 — Network isolation

The agent dashboard must NOT be accessible from the public internet. This is non-negotiable in 2026.

5.1 The right way: Tailscale

Tailscale provides identity-based mesh networking. Install Tailscale on the agent host, install it on your laptop, and access the dashboard over your tailnet. Done.

Keep the agent dashboard bound to 127.0.0.1 and let Tailscale carry connections from authorised devices to it (for example through Tailscale's serve feature, which proxies tailnet traffic to a local port). With nothing listening on a public interface, the dashboard is not reachable from the public internet.
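A quick sanity check for the bind address can live in your deploy script. The sketch below assumes you can read the dashboard's configured bind host as a string; how you obtain that value depends on your agent, and treating the literal name "localhost" as loopback is our own simplification.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True only if the bind host is a loopback address."""
    if host == "localhost":
        # Assumption: "localhost" resolves to loopback on this machine.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (empty string, hostname, typo): treat as unsafe.
        return False
```

Here 127.0.0.1 and ::1 pass, while 0.0.0.0 (all interfaces) and any routable address fail, which is the case you want to catch before the dashboard ever starts.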

5.2 The acceptable way: SSH tunnel

If you can't install Tailscale (corporate restrictions, etc.):

ssh -L 8765:localhost:8765 user@your-agent-host

Then access http://localhost:8765 on your laptop. Same security property: dashboard never touches the public internet.

5.3 The wrong way: public exposure with auth

Even with auth on, a publicly exposed dashboard is a constant attack surface. Origin-bypass attacks, credential stuffing, zero-day auth bypasses: anything you can think of, someone is trying it. Don't do it.

5.4 Egress allowlist

Equally important: where the agent can REACH from inside the host.

Default policy: deny all. Whitelist specific destinations:
- Your LLM provider (api.anthropic.com, api.openai.com, openrouter.ai)
- Your authorised tool destinations
- Software update servers if needed

ZeroClaw makes this trivial (egress denied at iptables level). With other agents, the sandbox network allowlist plus host-level firewall rules give you the same effect.
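At the application layer, the same deny-all rule can be sketched as an exact hostname match. The host set below reuses the provider names listed above; requiring HTTPS is an extra assumption of ours, and this check complements the sandbox and firewall rules rather than replacing them.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"api.anthropic.com", "api.openai.com", "openrouter.ai"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Default deny: permit only HTTPS requests to exactly listed hostnames."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname drops any userinfo and port; strip a trailing dot and lowercase.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts
```

Matching the parsed hostname exactly, instead of searching for a substring, is the design choice that matters: a URL such as https://api.anthropic.com.evil.example/ contains an allowed name but resolves somewhere else entirely, and it is rejected here.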

Section 6 — Audit logging

Every tool call, every credential access, every dashboard login — log it.

6.1 What to log

Tool name + arguments hash + caller (user or agent)
Timestamp (timezone-aware)
LLM-source (which provider / model produced the call)
Approval flow result (approved by whom, when)
Result hash (did the tool succeed)

Don't log credential values. Don't log full tool outputs (they may contain sensitive data). Hash everything for later forensics without leaking content.
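One way to structure such a record is sketched below. The field names are our own, not a format any of the agents above emits; arguments and outputs are serialised canonically and stored only as SHA-256 digests.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(value) -> str:
    """Hash a JSON-serialisable value in a canonical form (sorted keys)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(tool, arguments, caller, llm_source, approved_by,
                 output, succeeded, now=None) -> dict:
    """Build one audit entry that carries hashes instead of raw content."""
    now = now or datetime.now(timezone.utc)  # timezone-aware by default
    return {
        "timestamp": now.isoformat(),
        "tool": tool,
        "arguments_sha256": sha256_hex(arguments),
        "caller": caller,
        "llm_source": llm_source,
        "approved_by": approved_by,
        "succeeded": succeeded,
        "result_sha256": sha256_hex(output),
    }
```

During forensics you can hash a suspected argument and compare it with the stored digest. One caveat: a plain hash of a short or guessable value can be brute-forced, so a keyed HMAC is worth considering where arguments are low-entropy.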

6.2 Where to log

Local rotated logs are the baseline (/var/log/ with logrotate). Tamper-evident logs are the goal: hash-chained, append-only. IronClaw ships this; for other agents, you implement it yourself or ship logs to a separate host with a remote log shipper (Vector, Fluent Bit).
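For readers implementing it themselves, a hash chain is small. The sketch below is a generic illustration, not IronClaw's format: each entry stores the previous entry's hash, so editing or removing an earlier record breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_entry(prev_hash: str, record: dict) -> dict:
    """Bind a record to the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "record": record, "hash": entry_hash}


def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the current tail of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append(chain_entry(prev, record))


def verify(log: list) -> bool:
    """Recompute every hash from the start; False means the log was altered."""
    prev = GENESIS
    for entry in log:
        expected = chain_entry(prev, entry["record"])["hash"]
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

A chain on its own does not detect someone truncating the tail or rewriting the whole file, so copy the latest hash to a separate host at regular intervals. That is also why shipping logs off-host remains useful even with chaining.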

6.3 What to do with logs

Review weekly during low-activity weeks. Look for unfamiliar tool
Set up alerts for specific events: any shell-tool call after
Retain logs for at least 90 days. Some compliance regimes require

Section 7 — Monitoring

You can't react to incidents you don't see.

7.1 The minimum viable stack

Process supervision: systemd or Docker with restart: always.
System metrics: Netdata is the easiest path to comprehensive
Agent health: each agent we cover exposes a /health endpoint.
Alerts on critical events: at minimum, alert on agent crash, on

7.2 What to alert on

Agent process down for more than 60 seconds
CPU pegged at >90% for more than 5 minutes (legitimate heavy use is
Memory pressure pushing the agent toward OOM
Disk usage above 80%
Failed authentication attempts on the dashboard
Tool calls outside business hours if your workflow is business-hours-bound
Egress traffic to destinations not on your allowlist
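Several of these rules are plain thresholds, which makes them easy to encode. The sketch below assumes your monitoring stack can hand you a snapshot with the fields shown; the Snapshot class and its field names are hypothetical. Memory pressure and business-hours rules are left out because their thresholds depend on your host and workflow.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    seconds_process_down: float
    seconds_cpu_above_90: float
    disk_used_percent: float
    failed_dashboard_logins: int
    egress_off_allowlist: int


def alerts_for(s: Snapshot) -> list:
    """Return the alert messages triggered by one metrics snapshot."""
    alerts = []
    if s.seconds_process_down > 60:
        alerts.append("agent process down for more than 60 seconds")
    if s.seconds_cpu_above_90 > 5 * 60:
        alerts.append("CPU above 90% for more than 5 minutes")
    if s.disk_used_percent > 80:
        alerts.append("disk usage above 80%")
    if s.failed_dashboard_logins > 0:
        alerts.append("failed authentication attempts on the dashboard")
    if s.egress_off_allowlist > 0:
        alerts.append("egress traffic to destinations not on the allowlist")
    return alerts
```

Keeping the rules in one function means the thresholds are written down and reviewable, rather than scattered across dashboard settings.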

7.3 Where to send alerts

Email works. Pushover, Telegram, Slack channels work. Whatever you look at every day. Don't send to a channel you'll never check.

Section 8 — Update strategy

Patches matter. Most published CVEs in 2026 had patches available within 72 hours; the breaches happened to people who hadn't updated.

8.1 The realistic strategy

Two-tier:

Critical security patches: apply within 48 hours. Watchtower or
Other updates: monthly maintenance window. Test, deploy, verify,

For Hermes Agent specifically: subscribe to the GitHub release feed and the security advisory feed. Watch for releases tagged “security” — these mean act fast.
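The two-tier rule can be written down as a deadline calculation. This sketch assumes you already have a release's publication time and its tags from whatever feed you follow; it does not call any GitHub API, and the function names are our own.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CRITICAL_PATCH_WINDOW = timedelta(hours=48)


def apply_by(published_at: datetime, tags: set) -> Optional[datetime]:
    """Deadline for security-tagged releases; None means "next maintenance window"."""
    if "security" in {tag.lower() for tag in tags}:
        return published_at + CRITICAL_PATCH_WINDOW
    return None


def is_overdue(published_at: datetime, tags: set, now: datetime) -> bool:
    """True if a security release is still unapplied past its 48-hour window."""
    deadline = apply_by(published_at, tags)
    return deadline is not None and now > deadline


if __name__ == "__main__":
    published = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)  # example value
    print(apply_by(published, {"security"}))
```

Feeding is_overdue into your alerting turns "act fast" into something that pages you, instead of something you intend to do.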

8.2 The unrealistic strategy

“Auto-update everything always.” This breaks production when an update has a regression. We've watched it happen. Have a maintenance window.

8.3 The really unrealistic strategy

“I'll update when I have time.” Every CVE feed reader has this person on it. Don't be that person.

Section 9 — Incident response

Eventually something will go wrong. Have a plan.

9.1 The five things to do when you suspect compromise

1. Isolate. Pull the agent off the network. Tailscale revocation takes 5 seconds. iptables drop is faster.
2. Preserve. Take a snapshot of the host (filesystem, running processes, audit logs) before doing anything destructive.
3. Rotate. Every credential the agent had access to, even ones you're not sure about. Treat all as compromised.
4. Investigate. Audit logs first. Then process tree. Then network logs. Build a timeline of what the agent did between “normal” and “weird.”
5. Rebuild. When in doubt, nuke the host and rebuild from a trusted image. Restore data from backup. Don't try to clean a compromised host in place.
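For the investigation step, building the timeline is mostly filtering and sorting. The sketch below assumes each audit record is a dict with an ISO 8601, timezone-aware "timestamp" field; that field name is an assumption about your log format.

```python
from datetime import datetime


def timeline(records: list, start: datetime, end: datetime) -> list:
    """Return records whose timestamp falls within [start, end], oldest first."""
    def when(record: dict) -> datetime:
        return datetime.fromisoformat(record["timestamp"])

    in_window = [r for r in records if start <= when(r) <= end]
    return sorted(in_window, key=when)
```

Pass the last moment you are confident things were normal as start and the moment you isolated the host as end. Work from a copy of the preserved logs, not the live files.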

9.2 The 30-day post-incident

After the immediate response:

Write a post-mortem. Honest. What happened, why, what's changed.
Share within your team or — if you run something with public users
Update your defences. Whatever path the attacker took, that path is
Subscribe to the relevant CVE feeds if you weren't already.

Section 10 — Backup and recovery

The often-forgotten security control.

10.1 What to back up

Agent config files
Conversation history (if you depend on it)
MCP server configurations
Any local-state databases (vector DB, SQLite, etc.)
Encrypted credential vault (yes, including the encrypted vault — you

10.2 How

borgbackup or restic to a remote endpoint. Encrypted, deduplicated.
Frequency: nightly for live data, weekly for full system snapshots
Off-site is non-negotiable. A backup on the same VPS dies with

10.3 Test the restore

Once a quarter, restore your backup to a fresh host and verify the agent comes up. The number of self-hosters who have backups they've never tested is depressing. If you don't test, you don't have a backup — you have a hope.
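Part of a restore test can be automated by comparing checksums between the source tree and the restored copy. This is a minimal sketch using only the standard library; it is independent of borgbackup or restic, and it does not replace actually starting the agent on the fresh host.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large databases do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def restore_differences(source: dict, restored: dict) -> dict:
    """List files that are missing from, or differ in, the restored copy."""
    return {
        "missing": sorted(set(source) - set(restored)),
        "changed": sorted(
            name for name in set(source) & set(restored)
            if source[name] != restored[name]
        ),
    }
```

Build the manifest at backup time, store it alongside the backup, and compare it against a manifest of the restored tree. Two empty lists mean the files came back intact; whether the agent starts is still a separate check.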

Section 11 — The 12-step quick checklist

Every self-hosted AI agent in 2026 should have:

1. Sandbox-on by default for tool execution
2. Auth-on by default for the dashboard
3. Dashboard accessible only via Tailscale or SSH tunnel (never public)
4. Credentials stored in OS keyring or encrypted vault
5. Egress denied by default with explicit allowlist
6. Audit logging on, weekly review
7. Process supervision with auto-restart
8. System metrics collection (Netdata or equivalent)
9. Alerts for crashes, CPU saturation, suspicious egress
10. Critical security patches applied within 48 hours
11. Backups nightly, off-site, tested quarterly
12. Incident response playbook written before you need it

If you can't tick all 12 in your current setup, fix the gaps before adding features.

Section 12 — Closing notes

Self-hosted AI security in 2026 is more disciplined hygiene than exotic skill. The mistakes that matter (CVE-2026-25253, CVE-2026-25103 plaintext credentials, the chronic tendency to expose dashboards on public IPs) are mistakes we already know how to avoid in other software. The novelty of the agent makes it tempting to skip the basics. Don't.

The good news: every credible self-hosted agent in 2026 ships with better defaults than the equivalent project would have shipped 18 months ago. The post-OpenClaw-crisis ecosystem is meaningfully more secure by default. Your job as an operator is to not actively undo that.

Subscribe to [the newsletter](/newsletter) for security alerts when they happen, the [CVE tracker](/cves) for the live feed, and the [methodology page](/methodology) for our standard security audit checklist.

Related guides

[The complete OpenClaw timeline](/guides/openclaw-complete-history)
[OpenClaw security crisis 2026](/guides/openclaw-security-crisis-2026)
[5 best OpenClaw alternatives](/guides/openclaw-alternatives-2026)
[Pocket AI complete guide](/guides/pocket-ai-complete-guide)
[Edge AI hardware buyer's guide 2026](/guides/edge-ai-hardware-2026)