Who this playbook is for
Anyone running a self-hosted AI agent in production — paid or hobbyist — in 2026. We assume you have access to your host (VPS or local hardware), can edit config files, and can run shell commands. We do not assume security expertise; we explain the why before the what where it matters.
We also assume you've read the basics of self-hosted AI agent architecture. If you haven't, [the landscape report](/guides/self-hosted-ai-landscape-2026) and [the OpenClaw crisis explainer](/guides/openclaw-security-crisis-2026) are good starting points.
This playbook is opinionated. We tell you what we'd do — not what's “the only way.”
Section 1 — The realistic threat model
The post-OpenClaw-crisis threat model for a self-hosted AI agent in 2026 includes:
- Web-origin attacks. Malicious websites attempting to connect to
- Prompt injection. Adversarial input embedded in documents,
- Supply chain compromises. Plugins, MCP servers and pre-built tool
- Credential exfiltration. API keys, SSH keys, browser cookies and
- Tool execution privilege escalation. Tools that legitimately need
- Lateral movement. A compromised agent host being used as a
What we are NOT defending against in this playbook:
- Sophisticated state-level attackers with zero-day capabilities. If
- Physical access to the host. If someone is at your keyboard, the
- Complete model alignment failures. We assume the LLM behind your
The playbook covers everything else.
Section 2 — Pick the right agent for your threat model
Before any configuration, the first security decision is which agent you run. Different agents have different default postures.
| Agent | Sandbox-on default | Auth-on default | Threat model published | Suitability |
|---|---|---|---|---|
| OpenClaw 2026.4+ | yes | yes | yes | All workloads |
| Hermes Agent | yes | yes | yes | All workloads |
| Nanobot | no | no | no | Single-user only |
| NanoClaw | yes | yes | yes | macOS-only |
| IronClaw | yes (gVisor) | yes (RBAC) | yes (formal) | Regulated industries |
| ZeroClaw | yes | optional | yes | Privacy-mandated |
If your security baseline requires sandbox-on-by-default and a documented threat model, your viable choices in mid-2026 are: Hermes Agent (general), OpenClaw 2026.4+ (existing deployments), NanoClaw (macOS), IronClaw (regulated), ZeroClaw (privacy-mandated).
If you're choosing today and you don't have a specific reason to do otherwise, default to Hermes Agent.
Section 3 — Sandbox setup
Tool execution must run inside a sandbox. We don't do “sandboxes are nice to have” in 2026. The CVE-2026-25253 incident closed that conversation.
3.1 The sandbox tiers
In order of strength:
1. gVisor (used by IronClaw): syscall interposition in user space. Strongest practical option for adversarial workloads.
2. Apple containers (used by NanoClaw): macOS-native, kernel-level enforcement, very strong.
3. Docker with seccomp profile (used by Hermes, OpenClaw 2026.4+): default but solid for non-adversarial sandboxing.
4. Cloudflare Workers runtime (used by Moltworker): V8 isolates, genuinely good for what it does, runtime constraints limit what tools can run.
5. Vanilla Docker without a tightened seccomp profile: better than nothing, weak.
For most pocket AI deployments, Hermes Agent's Docker + seccomp default is the realistic operating point. For high-stakes work, move to IronClaw or run Docker with a custom seccomp profile.
3.2 Sandbox config — the Hermes Agent example
Hermes ships with a default sandbox block that enforces:
- Network egress: deny all by default; explicit allowlist required
- Filesystem read: /workspace only by default
- Filesystem write: nothing by default; explicit declaration required
- Resource limits: 50% CPU quota, 256 MB memory by default
Per-tool overrides are possible but require explicit declaration in the tool's YAML. Sample:
```yaml
name: web-fetch
description: Fetch a URL and return the content.
command: curl -sL
args:
  - url
sandbox:
  network:
    allow:
      - "https://*.allowed-domain.com/*"
      - "https://api.openrouter.ai/*"
  filesystem:
    read: ["/workspace"]
    write: []
  resources:
    cpu_quota: 30
    memory_mb: 128
```

The default policy denies; the YAML declares specific exceptions.
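To make the deny-by-default idea concrete, here is a minimal sketch of an egress check. It is not Hermes Agent's actual matcher: the function name `egress_allowed` and the `ALLOWED_HOSTS` list are hypothetical, and we assume the allowlist entries can be reduced to HTTPS host patterns. It matches on the parsed hostname rather than the raw URL string, so a wildcard cannot accidentally match text in the path or query.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical host patterns, reduced from the sample tool's allowlist.
ALLOWED_HOSTS = ["*.allowed-domain.com", "api.openrouter.ai"]


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only if the URL is HTTPS and its host matches a pattern."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased by urlsplit, None if absent
    except ValueError:
        return False  # malformed URL: deny
    if parts.scheme != "https" or not host:
        return False
    # Anything that matches no pattern falls through to a deny.
    return any(fnmatchcase(host, pattern) for pattern in allowed_hosts)
```

With these patterns, `https://cdn.allowed-domain.com/file` passes, while `https://evil.example/?next=.allowed-domain.com/` is denied because only the hostname is compared.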
3.3 What to NEVER do
- Disable the sandbox “just to test something” in
- Allow filesystem write outside /workspace without a clear reason.
- Allow unrestricted network egress (the “everything”
- Set cpu_quota: 100 and memory_mb to half the host's RAM. A
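A small lint over a tool's sandbox block can catch several of these before deployment. This is a sketch under assumptions: `lint_sandbox` is a hypothetical helper, the config is assumed to be already parsed into a dict shaped like the sample tool definition, and we assume a bare `"*"` entry is how an allow-everything egress rule would be written.

```python
import posixpath


def lint_sandbox(cfg: dict, host_ram_mb: int) -> list:
    """Return human-readable warnings for risky sandbox settings."""
    warnings = []

    # Resolve ".." segments first so /workspace/../etc is caught.
    for path in cfg.get("filesystem", {}).get("write", []):
        norm = posixpath.normpath(path)
        if norm != "/workspace" and not norm.startswith("/workspace/"):
            warnings.append(f"write outside /workspace: {path}")

    # Assumption: a bare "*" means unrestricted egress.
    if "*" in cfg.get("network", {}).get("allow", []):
        warnings.append("unrestricted network egress")

    resources = cfg.get("resources", {})
    if resources.get("cpu_quota", 0) >= 100:
        warnings.append("cpu_quota is 100 or more")
    if resources.get("memory_mb", 0) * 2 >= host_ram_mb:
        warnings.append("memory_mb is half the host RAM or more")
    return warnings
```

An empty list means none of these particular checks fired; it does not mean the config is safe.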
Section 4 — Credential storage
Credentials for LLM providers, MCP servers, external APIs and similar must NEVER live in plaintext on the agent host's filesystem.
4.1 The right place: OS keyring
Use the operating system's credential store:
- macOS: Keychain
- Linux desktop: GNOME Keyring or KWallet
- Linux server: pass with GPG, or HashiCorp Vault for serious setups
Hermes Agent's vault feature uses the OS keyring on a Linux desktop and falls back to an encrypted file with a master key on a headless server. The master key file should be mode 0400: readable by its owner only, writable by no one.
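The permission requirement is easy to verify in a startup or cron check. A minimal sketch for Unix hosts, where `master_key_is_private` is a hypothetical helper name and the path to the key file is whatever your agent actually uses:

```python
import os
import stat


def master_key_is_private(path: str) -> bool:
    """True if the file is mode 0400 and owned by the current user."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    return mode == 0o400 and info.st_uid == os.getuid()
```

Run it as the same user the agent runs as, since the ownership check compares against the current process's UID.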
4.2 The wrong place: plaintext config files
CVE-2026-25103 (OpenClaw 2026.2 plaintext credential storage) is the canonical bad pattern. Don't repeat it. If you use an agent that defaults to plaintext credentials, override it before any production use.
4.3 Rotation policy
Rotate API keys at least every 90 days. Rotate immediately on:
- Suspected agent compromise
- Migration between hosts
- Departure of anyone with access to the agent
Rotation is annoying. Build it into your monthly maintenance window. The annoyance of rotating beats the annoyance of explaining to a finance team why the Anthropic bill is €4,000 this month because someone stole the API key.
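If you track when each key was last rotated, the 90-day rule reduces to a date comparison you can run in the maintenance window. A sketch, assuming you keep a mapping of key name to last-rotation time somewhere; the function name and key names in the usage are hypothetical:

```python
from datetime import datetime, timedelta

MAX_KEY_AGE = timedelta(days=90)


def keys_due_for_rotation(last_rotated: dict, now: datetime) -> list:
    """Return the names of keys last rotated 90 or more days before now."""
    return sorted(
        name for name, when in last_rotated.items()
        if now - when >= MAX_KEY_AGE
    )
```

Pass timezone-aware datetimes for both the stored values and `now`; Python refuses to subtract an aware datetime from a naive one.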
Section 5 — Network isolation
The agent dashboard must NOT be accessible from the public internet. This is non-negotiable in 2026.
5.1 The right way: Tailscale
Tailscale provides identity-based mesh networking. Install Tailscale on the agent host, install on your laptop, access the dashboard over the Tailscale IP. Done.
The agent dashboard binds to 127.0.0.1. Tailscale forwards
connections from authorised devices. Access from the public internet
is structurally impossible.
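A cheap guard is to refuse to start if the configured bind address is not loopback. A sketch, assuming the bind host is available as a string from your agent's config; `is_loopback_bind` is a hypothetical helper, and treating the name `localhost` as loopback is an assumption about your hosts file:

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True if the bind host is a loopback address such as 127.0.0.1 or ::1."""
    if host == "localhost":  # assumption: resolves to loopback on this host
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames or garbage: treat as unsafe
```

Note that `0.0.0.0` fails this check, which is the point: it listens on every interface, including public ones.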
5.2 The acceptable way: SSH tunnel
If you can't install Tailscale (corporate restrictions, etc.):
```shell
ssh -L 8765:localhost:8765 user@your-agent-host
```

Then access http://localhost:8765 on your laptop. Same security property: the dashboard never touches the public internet.
5.3 The wrong way: public exposure with auth
Even with auth on, the dashboard publicly exposed is a constant attack surface. Origin-bypass attacks, credential-stuffing, zero-day auth bypasses — anything you can think of, someone is trying it. Don't do it.
5.4 Egress allowlist
Equally important: where the agent can REACH from inside the host.
Default policy: deny all. Allowlist specific destinations:
- Your LLM provider (api.anthropic.com, api.openai.com,
openrouter.ai)
- Your authorised tool destinations
- Software update servers if needed
ZeroClaw makes this trivial (egress denied at iptables level). With other agents, the sandbox network allowlist plus host-level firewall rules give you the same effect.
Section 6 — Audit logging
Every tool call, every credential access, every dashboard login — log it.
6.1 What to log
- Tool name + arguments hash + caller (user or agent)
- Timestamp (timezone-aware)
- LLM-source (which provider / model produced the call)
- Approval flow result (approved by whom, when)
- Result hash (did the tool succeed)
Don't log credential values. Don't log full tool outputs (they may contain sensitive data). Hash everything for later forensics without leaking content.
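One way to build such a record, as a sketch rather than any agent's actual log format: the field names and the `audit_record` helper are hypothetical. Arguments and results are serialised canonically (sorted keys) before hashing, so the same call always produces the same digest.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(value) -> str:
    """Hash a JSON-serialisable value using a canonical encoding."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(tool, args, caller, llm_source, approved_by, result) -> dict:
    """Build a log entry that stores digests instead of raw content."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # tz-aware
        "tool": tool,
        "args_sha256": sha256_hex(args),
        "caller": caller,
        "llm_source": llm_source,
        "approved_by": approved_by,
        "result_sha256": sha256_hex(result),
    }
```

A caveat on the design: a plain hash of a short or guessable value can be brute-forced. If your arguments are low-entropy, a keyed hash (`hmac` with a secret held off the agent host) is a possible hardening.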
6.2 Where to log
Local rotated logs are the baseline (/var/log/ with logrotate).
Tamper-evident logs are the goal: hash-chained, append-only. IronClaw
ships this; for other agents, you implement it yourself or use a
remote log shipper (Vector, Fluent Bit) to a separate host.
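If you implement hash chaining yourself, the core idea fits in a few lines: each entry's hash covers the previous entry's hash, so editing or deleting an earlier entry breaks every link after it. This is an in-memory sketch with hypothetical names, not IronClaw's format; a real version would append to a file and ship it off-host.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": _entry_hash(prev_hash, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; False if any entry was altered or removed."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

One limit worth knowing: a chain alone cannot detect someone truncating the newest entries. Recording the latest hash on a separate host closes that gap, which is another reason to pair this with a remote log shipper.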
6.3 What to do with logs
- Review weekly during low-activity weeks. Look for unfamiliar tool
- Set up alerts for specific events: any shell-tool call after
- Retain logs for at least 90 days. Some compliance regimes require
Section 7 — Monitoring
You can't react to incidents you don't see.
7.1 The minimum viable stack
- Process supervision: systemd or Docker with restart: always.
- System metrics: Netdata is the easiest path to comprehensive
- Agent health: each agent we cover exposes a /health endpoint.
- Alerts on critical events: at minimum, alert on agent crash, on
7.2 What to alert on
- Agent process down for more than 60 seconds
- CPU pegged at >90% for more than 5 minutes (legitimate heavy use is
- Memory pressure pushing the agent toward OOM
- Disk usage above 80%
- Failed authentication attempts on the dashboard
- Tool calls outside business hours if your workflow is business-hours-bound
- Egress traffic to destinations not on your allowlist
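Several of these conditions are plain threshold checks, so they can be expressed as one small function that your monitoring loop calls. A sketch with hypothetical names: the `Snapshot` fields are assumptions about what your metrics collector can provide, and memory pressure and business-hours checks are left out because they depend on your host and workflow.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    seconds_down: float            # how long the agent process has been down
    cpu_over_90_seconds: float     # how long CPU has stayed above 90%
    disk_used_percent: float
    failed_logins: int             # failed dashboard auth attempts this period
    off_allowlist_destinations: list


def alerts_for(s: Snapshot) -> list:
    """Return one message per alert condition the snapshot triggers."""
    alerts = []
    if s.seconds_down > 60:
        alerts.append("agent down for more than 60 seconds")
    if s.cpu_over_90_seconds > 5 * 60:
        alerts.append("CPU above 90% for more than 5 minutes")
    if s.disk_used_percent > 80:
        alerts.append("disk usage above 80%")
    if s.failed_logins > 0:
        alerts.append("failed dashboard authentication attempts")
    for dest in s.off_allowlist_destinations:
        alerts.append(f"egress to non-allowlisted destination: {dest}")
    return alerts
```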
7.3 Where to send alerts
Email works. Pushover, Telegram, Slack channels work. Whatever you look at every day. Don't send to a channel you'll never check.
Section 8 — Update strategy
Patches matter. Most published CVEs in 2026 had patches available within 72 hours; the breaches happened to people who hadn't updated.
8.1 The realistic strategy
Two-tier:
- Critical security patches: apply within 48 hours. Watchtower or
- Other updates: monthly maintenance window. Test, deploy, verify,
For Hermes Agent specifically: subscribe to the GitHub release feed and the security advisory feed. Watch for releases tagged “security” — these mean act fast.
8.2 The unrealistic strategy
“Auto-update everything always.” This breaks production when an update has a regression. We've watched it happen. Have a maintenance window.
8.3 The really unrealistic strategy
“I'll update when I have time.” Every CVE feed reader has this person on it. Don't be that person.
Section 9 — Incident response
Eventually something will go wrong. Have a plan.
9.1 The five things to do when you suspect compromise
1. Isolate. Pull the agent off the network. Tailscale revocation takes 5 seconds. iptables drop is faster.
2. Preserve. Take a snapshot of the host (filesystem, running processes, audit logs) before doing anything destructive.
3. Rotate. Every credential the agent had access to, even ones you're not sure about. Treat all as compromised.
4. Investigate. Audit logs first. Then process tree. Then network logs. Build a timeline of what the agent did between “normal” and “weird.”
5. Rebuild. When in doubt, nuke the host and rebuild from a trusted image. Restore data from backup. Don't try to clean a compromised host in place.
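For the Preserve step, it helps to record a digest of every file you capture, so you can later show the evidence was not altered during the investigation. A minimal sketch; `evidence_manifest` is a hypothetical helper, and the directory you point it at (a copy of the audit logs, for instance) is up to you.

```python
import hashlib
from pathlib import Path


def evidence_manifest(root: str) -> dict:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        # Skip directories and symlinks; hash regular files only.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[path.relative_to(base).as_posix()] = digest.hexdigest()
    return manifest
```

Store the manifest somewhere other than the suspect host, or it proves nothing.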
9.2 The 30-day post-incident
After the immediate response:
- Write a post-mortem. Honest. What happened, why, what's changed.
- Share within your team or — if you run something with public users
- Update your defences. Whatever path the attacker took, that path is
- Subscribe to the relevant CVE feeds if you weren't already.
Section 10 — Backup and recovery
The often-forgotten security control.
10.1 What to back up
- Agent config files
- Conversation history (if you depend on it)
- MCP server configurations
- Any local-state databases (vector DB, SQLite, etc.)
- Encrypted credential vault (yes, including the encrypted vault — you
10.2 How
- borgbackup or restic to a remote endpoint. Encrypted, deduplicated.
- Frequency: nightly for live data, weekly for full system snapshots
- Off-site is non-negotiable. A backup on the same VPS dies with
10.3 Test the restore
Once a quarter, restore your backup to a fresh host and verify the agent comes up. The number of self-hosters who have backups they've never tested is depressing. If you don't test, you don't have a backup — you have a hope.
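The file-level part of a restore test can be automated: compare every file in the source tree with its restored counterpart, byte for byte. A sketch using the standard library's `filecmp`; `verify_restore` is a hypothetical helper, and it only makes sense when the source has not changed since the backup was taken. It does not replace actually starting the agent on the fresh host.

```python
import filecmp
import os


def verify_restore(source: str, restored: str):
    """Return (mismatched, uncomparable) relative paths; both empty means OK."""
    relative = []
    for folder, _dirs, files in os.walk(source):
        for name in files:
            full = os.path.join(folder, name)
            relative.append(os.path.relpath(full, source))
    # shallow=False forces a content comparison instead of trusting metadata.
    _match, mismatch, errors = filecmp.cmpfiles(
        source, restored, relative, shallow=False)
    # "errors" holds files that could not be compared, e.g. missing in restored.
    return sorted(mismatch), sorted(errors)
```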
Section 11 — The 12-step quick checklist
Every self-hosted AI agent in 2026 should have:
1. Sandbox-on by default for tool execution
2. Auth-on by default for the dashboard
3. Dashboard accessible only via Tailscale or SSH tunnel (never public)
4. Credentials stored in OS keyring or encrypted vault
5. Egress denied by default with explicit allowlist
6. Audit logging on, weekly review
7. Process supervision with auto-restart
8. System metrics collection (Netdata or equivalent)
9. Alerts for crashes, CPU saturation, suspicious egress
10. Critical security patches applied within 48 hours
11. Backups nightly, off-site, tested quarterly
12. Incident response playbook written before you need it
If you can't tick all 12 in your current setup, fix the gaps before adding features.
Section 12 — Closing notes
Self-hosted AI security in 2026 is more disciplined hygiene than exotic skill. The mistakes that matter (CVE-2026-25253, CVE-2026-25103 plaintext credentials, the chronic tendency to expose dashboards on public IPs) are mistakes we already know how to avoid in other software. The novelty of the agent makes it tempting to skip the basics. Don't.
The good news: every credible self-hosted agent in 2026 ships with better defaults than the equivalent project would have shipped 18 months ago. The post-OpenClaw-crisis ecosystem is meaningfully more secure by default. Your job as an operator is to not actively undo that.
Subscribe to [the newsletter](/newsletter) for security alerts when they happen, the [CVE tracker](/cves) for the live feed, and the [methodology page](/methodology) for our standard security audit checklist.
Related guides
- [The complete OpenClaw timeline](/guides/openclaw-complete-history)
- [OpenClaw security crisis 2026](/guides/openclaw-security-crisis-2026)
- [5 best OpenClaw alternatives](/guides/openclaw-alternatives-2026)
- [Pocket AI complete guide](/guides/pocket-ai-complete-guide)
- [Edge AI hardware buyer's guide 2026](/guides/edge-ai-hardware-2026)