How it works — SecureClaw

Binary attestation

Every time SecureClaw starts, it computes the BLAKE3 hash of its own binary. That hash gets sent to two completely independent sources for verification:

A Cloudflare Worker at the attestation endpoint
This website's verify API

Both have to confirm that the hash matches a registered release. If either rejects it, the binary won't decrypt the secrets vault. An attacker who compromises any single system therefore can't get a tampered binary accepted without the other source noticing.
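As a rough sketch of the "both sources must agree" rule, the Go snippet below hashes the running executable and only succeeds when every source confirms the hash. It is an illustration, not SecureClaw's code: the `Source` type, `selfHash`, and `requireAll` are hypothetical names, the network calls are abstracted behind a function field, and SHA-256 stands in for BLAKE3 because BLAKE3 is not in the Go standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"os"
)

// Source is a hypothetical stand-in for one verification endpoint
// (the Cloudflare Worker or the website's verify API).
type Source struct {
	Name   string
	Verify func(hash string) (bool, error)
}

// selfHash hashes the running executable. SecureClaw uses BLAKE3; SHA-256
// is substituted here only because it ships with the standard library.
func selfHash() (string, error) {
	path, err := os.Executable()
	if err != nil {
		return "", err
	}
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// requireAll succeeds only if every source confirms the hash. An error or a
// rejection from any single source fails the whole attestation.
func requireAll(hash string, sources ...Source) error {
	if len(sources) == 0 {
		return errors.New("no verification sources configured")
	}
	for _, s := range sources {
		ok, err := s.Verify(hash)
		if err != nil {
			return fmt.Errorf("%s: %w", s.Name, err)
		}
		if !ok {
			return fmt.Errorf("%s rejected hash %s", s.Name, hash)
		}
	}
	return nil
}
```

A caller would pass the result of `selfHash()` to `requireAll` with one `Source` per endpoint and refuse to continue on any non-nil error.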

graph TD
    A[SecureClaw Binary] -->|BLAKE3 self-hash| B{2-Source Verification}
    B -->|Source 1| C[Cloudflare Worker]
    B -->|Source 2| D[Official Website]
    C -->|fragment 1| E{Both confirmed?}
    D -->|fragment 2| E
    E -->|Yes| F[Decrypt Vault]
    E -->|No| G[Refuse to start]
    F --> H[Load Config + SOUL.md]
    H --> I[Start Gateway]
    style A fill:#fef0f2,stroke:#C41E3A
    style E fill:#fef0f2,stroke:#C41E3A
    style F fill:#f0faf3,stroke:#1a7d34
    style G fill:#fef5f6,stroke:#C41E3A

Startup verification flow

2-source verification protocol

The vault decryption key is derived from a server-held fragment that only gets released after successful dual verification. No attestation, no fragment, no vault. Removing or bypassing the attestation code in the binary doesn't help, because the fragment lives on the servers and is never released for an unverified hash.

sequenceDiagram
    participant B as Binary
    participant W as CF Worker
    participant S as Website
    participant V as Vault
    B->>B: BLAKE3 self-hash
    B->>W: POST /attest (hash)
    W->>W: Verify hash exists
    W-->>B: Challenge + encrypted fragment
    B->>S: POST /api/verify (hash, HMAC)
    S->>S: Verify hash + signature
    S-->>B: Validation code + fragment
    B->>W: POST /complete (validation code)
    W-->>B: Confirmation
    B->>B: Combine fragments
    B->>V: Decrypt vault with composite key
    V-->>B: Secrets unlocked

2-source attestation protocol

The key insight: each source holds a fragment of the vault key. The binary combines both fragments to derive the actual decryption key. Even if an attacker compromises one source, they only get half the key.
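This page doesn't say how the two fragments are combined, so the following is only one plausible shape. The sketch assumes equal-length fragments joined by XOR; `combineFragments` is a hypothetical name, and the real combination step may well differ.

```go
package main

import "errors"

// combineFragments XORs two equal-length fragments into a single value.
// XOR is an assumption made for illustration; it is not confirmed as
// SecureClaw's method.
func combineFragments(a, b []byte) ([]byte, error) {
	if len(a) == 0 || len(a) != len(b) {
		return nil, errors.New("fragments must be non-empty and of equal length")
	}
	key := make([]byte, len(a))
	for i := range a {
		key[i] = a[i] ^ b[i]
	}
	return key, nil
}
```

With this kind of split, neither fragment is usable on its own: the binary needs the responses from both sources before it has anything to feed into key derivation.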

Tool Call Authorization Tokens (TCATs)

TCATs are SecureClaw's answer to a simple question: how do you stop a prompt injection from making your AI do something dangerous?

Every tool call goes through an isolated policy engine that never sees the conversation. It can't be influenced by injected instructions because it doesn't know they exist. If the engine says no, the tool doesn't run. Period.

Properties

Isolated evaluation — policy engine evaluates without LLM context, so injected instructions can't influence authorization
Tamper-evident tokens — each TCAT binds tool name + exact arguments + nonce + timestamp via HMAC-SHA256. Change anything, and the token is invalid
Replay protection — monotonic nonce per session prevents token reuse
Integrity chain — TCAT signing key derives from the attestation key, so binary integrity is required
Ed25519-signed patterns — attack signature databases are cryptographically signed and can't be tampered with
Fail-closed — if the engine fails for any reason, all tool calls are denied and it falls back to asking you directly
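To make the tamper-evidence and replay properties above concrete, here is a minimal sketch of issuing and checking a token. The `TCAT` struct layout, the field encoding, and the `Session` type are assumptions for illustration; only the ingredients (tool name, exact arguments, nonce, timestamp, HMAC-SHA256, monotonic nonce per session) come from the list above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"errors"
)

// TCAT is a hypothetical token layout; the field names are assumptions.
type TCAT struct {
	Tool      string
	Args      string // exact serialized arguments
	Nonce     uint64
	Timestamp int64 // Unix seconds
	MAC       []byte
}

// mac computes HMAC-SHA256 over length-prefixed fields, so that
// ("ab", "c") and ("a", "bc") can never produce the same input.
func mac(key []byte, tool, args string, nonce uint64, ts int64) []byte {
	h := hmac.New(sha256.New, key)
	var buf [8]byte
	for _, field := range []string{tool, args} {
		binary.BigEndian.PutUint64(buf[:], uint64(len(field)))
		h.Write(buf[:])
		h.Write([]byte(field))
	}
	binary.BigEndian.PutUint64(buf[:], nonce)
	h.Write(buf[:])
	binary.BigEndian.PutUint64(buf[:], uint64(ts))
	h.Write(buf[:])
	return h.Sum(nil)
}

// Session tracks nonces for one session.
type Session struct {
	key       []byte
	nextNonce uint64 // next nonce to issue
	lastSeen  uint64 // highest nonce already accepted
}

func NewSession(key []byte) *Session {
	return &Session{key: key, nextNonce: 1}
}

// Issue binds the call to a fresh, strictly increasing nonce.
func (s *Session) Issue(tool, args string, now int64) TCAT {
	n := s.nextNonce
	s.nextNonce++
	return TCAT{Tool: tool, Args: args, Nonce: n, Timestamp: now,
		MAC: mac(s.key, tool, args, n, now)}
}

// Verify recomputes the MAC from the call actually being executed, then
// rejects any nonce that is not newer than the last accepted one.
func (s *Session) Verify(t TCAT, tool, args string) error {
	want := mac(s.key, tool, args, t.Nonce, t.Timestamp)
	if !hmac.Equal(want, t.MAC) {
		return errors.New("tcat: token does not match this call")
	}
	if t.Nonce <= s.lastSeen {
		return errors.New("tcat: nonce already used")
	}
	s.lastSeen = t.Nonce
	return nil
}
```

In this sketch the tool handler recomputes the MAC from the arguments it is about to run rather than trusting the ones stored in the token, which is what makes a changed argument invalidate the token. A token issued for `ls` fails verification if the handler is asked to run anything else, and a token that has already been accepted fails the second time.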

graph TD
    A[LLM Response] -->|extract tool call| B[TCAT Policy Engine]
    B --> C{Blocklist check}
    C -->|blocked| X[Deny]
    C -->|pass| D{Pattern scan}
    D -->|suspicious| X
    D -->|clean| E[Issue TCAT]
    E -->|HMAC-SHA256 token| F[Tool Handler]
    F --> G{Token valid?}
    G -->|no| X
    G -->|yes| H[Execute tool]
    style B fill:#fef0f2,stroke:#C41E3A
    style X fill:#fef5f6,stroke:#C41E3A
    style E fill:#f0faf3,stroke:#1a7d34
    style H fill:#f0faf3,stroke:#1a7d34

TCAT authorization flow

Performance

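The overhead is negligible next to an LLM call. Even the worst-case full pipeline takes under 200 μs, roughly four orders of magnitude less than a single LLM call: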

Operation	Latency
Static policy match	<1 μs
TCAT creation	<50 μs
TCAT verification	<50 μs
Full pipeline (worst case)	<200 μs
Single LLM call (for comparison)	2-10 s

Tool execution interceptor

Every tool call passes through a layered interceptor before it can execute. These layers run in order, and the first one to reject stops execution:

Hard blocklist — always enforced, even in dangerous mode. Blocks things like rm -rf /, sudo, pipe-to-shell
Dangerous mode check — if enabled, skip remaining checks (dev use only)
Session allowlist — patterns you've approved with "yes always" this session
Config allowlist — patterns from execution.allowlist in your config
User confirmation — asks you directly via the browser UI

The interceptor is the last line of defense. Even if a TCAT is issued, the interceptor still runs. Even if you're in dangerous mode, the hard blocklist still applies.
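The layer ordering described above can be sketched in a few lines of Go. Everything here is hypothetical (the `Interceptor` struct, its field names, and the naive substring and prefix matching); it only aims to show the order of the checks and that the hard blocklist runs before the dangerous-mode shortcut.

```go
package main

import "strings"

// Decision is the interceptor's verdict. The zero value is Deny, so an
// uninitialized decision fails closed.
type Decision int

const (
	Deny Decision = iota
	Allow
)

// Interceptor holds the state each layer consults. Field names are assumptions.
type Interceptor struct {
	HardBlocklist    []string              // substrings that are never allowed
	DangerousMode    bool                  // skips layers 3-5 when true
	SessionAllowlist []string              // prefixes approved with "yes always"
	ConfigAllowlist  []string              // prefixes from execution.allowlist
	Confirm          func(cmd string) bool // asks the user; nil means no UI
}

func matchesAny(cmd string, patterns []string, match func(s, p string) bool) bool {
	for _, p := range patterns {
		if match(cmd, p) {
			return true
		}
	}
	return false
}

// Check runs the layers in order and returns at the first decisive one.
func (i *Interceptor) Check(cmd string) Decision {
	// 1. Hard blocklist: always enforced, even in dangerous mode.
	if matchesAny(cmd, i.HardBlocklist, strings.Contains) {
		return Deny
	}
	// 2. Dangerous mode: skip the remaining checks.
	if i.DangerousMode {
		return Allow
	}
	// 3 and 4. Session allowlist, then config allowlist.
	if matchesAny(cmd, i.SessionAllowlist, strings.HasPrefix) ||
		matchesAny(cmd, i.ConfigAllowlist, strings.HasPrefix) {
		return Allow
	}
	// 5. User confirmation. With no way to ask, deny.
	if i.Confirm != nil && i.Confirm(cmd) {
		return Allow
	}
	return Deny
}
```

Real pattern matching would need to be considerably stricter than substring checks; the point of the sketch is only that a blocklisted command is denied before dangerous mode is even consulted.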

Vault encryption

All API keys and credentials are stored in an age-encrypted vault file (secrets.age). The vault uses age with scrypt-derived keys, meaning your password is stretched through scrypt before being used as the encryption key.

At the maximum security level, the vault key includes a server-held fragment that's only available after attestation. The actual decryption key is:

composite = HMAC-SHA256(password, fragment)

This means the vault cannot be decrypted without first passing attestation. After decryption, the key material is wiped from memory using Go's clear() builtin. This is best-effort erasure, not a guarantee.
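A minimal sketch of the composite-key formula and the wipe step might look like the following. It reads HMAC-SHA256(password, fragment) as an HMAC keyed by the password over the fragment, which is an assumption about argument order. The function names are hypothetical, and the scrypt stretching and age decryption are left out because they live outside the standard library.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
)

// compositeKey computes HMAC-SHA256 keyed by the password over the
// server-held fragment. The key/message order is an assumption.
func compositeKey(password, fragment []byte) []byte {
	h := hmac.New(sha256.New, password)
	h.Write(fragment)
	return h.Sum(nil)
}

// withCompositeKey derives the key, hands it to use, and zeroes the
// slice afterwards. clear requires Go 1.21 or later.
func withCompositeKey(password, fragment []byte, use func(key []byte) error) error {
	key := compositeKey(password, fragment)
	defer clear(key) // best effort: only this slice is zeroed
	return use(key)
}
```

Scoping the key to a callback keeps its lifetime short, but any copy made inside `use` is outside the reach of `clear`, which is one reason the erasure can only be best-effort.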

Prompt injection defenses

SecureClaw uses multiple layers to defend against prompt injection:

Tool result wrapping — all tool outputs are wrapped with injection mitigation markers so the LLM knows they're untrusted
Pre-flight scanning — LLM responses are checked against Ed25519-signed pattern databases before tool calls are extracted
TCAT isolation — the policy engine never sees conversation context, so injected instructions can't influence tool authorization
SOUL.md footer — the system prompt includes an injection mitigation footer on every message

Even if an injection reaches the LLM and convinces it to call a dangerous tool, the TCAT engine evaluates the call independently, and the tool interceptor still applies the hard blocklist and asks for your confirmation unless the call matches an allowlist. The LLM can be tricked, but the cryptographic layers can't.
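For the signed pattern databases mentioned above, a loader could refuse any database whose Ed25519 signature doesn't verify before a single pattern is used. The sketch below assumes a newline-separated database with a detached signature and case-insensitive substring matching; none of these format details come from this page, and `loadPatterns` and `suspicious` are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"errors"
	"strings"
)

// loadPatterns returns the patterns in db only if sig is a valid Ed25519
// signature over db made with the key matching pub.
func loadPatterns(pub ed25519.PublicKey, db, sig []byte) ([]string, error) {
	// ed25519.Verify panics on a wrong-sized key, so check it first.
	if len(pub) != ed25519.PublicKeySize {
		return nil, errors.New("pattern database: bad public key size")
	}
	if !ed25519.Verify(pub, db, sig) {
		return nil, errors.New("pattern database: invalid signature")
	}
	var patterns []string
	for _, line := range strings.Split(string(db), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			patterns = append(patterns, line)
		}
	}
	return patterns, nil
}

// suspicious reports whether response contains any pattern, ignoring case.
func suspicious(response string, patterns []string) bool {
	lower := strings.ToLower(response)
	for _, p := range patterns {
		if strings.Contains(lower, strings.ToLower(p)) {
			return true
		}
	}
	return false
}
```

The ordering matters: the signature is checked over the raw bytes before any parsing happens, so a modified database is rejected as a whole rather than partially trusted.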

What it doesn't protect you from

SecureClaw is built to be as secure as we can make it, but it's not magic. Here's what falls outside our control:

Plain text files in the sandbox

If you store passwords, keys, or secrets in plain text files inside the workspace, the agent can read them. SecureClaw's sensitive path blocklist catches common locations (~/.ssh, ~/.aws, .env), but it can't detect secrets hiding in random files you created yourself. Use the vault for secrets, not text files.
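To show why a path blocklist has this blind spot, here is a hedged sketch of what such a check could look like. `isSensitive` is a hypothetical function built only from the three examples named above; SecureClaw's actual list and matching rules are not documented here.

```go
package main

import (
	"path/filepath"
	"strings"
)

// isSensitive reports whether path falls under a blocked location.
// home is the user's home directory. Symlinks are not resolved.
func isSensitive(path, home string) bool {
	clean := filepath.Clean(path) // collapses "..", ".", and doubled separators

	// Block files named .env wherever they live.
	if filepath.Base(clean) == ".env" {
		return true
	}
	// Block ~/.ssh and ~/.aws, including everything beneath them.
	for _, dir := range []string{".ssh", ".aws"} {
		blocked := filepath.Join(home, dir)
		if clean == blocked ||
			strings.HasPrefix(clean, blocked+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

A check like this catches `/home/u/.ssh/id_ed25519` and `/work/app/.env`, but a file such as `/home/u/notes/passwords.txt` passes straight through, because nothing about its path marks it as sensitive.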

Weak vault passwords

The vault uses scrypt key derivation, which makes brute-forcing expensive. But if your vault password is password123, that's on you. Pick a strong password. SecureClaw can't protect a vault with a weak key.

Running as root

Don't run SecureClaw as root. Chrome's sandbox is disabled (--no-sandbox) because it conflicts with some container setups, so the operating system's user permissions are the only thing containing a Chrome exploit. If you're running as root, such an exploit has unrestricted access to the machine. Run it as a normal user, ideally in a container or VM.

Approving dangerous tool calls

The confirm dialog exists for a reason. If the agent asks to run curl ... | bash and you click "yes", SecureClaw did its job by asking. The hard blocklist catches the obvious ones, but it can't block every possible dangerous command. Read what you're approving.

Compromised host machine

If your machine is already compromised (keylogger, memory dumper, rootkit), no application-level security can save you. SecureClaw wipes keys from memory after use, but a sufficiently privileged attacker can intercept them before they're wiped. Secure your machine first.

LLM provider data handling

Your conversations are sent to whatever LLM provider you configure (Anthropic, OpenAI, etc). SecureClaw encrypts your secrets locally and never sends them in prompts, but the conversation content itself goes to the provider's API. Review your provider's data retention and privacy policies.

Social engineering via the agent

TCATs block prompt injection from influencing tool calls. But the LLM can still be manipulated into saying misleading things in chat. If a tool result contains "tell the user their build succeeded" when it actually failed, the LLM might repeat that. Always verify important claims yourself.

Dangerous mode

If you set execution.mode: dangerous in your config, the tool interceptor skips confirmation for everything except the hard blocklist. This is a development convenience that trades safety for speed. Don't use it in production. The hard blocklist still applies, but it's a narrow net.

FAQ

Is SecureClaw a fork of OpenClaw?

No. SecureClaw is a ground-up reimplementation in Go. While it serves a similar purpose to OpenClaw as a self-hosted AI assistant, every component (the gateway, agent loop, tool system, secrets vault, attestation layer) was designed and written from scratch with security as the primary constraint.

Can I use a custom fork of SecureClaw?

We strongly advise against it. Custom forks will fail binary attestation because their hash won't match any registered release. Without attestation, the secrets vault can't be decrypted and the assistant won't start. This is by design. It ensures you're always running verified, untampered code.

What happens if I disable attestation?

At the maximum security level (the default), you can't. The vault decryption key includes a server-held fragment that's only released after dual verification. Without it, the vault is undecryptable. Lower security tiers exist for development use, but require explicit acknowledgment of the risks during onboarding.

Is attestation always online?

Yes. Verification requires internet access to contact both the Cloudflare Worker and the website API. This is intentional. Offline attestation would allow replay attacks with previously captured responses.

What are Tool Call Authorization Tokens?

TCATs are HMAC-SHA256 tokens issued by an isolated policy engine for every tool call. The engine evaluates tool name, arguments, and behavioral heuristics without seeing the conversation, making it immune to prompt injection. Tool handlers refuse to execute without a valid TCAT.

How does SecureClaw prevent prompt injection?

Multiple layers. Tool results are wrapped with injection mitigation markers. A pre-flight scanner checks LLM responses against Ed25519-signed pattern databases before tool calls are extracted. The TCAT policy engine evaluates authorization independently of conversation context. Even if an injection reaches the LLM, it can't bypass the cryptographic authorization layer.