Sandboxing is a great security step for agents. Just like using guardrails is a great security step. I can't help but feel like it's all soft defense though. The real danger comes from the agent being able to read 3rd party data, be prompt injected, and then change or exfiltrate sensitive data. A sandbox does not prevent an email-reading agent from reading a malicious email, being prompt injected, and then sending an email to a malicious email address with the contents of your inbox. It does help in implementing network-layer controls though, like apply a policy that says this linux-based sandbox is only allowed to visit [whitelisted] urls. This kind of architectural whitelisting is the only hard defense we have for agents at the moment. Unfortunately it will also hamper their utility if used to the greatest extent possible.
Agreed, sandboxing by itself doesn't solve prompt injection. If the agent can read and send emails, no sandbox can tell a legit send from an exfiltration.
matchlock does have the network-layer controls you mentioned, such as domain whitelisting and secret protection toward designated hosts, so a rogue agent can't just POST your API key to some random endpoints.
The unsafe tool call/HTTP request problem probably needs to be solved at a different layer, possibly through the network interception layer of matchlock or an entirely different software.
sandboxing is really the only way to make agentic workflows auditable for enterprise risk. we can't underwrite trust in the model's output, but we can underwrite the isolation layer. if you can prove the agent literally cannot access the host network or sensitive volumes regardless of its instructions, that's a much cleaner compliance story than just relying on system prompts.
exactly, egress control is the second half of that puzzle. A perfect sandbox is useless for dlp if the agent can just hallucinate your private keys or pii into a response and beam it back to the model provider. it’s basically an exfiltration risk that traditional infra-level security isn't fully built to catch yet.
What are the advantages of using this over lxd system container or if we want VM isolation them lxd VMs? Is it the developer experience or there are any agent specific experience which is the key thing here?
1. Containers aren't a security boundary. Yes they can be used as such, but there is too much overhead (privilege vs unprivileged, figuring out granular capabilities, mount permissions, SELinux/AppArmor/Seccomp, gVisor) and the whole thing is just too brittle.
2. lxd VMs are QEMU-based and very heavy. Great when you need full desktop virtualization, but not for this use case. They also don't work on macOS.
Using Apple virtualization framework (which natively supports lightweight containers) on macOS and a more barebones virtualization stack like Firecracker on Linux is really the sweet spot. You get boot times in milliseconds and the full security of a VM.
The main thing matchlock adds over general-purpose vm/container tooling is agent specific network and filesystem (wip) controls, so if an agent goes rogue it can't exfiltrate your API keys, and damage largely mitigated. You'd have to build all of that yourself on top of LXD (possibly similar to matchlock).
There's also the DX side - OCI image support, highly programmable, fuse for workspace sharing. It runs on both linux and mac with a unified interface, so you get the same/similar experience locally on a Mac as you do on a linux workstation.
Mostly it's built for the purpose of "running `claude --dangerously-skip-permissions` safely" use case rather than being a general hypervisor.
I've been happily using a container to run my agents [1]. I tried to make it evolve with more advanced features, but it quickly became harder to use and I went back to a basic container which I just start with a run.sh script. Is a similar simple use possible with matchlock?
I use a very similar setup. I initially used nix to manage dev tools, but have since switched to mise and can't recommend it enough https://mise.jdx.dev/
containers are fine for basic isolation but the attack surface is way bigger than people think. you're still trusting the container runtime, the kernel, and the whole syscall interface. if the agent can call arbitrary syscalls inside the container, you're one kernel bug away from a breakout.
what I'm curious about with matchlock - does it use seccomp-bpf to restrict syscalls, or is it more like a minimal rootfs with carefully chosen binaries? because the landlock LSM stuff is cool but it's mainly for filesystem access control. network access, process spawning, that's where agents get dangerous.
also how do you handle the agent needing to install dependencies at runtime? like if claude decides it needs to pip install something mid-task. do you pre-populate the sandbox or allow package manager access?
Creator of matchlock here. Great questions, here's how matchlock handles these:
The guest-agent (pid-1) spawns commands in a new pid + mount namespace (similar to firecracker jailer but in the inner level for the purpose of macos support). In non-privileged mode it drops SYS_PTRACE, SYS_ADMIN, etes from the bounding set, sets `no_new_privs`, then installs a seccomp-BPF filter that eperms proces vm readv/writev, ptrace kernel load. The microVM is the real isolation boundary — seccomp is defense in depth. That said there is a `--privileged` flag that allows that to be skipped for the purpose of image build using buildkit.
Whether pip install works is entirely up to the OCI image you pick. If it has a package manager and you've allowed network access, go for it. The whole point is making `claude --dangerously-skip-permissions` style usage safe.
Personally I've had agents perform red team type of breakout. From my first hand experience what the agent (opus 4.6 with max thinking) will exploit without cap drops and seccomps is genuinely wild.
Thank you for matchlock! I’ve got Opus 4.6 red teaming it right now. ;)
I think a secure VM is a necessary baseline, and the days of env files with a big bundle of unscoped secrets are a thing of the past, so I like the base features you built in.
I’d love to hear more about the red team breakouts you’ve seen if you have time.
1. from isolation pov, Matchlock launch Firecracker microvm with its own kernel, so you get hardware-level isolation rather than bubblewrap's seccomp/namespace approach, therefore a sandbox escape would require a VM breakout.
2. Matchlock intercepts and controls all network traffic by default, with deny-all networking and domain allowlisting. Bubblewrap doesn't provide this, which is how exfiltration attacks like the one recently demonstrated against Claude co-work (https://www.promptarmor.com/resources/claude-cowork-exfiltra...).
3. You can use any Docker/OCI image and even build one, so the dev experience is seamless if you are using docker-container-ish dev workflow.
4. The sandboxes are programmable, as Matchlock exposes a JSON-RPC-based SDK (Go and Python) for launching and controlling VMs programmatically, which gives you finer-grained control for more complex use cases.
Creator of Matchlock here. Mostly for performance and usability. For interacting with external APIs like GCP or GitHub that generally have huge surface area, it's much more token-efficient and easier to set up if you just give the agent gcloud and gh CLI tools and the secrets to use them (in our case fake ones), compared to wiring up a full-blown MCP server. Plus, agents tend to perform better with CLI tools since they've been heavily RL'd on them.
Sometimes people are too lazy to write their own agent loop and decided to run off-the-shelf coding agent (e.g. Claude Code, or Pi in case of clawdbot) in environment.
You'd let the pro blackhat loose in your VM on your own system?
No because it's a dumb question and you don't want any stranger inside your home network regardless of firewall.
The comparison you get to make is in terms of the _extra_ security this project buys you.
Might I remind you of two things:
- You're advocating for installing random (?kernel) level software from the internet. That by itself is a real and larger treat than any potentially insecure things my `llm` user _might_ do in the future.
- User accounts security was the goto method for security for a long time. Further isolation was developed to accommodate: 'root' access for tenants, and finer resource limits controls. Neither I care to give an LLM.
So we only have build in firewall and sandbox duplication as the real feature. For the latter, my experience is that it's useless on a personal device, and slows down building or requires too much cache config. I'm not installing random crap, so i can live with the risk of lan exposure.
I'm happy with the maintenance/complexity/threat matrix of useradd.
Sandboxing is a great security step for agents. Just like using guardrails is a great security step. I can't help but feel like it's all soft defense though. The real danger comes from the agent being able to read 3rd party data, be prompt injected, and then change or exfiltrate sensitive data. A sandbox does not prevent an email-reading agent from reading a malicious email, being prompt injected, and then sending an email to a malicious email address with the contents of your inbox. It does help in implementing network-layer controls though, like apply a policy that says this linux-based sandbox is only allowed to visit [whitelisted] urls. This kind of architectural whitelisting is the only hard defense we have for agents at the moment. Unfortunately it will also hamper their utility if used to the greatest extent possible.
Creator here.
Agreed, sandboxing by itself doesn't solve prompt injection. If the agent can read and send emails, no sandbox can tell a legit send from an exfiltration.
matchlock does have the network-layer controls you mentioned, such as domain whitelisting and secret protection toward designated hosts, so a rogue agent can't just POST your API key to some random endpoints.
The unsafe tool call/HTTP request problem probably needs to be solved at a different layer, possibly through the network interception layer of matchlock or an entirely different software.
sandboxing is really the only way to make agentic workflows auditable for enterprise risk. we can't underwrite trust in the model's output, but we can underwrite the isolation layer. if you can prove the agent literally cannot access the host network or sensitive volumes regardless of its instructions, that's a much cleaner compliance story than just relying on system prompts.
This may sound obvious, but there must also be an enforcement of what's allowed into that sandbox.
I can envision perfectly secure sandboxes where people put company secrets and communicate them over to "the cloud".
exactly, egress control is the second half of that puzzle. A perfect sandbox is useless for dlp if the agent can just hallucinate your private keys or pii into a response and beam it back to the model provider. it’s basically an exfiltration risk that traditional infra-level security isn't fully built to catch yet.
Sandbox won’t be enough, distroless + “data firewall” + audit
What are the advantages of using this over lxd system container or if we want VM isolation them lxd VMs? Is it the developer experience or there are any agent specific experience which is the key thing here?
1. Containers aren't a security boundary. Yes they can be used as such, but there is too much overhead (privilege vs unprivileged, figuring out granular capabilities, mount permissions, SELinux/AppArmor/Seccomp, gVisor) and the whole thing is just too brittle.
2. lxd VMs are QEMU-based and very heavy. Great when you need full desktop virtualization, but not for this use case. They also don't work on macOS.
Using Apple virtualization framework (which natively supports lightweight containers) on macOS and a more barebones virtualization stack like Firecracker on Linux is really the sweet spot. You get boot times in milliseconds and the full security of a VM.
The main thing matchlock adds over general-purpose vm/container tooling is agent specific network and filesystem (wip) controls, so if an agent goes rogue it can't exfiltrate your API keys, and damage largely mitigated. You'd have to build all of that yourself on top of LXD (possibly similar to matchlock).
There's also the DX side - OCI image support, highly programmable, fuse for workspace sharing. It runs on both linux and mac with a unified interface, so you get the same/similar experience locally on a Mac as you do on a linux workstation.
Mostly it's built for the purpose of "running `claude --dangerously-skip-permissions` safely" use case rather than being a general hypervisor.
I've been happily using a container to run my agents [1]. I tried to make it evolve with more advanced features, but it quickly became harder to use and I went back to a basic container which I just start with a run.sh script. Is a similar simple use possible with matchlock?
1:https://github.com/asfaload/agents_container
I use a very similar setup. I initially used nix to manage dev tools, but have since switched to mise and can't recommend it enough https://mise.jdx.dev/
does mise use nix underneath or did you abandon nix entirely?
Mise doesn't use nix. I think the OP is stating he replaced nix with mise.
containers are fine for basic isolation but the attack surface is way bigger than people think. you're still trusting the container runtime, the kernel, and the whole syscall interface. if the agent can call arbitrary syscalls inside the container, you're one kernel bug away from a breakout.
what I'm curious about with matchlock - does it use seccomp-bpf to restrict syscalls, or is it more like a minimal rootfs with carefully chosen binaries? because the landlock LSM stuff is cool but it's mainly for filesystem access control. network access, process spawning, that's where agents get dangerous.
also how do you handle the agent needing to install dependencies at runtime? like if claude decides it needs to pip install something mid-task. do you pre-populate the sandbox or allow package manager access?
Creator of matchlock here. Great questions, here's how matchlock handles these:
The guest-agent (pid-1) spawns commands in a new pid + mount namespace (similar to firecracker jailer but in the inner level for the purpose of macos support). In non-privileged mode it drops SYS_PTRACE, SYS_ADMIN, etes from the bounding set, sets `no_new_privs`, then installs a seccomp-BPF filter that eperms proces vm readv/writev, ptrace kernel load. The microVM is the real isolation boundary — seccomp is defense in depth. That said there is a `--privileged` flag that allows that to be skipped for the purpose of image build using buildkit.
Whether pip install works is entirely up to the OCI image you pick. If it has a package manager and you've allowed network access, go for it. The whole point is making `claude --dangerously-skip-permissions` style usage safe.
Personally I've had agents perform red team type of breakout. From my first hand experience what the agent (opus 4.6 with max thinking) will exploit without cap drops and seccomps is genuinely wild.
Thank you for matchlock! I’ve got Opus 4.6 red teaming it right now. ;)
I think a secure VM is a necessary baseline, and the days of env files with a big bundle of unscoped secrets are a thing of the past, so I like the base features you built in.
I’d love to hear more about the red team breakouts you’ve seen if you have time.
just from looking at it
on Linux it runs Firecracker: https://github.com/jingkaihe/matchlock/blob/main/pkg/vm/linu...
on macOS uses the Apple's Virtualization.Framework Go wrapper: https://github.com/jingkaihe/matchlock/blob/main/pkg/vm/darw...
If I'm already on Linux, how does it compare to using bubblewrap?
Creator here. A few key differences:
1. from isolation pov, Matchlock launch Firecracker microvm with its own kernel, so you get hardware-level isolation rather than bubblewrap's seccomp/namespace approach, therefore a sandbox escape would require a VM breakout.
2. Matchlock intercepts and controls all network traffic by default, with deny-all networking and domain allowlisting. Bubblewrap doesn't provide this, which is how exfiltration attacks like the one recently demonstrated against Claude co-work (https://www.promptarmor.com/resources/claude-cowork-exfiltra...).
3. You can use any Docker/OCI image and even build one, so the dev experience is seamless if you are using docker-container-ish dev workflow.
4. The sandboxes are programmable, as Matchlock exposes a JSON-RPC-based SDK (Go and Python) for launching and controlling VMs programmatically, which gives you finer-grained control for more complex use cases.
Why would secrets ever need to be available to the agent directly rather than hidden inside the tool calling framework?
Creator of Matchlock here. Mostly for performance and usability. For interacting with external APIs like GCP or GitHub that generally have huge surface area, it's much more token-efficient and easier to set up if you just give the agent gcloud and gh CLI tools and the secrets to use them (in our case fake ones), compared to wiring up a full-blown MCP server. Plus, agents tend to perform better with CLI tools since they've been heavily RL'd on them.
Token efficiency is a good argument actually.
Sometimes people are too lazy to write their own agent loop and decided to run off-the-shelf coding agent (e.g. Claude Code, or Pi in case of clawdbot) in environment.
Exactly.
very cool, if you want cross-platform microvms, there's an interesting project called libkrun that powers projects like Podman and Colima.
here's a Go binding: https://github.com/mishushakov/libkrun-go
demo (on Mac): https://x.com/mishushakov/status/2020236380572643720
Have I told you about our lord and savior: `useradd`
Would you let a pro blackhat loose on your system with just a different user account?
You'd let the pro blackhat loose in your VM on your own system?
No because it's a dumb question and you don't want any stranger inside your home network regardless of firewall.
The comparison you get to make is in terms of the _extra_ security this project buys you.
Might I remind you of two things:
- You're advocating for installing random (?kernel) level software from the internet. That by itself is a real and larger treat than any potentially insecure things my `llm` user _might_ do in the future.
- User accounts security was the goto method for security for a long time. Further isolation was developed to accommodate: 'root' access for tenants, and finer resource limits controls. Neither I care to give an LLM.
So we only have build in firewall and sandbox duplication as the real feature. For the latter, my experience is that it's useless on a personal device, and slows down building or requires too much cache config. I'm not installing random crap, so i can live with the risk of lan exposure.
I'm happy with the maintenance/complexity/threat matrix of useradd.