Similar project: https://github.com/aberemia24/code-executor-MCP
And the original Anthropic post that inspired both: https://www.anthropic.com/engineering/code-execution-with-mc...
> 47 tools = 141k tokens consumed before you write a single word
This is the real problem in my opinion.
There are a ton of great-sounding MCP servers, but in practice they have too many individual tools and way too much documentation for each tool. It inflates processing time and burns tokens.
I find MCP is the opposite of the Unix design philosophy. You want fewer tools with more options surfaced via schema, shorter documentation, and you want to rely on convention as much as possible.
You don't want separate create file, write file, and update file tools; you want one write file tool that can do all of those things. Instead of ls and find, you want your list files tool to support regex and fuzzy matching and return a metadata list.
This is based on building these things for most of this year, so it’s anecdotal and ymmv.
As an example rust-mcp-filesystem has 24 tools, many with completely overlapping functionality: `head_file`, `tail_file`, `read_file_lines`, `read_text_file` plus multi-file variants; or there's `list_directory`, `list_directory_with_sizes`, `calculate_directory_size`, `search_files`, and `directory_tree`. I think that whole server could be 4-6 mcp tools and it would accelerate things.
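To make the consolidation concrete, here's a minimal sketch of what a single read tool covering `head_file`, `tail_file`, `read_file_lines`, and `read_text_file` could look like; the name and schema are hypothetical, not taken from rust-mcp-filesystem:

```typescript
// Hypothetical consolidated tool definition (generic JSON Schema shape).
// One tool replaces four single-purpose readers plus their multi-file variants.
const readFilesTool = {
  name: "read_files",
  description: "Read one or more text files, optionally restricted to a line range.",
  inputSchema: {
    type: "object",
    properties: {
      paths: { type: "array", items: { type: "string" } },
      range: {
        type: "object",
        description:
          "Omit to read whole files; a negative start counts from the end (tail).",
        properties: {
          start: { type: "integer" },
          end: { type: "integer" },
        },
      },
    },
    required: ["paths"],
  },
};
```

One schema like this costs a fraction of the tokens of four separate tool descriptions, and convention (negative indices, optional range) covers the rest.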
This is very interesting. So it's an MCP server that connects to what is effectively a sandboxed MCP "hub". This is a clever middle ground between using dozens of context-munching MCP servers and just giving the agent access to your command line.
One question: why is Deno used? I thought that it was a JavaScript runtime. Can pctx only run sandboxed JavaScript code? If so, what do you do if you need the agent to run a Python script? If not, I don't understand how using a sandboxed JavaScript runtime allows you to sandbox other things.
Deno wraps the V8 engine and brings lots of APIs, features, and native TypeScript support. I'm guessing the sandbox feature uses Deno's ability to control what the running code has access to: https://docs.deno.com/runtime/fundamentals/security/
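For anyone unfamiliar with that model, here's a minimal sketch of Deno's deny-by-default permissions; the paths and host are made up:

```typescript
// Launch with an explicit allowlist; everything not listed is denied:
//   deno run --allow-read=./workspace --allow-net=api.example.com runner.ts

const notes = await Deno.readTextFile("./workspace/notes.txt"); // allowed
await fetch("https://api.example.com/v1/ping");                 // allowed
await Deno.readTextFile("/etc/passwd"); // denied: throws a permission error
```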
hey! the way it works is that the llm is first given snippets in typescript that tell it how to use the various MCP tools. it can then write code in typescript and execute all of the tool calls in the deno sandbox. so yes, it can only execute javascript, but this isn't meant to be a full arbitrary code execution env like E2B.dev; this sandbox is only meant to be a place for MCP calls to happen.
we chose typescript because it's the most token-efficient way to pass types and signatures to an LLM; with Python and Pydantic there are extra characters passed around
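as a rough illustration of what that looks like (the `github` namespace here is hypothetical, not pctx's actual generated API):

```typescript
// Type surface handed to the model instead of raw JSON tool schemas (hypothetical).
declare namespace github {
  interface Issue { number: number; title: string }
  function searchIssues(q: string, limit?: number): Promise<Issue[]>;
  function addComment(issue: number, body: string): Promise<void>;
}

// Code the model writes back: the calls execute inside the Deno sandbox,
// so only the final console output has to re-enter the context window.
const stale = await github.searchIssues("is:open label:stale", 20);
for (const issue of stale) {
  await github.addComment(issue.number, "Closing as stale; reopen if needed.");
}
console.log(`commented on ${stale.length} issues`);
```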
i love seeing experiments that make this stuff run locally. bridging an MCP client into a Deno sandbox feels like a natural step if you want the same ergonomics offline.
there's also a bunch of interesting questions around security and permission models when code is pulled on demand. running arbitrary tools in a sandbox is neat, but you still need to think about what those tools can access: environment variables, network, local file system. limiting that scope could make these experiments more viable for larger teams.
i'd be curious to see benchmarks for cold starts and memory usage in this model, as well as patterns for caching compiled tools so they aren't reloaded every time. discovering tools on demand is one thing; making them feel instantaneous is another. either way, it's exciting to see folks pushing on this area and sharing their work.
thank you!
on your second point, check out how we lock down the sandbox with a custom deno runtime! https://github.com/portofcontext/pctx/tree/main/crates/pctx_...
on the third, will def get some benchmarks out... we set up OTEL so we have the data
Oh that's great! I have been experimenting with a similar approach using WASM; I convert MCP tools into TypeScript files and expose a single tool to run JS at runtime.
https://github.com/buremba/1mcp
nice! will follow your progress! love that this runs locally as well
Congrats on launching! One immediate thought is that people will always be wary of running LLM-generated code on their machines even if it's sandboxed. Is one of the future business cases for this to host a remote execution environment that pctx can call out to rather than running the code locally?
I don't see a reason to be nervous about running AI-generated code on a local system if it's encapsulated in a VM with cgroups.
yes! coming soon
Thank you! Looks interesting, and I was thinking of something similar recently. I'm sure there are zillions of use cases for this; it'd be helpful to have a few of them explained on the front page.
> pctx optimizes this communication by presenting MCP servers as code APIs
Would be nice to have examples of how this is reduced, whether any information is lost in the process, and what the tradeoff is.
thank you, will get a more detailed benchmark out soon!
I'm even more excited for the sandboxes than I am for the "code mode".
Someone please build this with lightweight containers so it's not limited to JS services
Cloudflare has built Python FaaS on top of their Workers service, which is very similar to this Deno service. They did it using Wasm.
e2b.dev is focused on this space
File system access is a must tho; that's where half the power of coding agents comes from: efficiently managing context files.
this makes sense; we should support a model where the code snippets can all be stored on the filesystem rather than in the context window from the MCP response
This is interesting. Also "Discover tools on-demand". Are there any stats or estimates how many tools an LLM / agent could handle with this approach vs. loading them all into context as MCP tools?
From what I have read, it's in the range of 60-80.
(shameless plug: I'm building a cloud-based gateway where the set of servers given to an MCP client can be controlled using "profiles": https://docs.gatana.ai/profiles/)
Very interesting! Does this support dynamic bindings like Cloudflare Workers or what would be the mechanism to inject dependencies?
no dependency injection at the moment... this is something we are exploring. adding dependencies would require rebuilding the execution runtime, which is something we want to open up in the framework soon
I'm asking because of multi-user scenarios where each MCP tool call requires authentication to 3rd-party APIs. Having a quick way to spin up the MCP "Server" with the correct credentials is not something I've seen a good solution to.
got it, yes so currently this is built just for one user - one set of credentials, but passing user credentials through is something we want to add.
thinking a native connection to cloud auth managers is the best way to do this (Clerk, Auth0, etc.)