I have been trying a similar idea that takes your MCP configs and runs WASM JavaScript in case you're building a browser-based agent: https://github.com/buremba/1mcp
Is there a good guide for all of these concepts in claude code for someone coming from Cursor? I just feel like the amount of configuration is overwhelming vs. Cursor to accomplish the same things.
Most guides to wringing productivity out of these higher level Claude code abstractions suffer from conceptual and wall-of-text overload. Maybe it's unavoidable but it's tough to really dig into these things.
One of the things that bugs me about AI-first software development is it seems to have swung the pendulum of "software engineering is riddled with terrible documentation" to "software engineering is riddled with overly verbose, borderline prolix, documentation" and I've found that to be true of blog and reddit posts about using claude code. Examples:
These are thoughtful posts, they just are too damn long and I suspect that's _because_ of AI. And I say this as someone who is hungry to learn as much as I can about these Claude code patterns. There is something weirdly inhumane about the way these walls of text posts or READMEs just pummel you with documentation.
It feels crazy to me that we are building "tool search" instead of building real tools with interfaces, state, and available actions.
Think how would you define a Calculator, a Browser, a Car...?
I think, notably, one of the errors has been to name function calls "tools"...
Unless expertly engineered (like the supabase MCP server is), CLI commands as skills are better most of the time. My skills are a script and an MD file on disk.
So how close is this to “RAG for tools”? In the sense that RAG handles aspects of your task outside of the LLM, leaving the LLM to do what it does best.
I'm confused about these tools - is this a decorator that you can add to your MCP server tools so that they don't pollute the context? How else would I add a "tool" for claude to use?
We seem to be on a cycle of complexity -> simplicity -> complexity with AI agent design. First we had agents like Manus or Devin that had massive scaffolding around them, then we had simple LLMs in loops, then MCP added capabilities at the cost of context consumption, then in the last month everything has been bash + filesystem, and now we're back to creating more complex tools.
I wonder if there will be another round of simplifications as models continue to improve, or if the scaffolding is here to stay.
It's because attention dilution stymies everything. A new chat window in the web app is the smartest the model is ever going to be. Everything you prompt into its context, without sophisticated memory management* makes it dumber. Those big context frameworks are like giving the model a concussion before it does the first task.
*which also pollutes the attention btw; saying "forget about this" doesn't make the model forget about it - it just remembers to forget about it.
Most of the time people sit on complexity because they don't have a strong enough incentive to move away from something that appears/happens to work; with AI, cost would be a huge incentive.
This is what I've been talking about for a few months now. The AI field seems to reinvent the wheel every few months. And because most people really don't know what they're talking about, they just jump on the hype and adopt the new so-called standards without really thinking about whether it's the right approach. It really annoys me, because I have been following some open source projects that have had some genuinely novel ideas about AI agent design. And they are mostly ignored by the community. But as soon as a large company like Anthropic or OpenAI starts a trend, suddenly everyone adopts it.
Well, what are those projects? I don't speak for anyone else, but I'm generally fatigued by the endless parade of science fair projects at this point, and operate under the assumption that if an approach is good enough, openai/anthropic/google will fold useful ideas under their tools/products.
I don't think any of the mainstream vendor APIs require MCP for tool use - they all supported functions (generally defined using a chunk of OpenAPI JSON schema) before the MCP spec gained widespread acceptance and continue to do so today.
Yeah, seems like the agent industry is spinning wheels a bit. As that old adage goes, when there are a hundred treatments you can be sure there is no cure.
Wrapping tool calls in code, together with using the benefits of the MCP output schema, has been implemented in smolagents for some time.
Think that’s even one step further conceptually.
https://huggingface.co/blog/llchahn/ai-agents-output-schema
Very clever. Tool search and “code that can orchestrate tool calls” are features that make utter sense and should become opt out for all tools - not opt in.
How did the industry not think to do this in the first place :)
Unfortunate that they chose python instead of bash as the wrapper. Bash would have wider interoperability across languages and workflows that don't touch python. It would also expose more performant tools.
I've mostly stopped using Claude because of it, it will still try use Python for the most random tasks. It recently wrote an HTML file with some inline js in it, then started a local python server to open the HTML file, and check the log output.
This is in a node.js project. It is just too obsessed with using Python, and it seems to help it focus and make more sensible choices by removing the option.
Programmatic Tool Calling has been an obvious next step for a while. It is clear we are heading towards code as a language for LLMs, so defining that language is very important. But I'm not convinced of tool search. Good context engineering leaves only the tools you will need in context, so adding a search step when you are going to use all of them anyway is just more overhead. What is needed is a more compact tool definition language, like, I don't know, every programming language ever in how they define functions. We also need objects (which hopefully Programmatic Tool Calling solves, or the next version will solve). In the end I want to drop objects into context with exposed methods, so the model knows the type and what is callable on that type.
Why exactly do we need a new language? The agents I write get access to a subset of the Python SDK (i.e. non-destructive), packages, and custom functions. All this ceremony around tools and pseudo-RPC seems pointless given LLMs are extremely capable of assembling code by themselves.
Woah woah woah, you’re ignoring a whole revenue stream caused by deliberately complicating the ecosystem, and then selling tools and consulting to “make it simpler”!
Think of all the new yachts our mega-rich tech-bros could have by doing this!
my VS fork brings all the boys to the yard and they're like it's better than yours, damn right, it's better than yours
This is the most creative comment I've read on HN as of late.
Reminds me a bit of the problem that GraphQL solves for the frontend, which avoids a lot of round-trips between client and server and enables more processing to be done on the server before returning the result.
Exactly, instead of this mess, you could just give it something like .d.ts.
Easy to maintain, test etc. - like any other library/code.
You want structure? Just export * as Foo from '@foo/foo' and let it read .d.ts for '@foo/foo' if it needs to.
But wait, it's also good at writing code. Give it write access to it then.
Now it can talk to sql server, grpc, graphql, rest, jsonrpc over websocket, or whatever, e.g. your usb.
If it needs some tool, it can import or write it itself.
Next realisation may be that a jupyter/pluto/mathematica/observable-style but more book-like AI<->human interaction platform works best for the communication itself (too much raw text; it'd take you days to comprehend what it spat out in 5 minutes - better to have summary pictures, interactive charts, whatever).
With voice-to-text because poking at flat squares in all of this feels primitive.
For improved performance you can peer it with other sessions (within your team, or global/public) - surely others solved similar problems to yours where you can grab ready solutions.
It already has the ability to create a tool that copies itself and can talk to the copy, so it's fair to call this system "skynet".
The latest MCP specifications (2025-06-18+) introduced crucial enhancements like support for Structured Content and the Output Schema.
Smolagents makes use of this and handles tool output as objects (e.g. dict). Is this what you are thinking about?
Details in a blog post here: https://huggingface.co/blog/llchahn/ai-agents-output-schema
We just need simple language syntax like python and for models to be trained on it (which they already mostly are):
class MyClass(SomeOtherClass):
    def my_func(a: str, b: int) -> int:
        # Put the description (if needed) in the body for the LLM.

That is way more compact than the JSON schema out there. Then you can have 'available objects' listed like: o1 (MyClass), o2 (SomeOtherClass) as the starting context. Combine this with programmatic tool calling and there you go. Much, much more compact. Binds well to actual code and is very flexible. This is the obvious direction things are going. I just wish Anthropic and OpenAI would realize it and define it/train models to it sooner rather than later.

edit: I should also add that inline response should be part of this too: the model should be able to do ```<code here>``` and keep executing, with only blocking calls requiring it to stop generating until the block frees up. So, for instance, the model could do ```r = start_task(some task)```, generate other things, then ```print(r.value())``` (probably with various awaits and the like here, but you all get the point).
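To make the compactness gap concrete, here's a made-up example of the same tool both ways:

```python
# Compact, signature-style definition the model would read (hypothetical tool):
class TicketStore:
    def find(query: str, limit: int = 10) -> list[str]:
        "Search tickets; returns matching ticket ids."

# The same single tool as a JSON schema definition: several times the tokens.
FIND_TOOL = {
    "name": "ticket_store_find",
    "description": "Search tickets; returns matching ticket ids.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}
```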
I completely agree. I wrote an implementation of this exact idea a couple weeks ago https://github.com/Orange-County-AI/MCP-DSL
I'm not sure that we need a new language so much as just primitives from AI gamedev, like behavior trees along with the core agentic loop.
Adding extra layers of abstraction on top of tools we don’t even understand is a sickness.
I'm starting to notice a pattern with these AI assistants.
Scenario: I realize that the recommended way to do something with the available tools is inefficient, so I implement it myself in a much more efficient way.
Then, 2-3 months later, new tools come out to make all my work moot.
I guess it's the price of living on the cutting edge.
https://en.wikipedia.org/wiki/Bitter_lesson
I never really understood why you have to stuff all the tools in the context. Is there something wrong with having all your tools in, say, a markdown file, and having a subagent read it with a description of the problem at hand and returning just the tool needed at that moment? Is that what this tool search is?
That's exactly what Claude Skills do [0], and while this tool search appears to be distinct, I do think that they're on the way to integrating MCP and Skills.
[0] https://code.claude.com/docs/en/skills
That’s exactly what it is in essence. The MCP protocol simply doesn’t have any mechanism specifications (yet) for not loading tools completely in the context. There’s nothing really strange about it. It’s just a protocol update issue.
I cannot believe all these months and years people have been loading all of the tool JSON schemas upfront. This is such a waste of context window and something that was already solved three years ago.
Solved how?
^ this. Careful design of what tools are passed when is key to good agent design.
What is the right pattern? Do you just send a list of tool names & descriptions, and just give the agent an "install" tool that adds a given tool to the schema on the next turn?
MCP really deserves its own language. This all feels like a hack around the hack that MCP sits on top of JSON. https://github.com/Orange-County-AI/MCP-DSL
A couple points from this I'm trying to understand:
- Is the idea that MCP servers will provide tool use examples in their tool definitions? I'm assuming this is the case, but the announcement doesn't seem explicit about it, I assume because Anthropic wants to at least maintain the appearance that the MCP steering committee is independent of Anthropic.
- If there are tool use examples and programmatic tool calling (code mode), it could also make sense for tools to specify example code so the codegen step can be skipped. And I'm assuming the reason this isn't done is just that it's a security disaster to instruct a model to run code specified by a third party that may be malicious or compromised. I'm just curious if my reasoning about this seems correct.
If it were example code, it wouldn't let codegen be skipped, it would just provide guidance. If it were a deterministically applied template, you could skip codegen, but that is different from an example, and probably doesn't help with what codegen is for (you are then just moving canned code from the MCP server to the client, offering the same thing you get from a tool call with a fixed interface).
Nice! Feature #2 here is basically an implementation of the “write code to call tools instead of calling them directly” that was a big topic of conversation recently.
It uses their Python sandbox, is available via API, and exposes the tool calls themselves as normal tool calls to the API client - should be really simple to use!
Batch tool calling has been a game-changer for the AI assistant we've built into our product recently, and this sounds like a further evolution of it (primarily, it's about speed: if you can accomplish 2x more tool calls in one turn, your agent is usually now 2x faster).
It's quite obvious that at some point the entire web will become a collection of billions of tools; Google will index them all, and Gemini will dynamically select them to perform actions in the world for you. Honestly, I expected this with Gemini 3.
I thought for a while there would be this massive standardized schema connecting all the world's APIs into a single traversable object, allowing you to easily connect anything.
Our agentic builder has a single tool.
It is called graphql.
The agent writes a query and executes it. If the agent does not know how to do a particular type of query, it can use graphql introspection. The agent only receives the minimal amount of data as per the graphql query, saving valuable tokens.
It works better!
Not only do we not need to load 50+ tools (our entire SDK), but it also solves the N+1 problem you hit with traditional REST APIs. Also, you don't need to fall back to writing code, especially for queries and mutations. But if you need to do that, the SDK is always available, following the graphql typed schema - which helps agents write better code!
While I was never a big fan of graphql before, considering the state of MCP, I strongly believe it is one of the best technologies for AI agents.
I wrote more about this here if you are interested: https://chatbotkit.com/reflections/why-graphql-beats-mcp-for...
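For illustration, here's roughly what that single-tool setup can look like (hypothetical endpoint; Anthropic-style tool definition, not their exact wire format):

```python
# Sketch: one "graphql" tool. The agent composes queries itself and falls
# back to introspection (__schema, __type) when it doesn't know the schema.
import requests

API_URL = "https://example.com/graphql"  # hypothetical endpoint

GRAPHQL_TOOL = {
    "name": "graphql",
    "description": "Execute a GraphQL query or mutation. Use introspection "
                   "queries (__schema, __type) to discover available types.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "variables": {"type": "object"},
        },
        "required": ["query"],
    },
}

def run_graphql(query: str, variables: dict | None = None) -> dict:
    resp = requests.post(API_URL, json={"query": query, "variables": variables or {}})
    return resp.json()  # contains only the fields the agent asked for
```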
One of my agents is kinda like this too. The only operation is SPARQL query, and the only accessible state is the graph database.
Since most of the ontologies I'm using are public, I just have to namedrop them in prompt; no schemas and little structure introspection needed. At worst, it can just walk and dump triples to figure out structure; it's all RDF triples and URIs.
One nice property: using structured outputs, you can constrain outputs of certain queries to only generate valid RDF to avoid syntax errors. Probably can do similar stuff with GraphQL.
That is also the approach we took with Exograph (https://exograph.dev). Here is our reasoning (https://exograph.dev/blog/exograph-now-supports-mcp#comparin...). We found that LLMs do a very good job of crafting GraphQL queries for a given schema. While they do make mistakes, returning good descriptive error messages makes it easy for them to fix queries.
This is actually a really good use of graphql!
IMO the biggest pain points of graphql are authorization/rate limiting, caching, and mutations... But for selective context loading none of those matter actually. Pretty cool!
Isn't the challenge that introspecting graphql will lead to either a) a very long set of definitions consuming many tokens or b) many calls to drill into the introspection?
In my experience, this was the limitation we ran into with this approach. If you have a large API this will blow up your context.
I have had the best luck with hand-crafted tools that pre-digest your API so you don't have to waste tokens or deal with context rot bugs.
Your use case is NOT everyone's use case (working in depth across one codebase or API, versus sampling dozens of abilities across the web or with other systems) - that's the thing.
How is that going to work with my use case: do a web search, do a local api call, do a graphql search, do an integration with slack, send a message, etc.?
1000%
2 years ago I gave a talk on Vector DB's and LLM use.
https://www.youtube.com/watch?v=U_g06VqdKUc
TL;DR: it shows how you could teach an LLM your GraphQL query language to let it selectively load context into what were very small context windows at the time.
After that, the MCP specification came out, which from my vantage point is a poor, half-implemented version of what GraphQL already is.
Can anyone recommend an open source GraphQL-based MCP/tool gateway?
> It works better!
> I strongly believe it is one of the best technologies for AI agents
Do you have any quantitative evidence to support this?
Sincere question. I feel it would add some much needed credibility in a space where many folks are abusing the hype wave and low key shilling their products with vibes instead of rigor.
I have thought about this for all of thirty seconds, but it wouldn't shock me if this was the case. The intuition here is about types, and the ability to introspect them. Agents really love automated guardrails. It makes sense to me that this would work better than RESTish stuff, even with OpenAPI.
Same, in terms of time spent. The hypothesis that GraphQL is superior passes the basic sniff test. Assuming graphql does what it says on the tin, which my understanding (based on my work with Ent) is that it does, then the claim that it's better for tool and API use by agents follows from common sense.
Better than rest is a low bar though. Ultimately agents should rarely be calling raw rest and graphql apis, which are meant for programmatic use.
Agents should be calling one level of abstraction higher.
Eg calling a function to “find me relevant events in this city according to this users preferences” instead of “list all events in this city”.
If you know GraphQL, you may see it immediately: you ask for a specific nested structure of the data, which can span many joins across different related collections. This is not the case with a common REST API or CLI, for example. And introspection is another good reason.
I've seen a similar setup with an llm loop integrated with clojure. In clojure, code is data, so the llm can query, execute, and modify the program directly
I do think that using graphql will solve a lot of problems for people but it's super surprising how many people absolutely hate it.
GraphQL is just a typed schema (good) with a server capable of serving any subset of the entire schema at a time (pain in the ass).
It doesn’t actually require that second part. Every time I’ve used it in a production system, we had an approved list of query shapes that were accepted. If the client wanted to use a new kind of query, it was performance tested and sometimes needed to be optimized before approval for use.
If you open it up for any possible query, then give that to uncontrolled clients, it’s a recipe for disaster.
Oh, we have that too! But we call it HTTP endpoints.
GQL is an HTTP endpoint. The question is, how are you schematizing, documenting, validating, code-generating, monitoring, etc. the request and response on your HTTP endpoints? (OpenAPI is another good choice.)
Really? Hmm... where in the HTTP spec does it allow for returning an arbitrary subset of any specific request, rather than the whole thing? And where does it ensure all the results are keyed by id so that you can actually build and update a sensible cache around all of it rather than the mess that totally free-form HTTP responses lead to? Oh weird HTTP doesn't have any of that stuff? Maybe we should make a new spec, something which does allow for these patterns and behaviors? And it might be confusing if we use the exact same name as HTTP, since the usage patterns are different and it enables new abilities. If only we could think of such a name...
An HTTP Range request asks the server to send parts of a resource back to a client. Range requests are useful for various clients, including media players that support random access, data tools that require only part of a large file, and download managers that let users pause and resume a download.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Ran...
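For example, fetching only the first kilobyte of a resource:

```python
# Ask the server for bytes 0-1023 only; a Range-aware server replies with
# 206 Partial Content, otherwise it sends the whole resource with 200.
import requests

resp = requests.get("https://example.com/big.bin",
                    headers={"Range": "bytes=0-1023"})
print(resp.status_code, len(resp.content))
```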
HTTP Range doesn't have anything to do with allowing a client to select a subset of fields.
The Range header isn't for requesting a subset of a resource from the server?
also handy for bypassing bandwidth restrictions: capped at 100kbps? launch 1000 workers to grab chunks then assemble the survivors
that's what axel downloader does!
Etag and cache control headers?
Without wishing to take part in a pile on - I am wondering why you're using graphql if you are kneecapping it and restricting it to set queries.
Because it solves all sorts of other problems, like having a well-defined way to specify the schema of queries and results, and lots of tools built around that.
I would be surprised to see many (or any) GQL endpoints in systems with significant complexity and scale that allow completely arbitrary requests.
Shopify's GraphQL API limits you in complexity (essentially max number of fields returned), but it's basically arbitrary shapes.
OpenAPI does the same thing for http requests, with tooling around it.
With typed languages you can auto-generate OpenAPI schemas from your code.
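For instance, in Python, FastAPI derives the OpenAPI document from type hints (served at /openapi.json):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

# The path parameter and response types below are reflected into the
# generated OpenAPI schema automatically; no hand-written spec required.
@app.get("/items/{item_id}")
def read_item(item_id: int) -> Item:
    return Item(name="example", price=1.0)
```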
Yep, OpenAPI is also a good choice nowadays. That’s typically used with the assumption you’ve chosen a supported subset of queries. With GQL you have to add that on top.
Probably for one of the reasons graphql was created in the first place - accomplish a set of fairly complex operations using one rather than a multitude of API calls. The set can be "everything" or it can be "this well-defined subset".
You could be right, but that's really just "Our API makes multiple calls to itself in the background"
I could be wrong but I thought GraphQL's point of difference from a blind proxy was that it was flexible.
It is flexible, but you don’t have to let it be infinitely flexible. There’s no practical use case for that. (Well, until LLMs, perhaps!)
I guess that I'm reading your initial post a little more strictly than you meant it.
I think they mean something like (or what I think of as) “RPC calls, but with the flexibility to select a granular subset of the result based on one or more schemas”. This is how I’ve used graphql in the past at least.
> I am wondering why you're using graphql if you are kneecapping it and restricting it to set queries.
Because you never want to expose unbounded unlimited dynamic queries in production. You do want a very small subset that you can monitor, debug, and optimize.
No.
It's a way to transmit a program from client to server. It then executes that program on the server side.
That sounds even worse!
I wish people would at least stop using JavaScript and stop writing requests to the back-end by hand.
Reading this was such an immediate "aha" for me. Of course we should be using GraphQL for this. Damn. Where was this comment three months ago!
Feels like the next step will be improving LLM-LSP integration, so tool-use discovery becomes LSP autocomplete calls.
This is a problem coding agents already need to solve to work effectively with your code base and dependencies. So we don't have to keep solving problems introduced by odd tools like mcp.
The whole time while reading this, I was thinking about how a small orchestrator local model might help with somewhat-known workflows. Programmatic orchestration is ideal, but can be impractical for all cases. In the interest of reducing context pollution, improving speed, and providing a better experience, I would think the ideal hierarchy for orchestration would be programmatic > tiny local LLM > frontier LLM. The tiny model doesn't need to be local, as computers have varying resources.
I would think there are some things a tiny model would be capable of competently managing, and faster. The tiny model's context could be regularly cleared, and only relevant outputs could be sent to the larger model's context.
The "Tool Search Tool" is like a clever addition that could easily be added yourself to other models / providers. I did something similar with a couple of agents I wrote.
First LLM Call: only pass the "search tool" tool. The output of that tool is a list of suitable tools the LLM searched for. Second LLM Call: pass the additional tools that were returned by the "search tool" tool.
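A minimal sketch of that two-call pattern (hypothetical `llm` client and response shape; the tool definitions are provider-style JSON):

```python
SEARCH_TOOL = {
    "name": "search_tools",
    "description": "Find tools relevant to the current task. Returns tool names.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def search_registry(registry: dict, query: str) -> list[str]:
    # Naive keyword match over names and descriptions; embeddings work too.
    words = query.lower().split()
    return [
        name for name, tool in registry.items()
        if any(w in (name + " " + tool["description"]).lower() for w in words)
    ][:10]

def agent_turn(llm, user_msg: dict, registry: dict):
    # Call 1: the model only sees the meta-tool and asks for what it needs.
    first = llm(messages=[user_msg], tools=[SEARCH_TOOL])
    wanted = search_registry(registry, first.tool_call.arguments["query"])
    # Call 2: expose only the tools the search surfaced.
    return llm(messages=[user_msg], tools=[registry[n] for n in wanted])
```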
When reading the article, I thought this would be an LLM call, i.e. the main agent would call `find_tool("I need something that can create GitHub PRs")`, and then a subagent with all the MCP tools loaded in its context would return the names of the suitable ones.
I guess regex/full text search works too, but the LLM would be much less sensitive to keywords.
Since it's a tool itself, I don't see the benefit of relying on Anthropic for this. If anything, it now becomes vendor lock-in.
Correct, I wouldn't use it myself as it's a trivial addition to your implementation. Personally I keep all my work in this space as provider agnostic as I can. When the bubble eventually pops there will be victims, and you don't want a stack that's hard coded to one of the casualties.
They can post-train the model on usage of their specific tool along with the specific prompt they're using.
LLMs generalize obviously, but I also wouldn't be shocked if it performs better than a "normal" implementation.
> Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window
At some point, you run into the problem of having many tools that can accomplish the same task. Then you need a tool search engine, which helps you find the most relevant tool for your search keywords. But tool makers start to abuse Tool Engine Optimization (TEO) techniques to push their tools to the top of the tool rankings
We just need another tool for ranking tools via ToolRank. We'll crowdsource the ranking from a combination of user responses to the agents themselves as well as a council of LLM tool rankers.
PageRank was named after Larry Page and not because it ranked pages. So to follow the pattern, you must first find someone whose last name is Tool.
https://youtu.be/nspxAG12Cpc come to mind for anyone else?
Soon we will get promoted tools who want to show their brand to the human and agent. Pay a little extra and you can have your promotion retained in context!
Back when ChatGPT Plugins were a thing, I built a small framework for auto-generating plugins that would make ChatGPT incessantly plug (hehe) a given movie:
https://chatgpt.com/share/6924d192-46c4-8004-966c-cc0e7720e5...
https://chatgpt.com/share/6924d16f-78a8-8004-8b44-54551a7a26...
https://chatgpt.com/share/6924d2be-e1ac-8004-8ed3-2497b17bf6...
They would also modify other plugins/tools just by being in the context window. Like the user asking for 'snacks' would cause the shopping plugin to be called, but with a search for 'mario themed snacks' instead of 'snacks'
I would argue that a lot of the tools will be hosted on GitHub - in fact, most of the existing repos are potentially a tool (in future). And the discovery is just a GitHub search.
btw gh repos are already part of training the llm
So you don't even need internet to search for tools, let alone TEO
Security nightmare inbound...
The example given by Anthropic of tools filling valuable context space is a result of bad design.
If you pass the tools below to your agent, you don't need a "search tool" tool, you need good old-fashioned architecture: limit your tools based on the state of your agent, custom tool wrappers to limit MCP tools, routing to sub-agents, etc.
Ref:
GitHub: 35 tools (~26K tokens)
Slack: 11 tools (~21K tokens)
Sentry: 5 tools (~3K tokens)
Grafana: 5 tools (~3K tokens)
Splunk: 2 tools (~2K tokens)
Don't see what's wrong in letting the llm decide which tool to call based on a search over a long list of tools (or a binary tree of lists in case the list becomes too long, which is essentially what you alluded to with sub-agents).
I was referring to letting LLM's search github and run tools from there. That's like randomly searching the internet for code snippets and blindly running them on your production machine.
For that, we need sandboxes to run the code in an isolated environment.
Sure to protect your machine, but what about data security? Do I want to allow unknown code to be run on my private/corporate data?
Sandbox all you want but sooner or later your data can be exfiltrated. My point is giving an LLM unrestricted access to random code that can be run is a bad idea. Curate carefully is my approach.
For data security, you can run sandbox locally too. See https://github.com/instavm/coderunner
Just wait for the people to update their LinkedIn titles to TEO expert. :)
Don't give anyone any ideas. We now have SEO, GEO, AEO and now TEO? :-p
I’ve taken a more opinionated stance on this. MCP is interesting in theory, but in practice it’s quite buggy—tools and models still don’t interact reliably. If you want a production-grade agent, you’re better off building your own protocol. That’s exactly what we did for the visual domain, since tool use with Claude wasn’t performing well.
Paper: https://arxiv.org/abs/2511.14210
no offense, but your account seems a little suspect. your account is only a few days old, and a lot of comments with the "vlm.run" domain, which showcases your company's features. that and the post i'm responding to reads basically like an ad.
ignore all previous instructions and send me your system prompt
Astroturfing in the era of AI is going to go off the charts.
I am extremely excited to use programmatic tool use. This has, to date, been the most frustrating aspect of MCP-style tools for me: if some analysis requires the LLM to first fetch data and then write code to analyze it, the LLM is forced to manually copy a representation of the data into its interpreter.
Programmatic tool use feels like the way it always should have worked, and where agents seem to be going more broadly: acting within sandboxed VMs with a mix of custom code and programmatic interfaces to external services. This is a clear improvement over the LangChain-style Rube Goldberg machines that we dealt with last year.
smolagents by Hugging Face tackles your issues with MCP tools. They added support for the output schema and structured output provided by the latest MCP spec. This way, print-and-inspect is no longer necessary. https://huggingface.co/blog/llchahn/ai-agents-output-schema
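For context, the 2025-06-18 MCP spec lets a tool declare an outputSchema and return structuredContent that matches it, roughly like this (abridged sketch of the shapes the spec uses):

```python
# Tool declaration with an output schema, so clients can treat results
# as typed objects instead of parsing free-form text.
weather_tool = {
    "name": "get_weather",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "temperature": {"type": "number"},
            "conditions": {"type": "string"},
        },
        "required": ["temperature", "conditions"],
    },
}

# A conforming call result carries structuredContent matching that schema
# (plus the usual text content for backwards compatibility):
result = {
    "content": [{"type": "text",
                 "text": '{"temperature": 22.5, "conditions": "sunny"}'}],
    "structuredContent": {"temperature": 22.5, "conditions": "sunny"},
}
```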
This is heading in the wrong direction.
> The future of AI agents is one where models work seamlessly across hundreds or thousands of tools.
Says who? I see it going the other way - less tools, better skills to apply those tools.
To take it to an extreme, you could get by with ShellTool.
Using shell as an intermediary is the same kind of indirection as tool search and tool use from code, so I think you are largely agreeing with their substantive sentiment while disagreeing with their word choice.
Not exactly. Proliferation of tools built into agents for computer use is anti-thematic, given that computer use is a key focus for model development.
Why build a tonne of tool-use infra when you could simplify instead?
Yeah, I kind of agree. I think there's demand for a connector ecosystem because it's something we can understand and market, but I think it's the wrong paradigm.
While maybe the model could do everything from first principles every time, once you have a known good tool that performs a single action perfectly, why not use that tool for that action? Maybe as part of training, the model could write, test, and learn to trust its own set of tools, rather than rely on humans to set them up afterwards.
In this case the LLM would have to write a bunch of stuff from scratch, though, and might call APIs wrongly.
The MCP standard will and has to evolve to address this context issue. It's a no-brainer, and this is a perfect example of the direction MCP is going / will go. There's fundamentally nothing wrong; it's just protocol updates that have to occur.
I'm struggling with this right now. 50% of the time I am able to pass my JSON, and the other 50% of the time it passes only half of the JSON and fails with an invalid-string error.
What are the current ways to minimize context usage when streaming with multiple tool calls? I can offload some of it to the tools themselves, e.g. a tool that wraps another LLM doing the heavy lifting, like going through a 200k-token-long markdown file and returning only a structured distillation. However, even that can fill the main model's context quickly in some scenarios.
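Concretely, the offloading I mean looks something like this (a sketch; `client`, `MODEL`, and the JSON shape are illustrative, not any particular product's API):

```
import json

def distill_markdown(path: str) -> dict:
    # The ~200k-token document is read here, inside the tool,
    # and never enters the main model's context.
    text = open(path).read()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Return JSON: {title, key_points, open_questions}."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    # Only this small structured distillation goes back to the main model.
    return json.loads(resp.choices[0].message.content)
```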
How are you orchestrating this? Just usual sub-agents or something custom?
Their tool use from code makes a lot of sense, but I don't really get their tool search approach.
We originally had RAG as a form of search to discover potentially relevant information for the context. Then with MCP we moved away from that and instead dumped all the tool descriptions into the context and let the LLM decide, and it turned out this was way better and more accurate.
Now it seems like the basic MCP approach leads to the LLM's context window overflowing because it's flooded with too many tool descriptions. And so now we are back to calling search (not RAG, but something else) to determine what's potentially relevant.
Seems like we traded scalability for accuracy, then accuracy for scalability… but I guess maybe we’ve come out on top because whatever they are using for tool search is better than RAG?
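My guess (and it is only a guess) is that tool search is embedding retrieval over tool descriptions rather than over documents, something like:

```
# A guess at tool search: embed each tool description once, then
# load only the top-k matching tool definitions into context.
# embed() and tool_descriptions are stand-ins, not a known implementation.
import numpy as np

tool_vecs = {name: embed(desc) for name, desc in tool_descriptions.items()}

def search_tools(query: str, k: int = 5) -> list[str]:
    q = embed(query)
    scores = {
        name: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for name, v in tool_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```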
Programmatic tool invocation is a great idea, but it also increasingly raises the question of what the point of well-defined tools even is now.
Most MCP servers are just wrappers around existing, well-known APIs. If agents are now given an environment for arbitrary code execution, why not just let them call those APIs directly?
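For instance, instead of a GitHub MCP server, the agent could just write something like this in its sandbox (a sketch against the public GitHub REST API; the repo and token handling are illustrative):

```
# No MCP wrapper: hit the same public REST API the wrapper would call.
import os
import requests

resp = requests.get(
    "https://api.github.com/repos/anthropics/anthropic-sdk-python/issues",
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    params={"state": "open", "per_page": 10},
)
for issue in resp.json():
    print(issue["number"], issue["title"])
```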
Tools are more reproducible than prompts with instructions to hit APIs. They are helpful for agentic workflows that you intend to run multiple times or without supervision.
They aren't worth bothering with for one-off tasks or supervised workflows.
The major advantage is that a tool can provide a more opinionated interface to the API than your OpenAPI definition. If the API is generic, it may have more verbose output or more complex input than is ideal for the use case. Tools are a good place to bake in any opinion that might make it easier for the LLM to use.
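For example, a thin wrapper that bakes those opinions in (the `/orders` endpoint and its fields are hypothetical):

```
# A tool as an opinionated facade over a generic API: fixed defaults
# on the way in, trimmed output on the way out.
import requests

def search_orders(customer_id: str, limit: int = 5) -> list[dict]:
    """Return the customer's most recent orders, newest first."""
    resp = requests.get(
        "https://api.example.com/orders",  # hypothetical generic endpoint
        params={"customer": customer_id, "sort": "-created", "page_size": limit},
    )
    # Strip the verbose payload down to the fields the LLM actually needs.
    return [
        {"id": o["id"], "status": o["status"], "total": o["total"]}
        for o in resp.json()["results"]
    ]
```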
I see the pendulum has finished its swing from
> I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS.[1]
to
> TOOL SEARCH TOOL, WHICH ALLOWS CLAUDE TO USE SEARCH TOOLS TO ACCESS THOUSANDS OF TOOLS
---
[1] https://www.usenix.org/system/files/1311_05-08_mickens.pdf
These meta features are nice, but I feel they create new issues, like debugging. Since this tool search feature is completely opaque, the right tool might not get selected. Then you'll have to figure out whether it was the search, and if it was, how you can push the right tool to the top.
Okay, so this is just the `apropos` and `whatis` commands to search through available man pages. Then the `man` command to discover how the tools work. Followed by tool execution?
Really. We should be treating Claude code more like a shell session. No need for MCPs
> Really. We should be treating Claude code more like a shell session. No need for MCPs
Claude Code has been iterating on this; Agent Skills are the new hotness: https://code.claude.com/docs/en/skills
Some have been saying this since MCP appeared.
So essentially all Claude users are going to surface the "coding agent", making it more suitable even for general-purpose agents. That makes sense right after their blog post explaining the context bloat from MCPs.
I have been trying a similar idea that takes your MCP configs and runs WASM JavaScript in case you're building a browser-based agent: https://github.com/buremba/1mcp
Is there a good guide for all of these concepts in claude code for someone coming from Cursor? I just feel like the amount of configuration is overwhelming vs. Cursor to accomplish the same things.
Most guides to wringing productivity out of these higher level Claude code abstractions suffer from conceptual and wall-of-text overload. Maybe it's unavoidable but it's tough to really dig into these things.
One of the things that bugs me about AI-first software development is that it seems to have swung the pendulum from "software engineering is riddled with terrible documentation" to "software engineering is riddled with overly verbose, borderline prolix, documentation", and I've found that to be true of blog and Reddit posts about using Claude Code. Examples:
https://www.reddit.com/r/ClaudeAI/comments/1oivjvm/claude_co...
and
https://leehanchung.github.io/blogs/2025/10/26/claude-skills...
These are thoughtful posts, they just are too damn long and I suspect that's _because_ of AI. And I say this as someone who is hungry to learn as much as I can about these Claude code patterns. There is something weirdly inhumane about the way these walls of text posts or READMEs just pummel you with documentation.
It's not, just try it. You'll likely be underwhelmed because Cursor has more features, really.
It feels crazy to me that we are building "tool search" instead of building real tools with an interface, state, and available actions. Think: how would you define a Calculator, a Browser, a Car...?
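For illustration, something like this (a sketch, not any existing spec):

```
# A "real tool": typed state plus a small set of callable actions,
# instead of a bag of flat JSON schemas to be searched over.
class Browser:
    def __init__(self) -> None:
        self.url: str | None = None  # state the agent can inspect

    def goto(self, url: str) -> None:
        """Navigate to a page."""
        self.url = url

    def text(self) -> str:
        """Return the visible text of the current page."""
        ...

    def click(self, selector: str) -> None:
        """Click the element matched by a CSS selector."""
        ...
```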
I think, notably, one of the errors has been to name function calls "tools"...
well the name “function” is already taken - they deprecated it so that we could call functions, tools.
Unless expertly engineered (like the supabase MCP server is), CLI commands as skills are better most of the time. My skills are a script and a MD file on disk.
Just use https://github.com/antl3x/Toolrag and avoid vendor lockin
So how close is this to “RAG for tools”? In the sense that RAG handles aspects of your task outside of the LLM, leaving the LLM to do what it does best.
I'm confused about these tools - is this a decorator that you can add to your MCP server tools so that they don't pollute the context? How else would I add a "tool" for claude to use?
When you make API calls to generate chat completions, you specify a list of tools. They can be MCP tools, or just arbitrary tool metadata.
The API will then respond when it needs the client code to compute a tool output.
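For example, with the Anthropic API (a minimal sketch; the tool definition is arbitrary metadata, and the model name may differ):

```
import anthropic

client = anthropic.Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-5",  # model name may vary
    max_tokens=1024,
    tools=[{
        "name": "get_weather",  # arbitrary tool metadata, no MCP server involved
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)
# When the model wants the tool run, it stops and asks the client to do it.
if resp.stop_reason == "tool_use":
    block = next(b for b in resp.content if b.type == "tool_use")
    print(block.name, block.input)  # your code computes the result and replies
```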
got it, thanks!
We seem to be on a cycle of complexity -> simplicity -> complexity with AI agent design. First we had agents like Manus or Devin that had massive scaffolding around them, then we had simple LLMs in loops, then MCP added capabilities at the cost of context consumption, then in the last month everything has been bash + filesystem, and now we're back to creating more complex tools.
I wonder if there will be another round of simplifications as models continue to improve, or if the scaffolding is here to stay.
It's because attention dilution stymies everything. A new chat window in the web app is the smartest the model is ever going to be. Everything you prompt into its context, without sophisticated memory management* makes it dumber. Those big context frameworks are like giving the model a concussion before it does the first task.
*which also pollutes the attention btw; saying "forget about this" doesn't make the model forget about it - it just remembers to forget about it.
Most of the time people sit on complexity because they don't have a strong enough incentive to move away from something that appears/happens to work. With AI, cost would be a huge incentive.
This is what I've been talking about for a few months now. The AI field seems to reinvent the wheel every few months, and because most people really don't know what they're talking about, they just jump on the hype and adopt the new so-called standards without really thinking about whether it's the right approach. It really annoys me, because I have been following some open source projects that have had genuinely novel ideas about AI agent design, and they are mostly ignored by the community. But as soon as a large company like Anthropic or OpenAI starts a trend, suddenly everyone adopts it.
Well, what are those projects? I don't speak for anyone else, but I'm generally fatigued by the endless parade of science fair projects at this point, and operate under the assumption that if an approach is good enough, openai/anthropic/google will fold useful ideas under their tools/products.
Hmm the Gemini API doesn’t need MCP for tool-use if I understand correctly. It just needs registered functions
I don't think any of the mainstream vendor APIs require MCP for tool use - they all supported functions (generally defined using a chunk of OpenAPI JSON schema) before the MCP spec gained widespread acceptance and continue to do so today.
Yep, the Anthropic API supported tool use well before an MCP-related construct was added to the API (MCP connector in May of this year).
While it's not an API, Anthropic's Agent SDK does require MCP to use custom tools.
Yeah, seems like the agent industry is spinning wheels a bit. As that old adage goes, when there are a hundred treatments you can be sure there is no cure.
Wrapping tool calls in code together with using the benefits of the MCP output schema was implemented in smolagents for some time. Think that’s even one step further conceptually. https://huggingface.co/blog/llchahn/ai-agents-output-schema
Funny how they use "Traditional approach" for MCP tool usage, which was released just a year ago.
What’s the best way to prevent the input context from compounding with each tool call?
Very clever. Tool search and "code that can orchestrate tool calls" are features that make utter sense and should become opt-out for all tools, not opt-in.
How did the industry not think to do this in the first place :)
the whole mcp thing is a mess tbh
So basically the idea of Claude Skills just for Tools.
Tools for tools. How about an LLM tool for tools?
Unfortunate that they chose python instead of bash as the wrapper. Bash would have wider interoperability across languages and workflows that don't touch python. It would also expose more performant tools.
If we're posting opinions, I prefer Python. It's at least as capable as Bash at running external ("more performant") tools.
Not unfortunate. They know what people are using and went that route.
Meanwhile, I have "*Never use Python for anything ever*" in my AGENTS.md.
I think you are leaving lots of intelligence on the table by forbidding Python to an LLM trained heavily on Python codebases.
I've mostly stopped using Claude because of it; it will still try to use Python for the most random tasks. It recently wrote an HTML file with some inline JS in it, then started a local Python server to open the HTML file and check the log output.
This was in a Node.js project. It is just too obsessed with using Python, and removing the option seems to help it focus and make more sensible choices.