Any ideas how to solve the "agents don't have total common sense" problem?
I have found that when using agents to verify agents, the verifying agent might observe something a human would immediately find off-putting and obviously wrong, yet it raises no flags for the smart-but-dumb agent.
To clarify, are you using the "fast brain, slow brain" pattern? An example would help.
Broadly speaking, we see people experiment with this architecture a lot, often with a great deal of success. Another approach would be an agent orchestrator architecture, with an intent recognition agent that routes to different sub-agents.
Obviously there are endless possible cases in production, and the best approach is to build your evals using that data.
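As a rough illustration of the orchestrator idea, here is a minimal Python sketch. Everything in it is hypothetical: the sub-agent names, the keyword table, and the functions `recognize_intent` and `orchestrate` are made up for the example, and the keyword matching is only a stand-in for whatever model call would do intent recognition in a real system.

```python
from typing import Callable, Dict, Tuple

# Hypothetical sub-agents: each one is just a function from message to reply.
def billing_agent(message: str) -> str:
    return "billing: " + message

def support_agent(message: str) -> str:
    return "support: " + message

def fallback_agent(message: str) -> str:
    return "fallback: " + message

SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": support_agent,
}

# Assumption: keyword rules stand in for an LLM-based intent classifier.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("invoice", "refund", "charge"),
    "support": ("error", "crash", "bug"),
}

def recognize_intent(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "fallback"

def orchestrate(message: str) -> str:
    """Route the message to the sub-agent that matches its intent."""
    intent = recognize_intent(message)
    handler = SUB_AGENTS.get(intent, fallback_agent)
    return handler(message)

if __name__ == "__main__":
    print(orchestrate("I need a refund for last month"))
    print(orchestrate("The app shows an error on login"))
```

The point of the split is that the routing step can be evaluated on its own, separately from each sub-agent's answers.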
Congrats on the launch! Do you have anything planned to test chat agents directly in the UI? I have an agent but no exposed API, so I can't really use your product even though I have a genuine need.
Yes, we support integrations with different chat agent providers, and also SMS/WhatsApp agents, where you can just drop in the agent's number.
Let us know how your agent can be connected to, and we can advise on the best way to test it.
This was really fun to build. We would love feedback from the HN community and insights into your current process.