This looks really interesting!
I _would_ be curious to try it, but...
My first question was whether I could use this for sensitive tasks, given that it's not running on our machines. And after poking around for a while, I didn't find a single mention of security anywhere (as far as I could tell!)
The only thing that I did find was zero data retention, which is mentioned as being 'on request' and only on the Enterprise plan.
I totally understand that you guys need to train and advance your model, but with suggested features like scraping behind login walls, it's a little hard to take seriously with neither of those two things mentioned anywhere on the site, so anything you could do to address those concerns would be amazing.
Again, you seem to have done some really cool stuff, so I'd love for it to be possible to use!
Update: The homepage says this in a feature box, which is... almost worse than saying nothing, because it doesn't mean anything? -> "Enterprise-grade security; End-to-end encryption, enterprise-grade standards, and zero-trust access controls keep your data protected in transit and at rest."
Thanks for bringing this point up!
We take security very seriously and one of the main advantages of using Smooth over running things on your personal device is that your agent gets a browser in a sandboxed machine with no credentials or permissions by default. This means that the agent will be able to see only what you allow it to see. We also have some degree of guard-railing which we will continue to mature over time. For example, you can control which URLs the agent is allowed to view and which are off limits.
Until we're able to run everything locally on device, there has to be a level of trust in the organizations that control the technology stack, from the LLM all the way down to the infrastructure providers. And this applies to any personal information you disclose, at any touch point, to any AI company.
I believe this trust is something that we, and every other company in the space, will need to continue to grow and mature with our community and our users.
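The URL guardrail mentioned above can be pictured as a simple allow/deny check. This is a hypothetical sketch, not Smooth's actual API: every name and pattern here is illustrative.

```python
from urllib.parse import urlparse
from fnmatch import fnmatch

# Hypothetical guardrail config; names and patterns are illustrative only.
ALLOWED_HOSTS = ["docs.python.org", "*.wikipedia.org"]
BLOCKED_PATHS = ["*/login*", "*/admin*"]

def is_url_allowed(url: str) -> bool:
    """Return True if the agent may navigate to this URL."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # block non-web schemes outright
    if not any(fnmatch(parts.hostname or "", pat) for pat in ALLOWED_HOSTS):
        return False  # host must match the allowlist
    if any(fnmatch(parts.path, pat) for pat in BLOCKED_PATHS):
        return False  # path must not match a blocked pattern
    return True

print(is_url_allowed("https://docs.python.org/3/"))   # True
print(is_url_allowed("https://en.wikipedia.org/login"))  # False
```

The point is that the deny-by-default posture lives outside the agent: the LLM never gets to see a page the policy rejects.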
Curious: what are people using as the best open source and locally hosted versions to have agents browse the web?
Playwright, same thing we use when doing non-ai automation
Fun fact: AI can use the same tools you do; we don't have to reinvent everything and slap a "built for ai" label on it
We love these tools but they were designed for testing, not for automation. They are too low-level to be used as they are by AI.
For example, the Playwright MCP is very unreliable and inefficient to use. To mention a few issues: it does not correctly pierce through the different frames, and it does not handle the variety of edge cases that exist on the web, which means it often can't click the button it needs to click. Also, because it lacks control over the context design, it cannot optimize for contextual operations, and your LLM trace gets polluted with an incredible amount of useless tokens. This increases cost, task complexity for the LLM, and latency.
On top of that, these tools rely on the accessibility tree, which is just not a viable approach for a huge number of websites
again (see other comment), you are not listening to users and asking questions, you are telling them they are wrong
You describe problems I don't have. I'm happy with Playwright and other scraping tools. Certainly not frustrated enough to pay to send my data to a 3rd party
I was actually very interested until I realized that this doesn't run on my computer…
I get the sandboxing, etc, but a Docker container would achieve the same goals.
There are pros and cons to running the browser on your own machine
For example, with remote browsers you get to give your swarm of agents unlimited and always-on browsers that they can use concurrently without being bottlenecked by your device resources
I think we tend to default to thinking in terms of one-agent, one-browser scenarios because we anthropomorphize agents a lot, but really there is no ceiling to how parallel these workflows can become once you unlock autonomous behavior
I appreciate that, but for the audience here on HN, I'm fairly certain we understand the trade-offs, or potentially have more compute resources available to us than you might expect the general user to have.
Offer up a locally hosted option and it'll be more widely adopted by those who actually want to use it as opposed to just tinkering with it.
I know this may not fit into your “product vision”, however.
Way too expensive, I'll wait for a free/open source browser optimized to be used by agents.
Our approach is actually very cost-effective compared to alternatives. Our browser uses a token-efficient LLM-friendly representation of the webpage that keeps context size low, while also allowing small and efficient models to handle the low-level navigation. This means agents like Claude can work at a higher abstraction level rather than burning tokens on every click and scroll, which would be far more expensive
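As a toy illustration of where the savings in a "token-efficient representation" come from (this is not Smooth's actual representation, just the general idea): most of a raw page is markup, scripts, and styles the LLM never needs to read, so extracting only the visible text shrinks the context dramatically.

```python
from html.parser import HTMLParser

# Toy extractor: keep visible text, drop tags, scripts, and styles.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

html = """<html><head><style>body{color:red}</style></head>
<body><div class="nav"><a href="/home">Home</a></div>
<script>trackUser();</script><p>Price: $42</p></body></html>"""

parser = TextExtractor()
parser.feed(html)
compact = " ".join(parser.chunks)
print(compact)  # Home Price: $42
# Crude proxy for token count: whitespace-separated words.
print(len(html.split()), "->", len(compact.split()))
```

A real representation would also keep layout and visual cues, but even this crude pass shows why the compact view costs far fewer tokens per page than raw HTML.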
If a potential user says it is too expensive, better to ask why than to tell them they are wrong. You likely have assumptions you have not validated
Definitely! Making Smooth as cost-effective as possible has been a core goal for us, so we'd really love to hear your thoughts on this
We'll continue to make Smooth more affordable and accessible as this is a core principle of our work (https://www.smooth.sh/images/comparison.gif)
are your evals / comparisons publicly/3rd party reproducible?
If it's "trust me, I did a fair comparison", that's not going to fly today. There's too much lying in society, trusting people trying to sell you something to be telling the truth is not the default anymore, skepticism is
Frontend QA is the final frontier, good luck, you are over the target.
The amount of manual QA I am currently subjected to is simultaneously infuriating and hilarious. The foundation models are up to the task but we need new abstractions and layers to correctly fix it. This will all go the way of the dodo in 12 months but it'll be useful in the meantime.
agent-browser helped a lot over playwright but doesn't completely close the gap.
It's amazing how agents like Claude Code become much more autonomous when they have the ability to verify their work. That's part of the reason why they work much better on unit-testable work.
I think this paradigm was very visible in yesterday's blog post from Anthropic (https://www.anthropic.com/engineering/building-c-compiler), where they mentioned that giving the agents the ability to verify against GCC was the key to unlocking further progress
Giving a browser to these agents is a no brainer, especially if one works in QA or develops web-based services
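The generate-then-verify loop behind that autonomy can be sketched abstractly. All the names here are illustrative stand-ins: `attempt` plays the role of the agent proposing a fix, and `verify` plays the role of the oracle (a test suite, a browser check, or GCC in Anthropic's post).

```python
# Hedged sketch of a generate-then-verify loop; names are illustrative.
def attempt(task, hint):
    # Stand-in for an agent producing a candidate; a hint from a failed
    # check lets it repair its previous output.
    return task["broken"].replace(hint, task["fix"]) if hint else task["broken"]

def verify(candidate, task):
    # Stand-in for an external oracle that judges the candidate.
    return candidate == task["expected"]

task = {"broken": "retrun 42", "fix": "return", "expected": "return 42"}

candidate, hint = attempt(task, None), None
for step in range(3):
    if verify(candidate, task):
        break
    hint = "retrun"  # feedback from the failed check guides the next try
    candidate = attempt(task, hint)

print(candidate)  # return 42
```

A browser gives web-facing agents exactly this kind of oracle: they can load the page they just changed and check that it actually works.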
I can see a new token-efficient mirror web possibly emerging, using content-type headers on the request side
forms, PRG, semantic HTML and no js needed
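The mechanism for this already exists in plain HTTP content negotiation; adoption, not technology, is the open question. A minimal self-contained sketch: one URL serving HTML to browsers and a compact markdown view to agents that ask for it via the `Accept` header.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# One URL, two representations, chosen by standard content negotiation.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if "text/markdown" in self.headers.get("Accept", ""):
            body, ctype = b"# Price\n42", "text/markdown"   # agent view
        else:
            body, ctype = b"<h1>Price</h1><p>42</p>", "text/html"  # human view
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

html = urlopen(Request(url)).read()
md = urlopen(Request(url, headers={"Accept": "text/markdown"})).read()
server.shutdown()
print(md.decode())
```

Nothing here is new protocol work; it is the same `Accept`-header negotiation sites already use for images and encodings, applied to an agent-friendly representation.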
seems unlikely, you're asking the entire internet to update their software for dubious improvements
Totally agree! The web for agents is evolving very fast and it's still unclear what it will look like
Our take is that, while that happens, agents today need to be able to access all the web resources that we can access as humans
Also, browsers are a really special piece of software because they provide access to almost every other kind of software. This makes them arguably the single most important tool for AI agents, and that’s why we believe that a browser might be all agents need to suddenly become ten times more useful than they already are
Congrats for shipping.
How does it compare to Agent Browser by Vercel?
Thanks for asking! There are a few core differences:
1. we expose a higher-level interface which allows the agent to think about what to do as opposed to how to do it
2. we developed a token-efficient representation of webpages that combines both visual and textual elements, heavily optimized for what LLMs are good at
3. because we control the agentic loop, we can do fancy things with contextual injections, compressions, asynchronous manipulations, etc., which are impossible to achieve when only exposing the navigation interface
4. we use a coding agent under the hood, meaning it can express complex actions efficiently and effectively compared to the CLI interface that agent-browser exposes
5. because we control the agent, we can use small and efficient LLMs, which make the system much faster, cheaper, and more reliable
Also, our service comes with batteries included: the agent can use browsers in our cloud with auto-captcha solvers and stealth mode, we can proxy your own IP, etc.
Ironically, the landing page and docs pages of Smooth aren't all that token-efficient!
Haha, indeed that's true... That's why we've just released Smooth CLI (https://docs.smooth.sh/cli/overview) and the SKILL.md (smooth-sdk/skills/smooth-browser/SKILL.md) associated with it. That should contain everything your agent needs to know to use Smooth. We will definitely add an LLM-friendly reference to it on the landing page and in the docs introduction.