Maybe it's marketing, but I think it's regrettable that Anthropic paired project Glasswing with Mythos. It really makes it seem like Mythos is the threat, rather than the fact that tons of vulnerabilities have always been ignored throughout the software world.
If Glasswing had been started years ago with the goal of applying fixes to AI-found gaps, then this would just be another model added to that effort. But doing so in the ominous shadow of some new super model boosts panic, IMO.
You're making a hubris-laden assumption that coders know the gaps they're baking into their software, that any human has a decent enough grip on the multitudes of spinning logic duct-taped together to make the internet run. Most vulnerabilities aren't "ignored"; they're in a never-ending backlog, or unknown.
If you closed all of the AI-discovered security vulnerabilities tomorrow - by the next day there'd be a host of new ones. That's software, baby.
Cybersecurity is taken too lightly, and it mostly boils down to developer recklessness: they're just "praying" that no one acts on the issues they already know about. It's something we must start talking about.
Common recklessness obviously includes devs running binaries on their work machines, not using basic isolation (why?), sticky IP addresses that straight-up identify them, or, even worse, using the same browser to access both admin panels and random memes; obviously, there are a hundred more like those that are ALREADY solved and KNOWN by the developers themselves. You literally have developers who still use cleartext DNS (apparently they're OK with their history being accessible to random outsourced employees).
Totally agree, though I'd argue that it's still a software failure if preventing exploits requires every user to memorize and follow an onerous list of best practices.
This is where security is actually heavily intertwined with privacy: by following good privacy principles, you automatically cover a lot of security issues.
A year ago the LLMs weren't good enough to find these security issues. They could have done other stuff. But then again, the big tech companies were already doing other stuff, with bug bounties, fuzzing, rewriting key libraries, and so on.
This initiative probably could have started a few months sooner with Opus and similar models, though.
Used together, multiple older open-weights models can find all of the security issues that were found by Mythos. However, no single one of those models could find everything that Mythos found.
https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...
Nevertheless, the gap between free models and Mythos is not as great as Anthropic's marketing claims, which of course is not surprising.
In general, this is probably true for other applications as well: no single model, not even a SOTA one, is equally good at everything, so trying multiple models may be necessary to obtain the best results. And with open-weights models, trying many of them may add negligible cost, especially if they are hosted locally.
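To make the ensemble point concrete, here's a toy sketch. All model names and findings below are made-up placeholders, not real results; the point is just that a union of weaker models can match a stronger one even when no member does alone:

```python
# Toy illustration: each open-weights model finds a different subset of
# issues, and only their union covers everything the strong model found.
# All names and findings here are hypothetical placeholders.

findings_by_model = {
    "open-model-a": {"issue-1", "issue-2"},
    "open-model-b": {"issue-2", "issue-3"},
    "open-model-c": {"issue-4"},
}
strong_model_findings = {"issue-1", "issue-2", "issue-3", "issue-4"}

ensemble = set().union(*findings_by_model.values())

# The ensemble matches the strong model...
assert ensemble == strong_model_findings
# ...but no single member does.
assert all(f != strong_model_findings for f in findings_by_model.values())
```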
That's not quite true; even a year ago, LLMs were finding vulnerabilities, especially when paired with an agent harness and lots of compute. And even before that, security researchers had been shouting about systemic fragility.
Mythos certainly represents a big increase in exploitation capability, and we should have seen this coming.
A lot of those bugs were found by seasoned developers and security professionals, though. Anthropic claims that people with no security background are finding vulns with Mythos: they just typed "hey, go find a vulnerability in X", went home for the night, and came back the next morning to a ready PoC. They could definitely be exaggerating, but if it's true, that's a very different threat category, and one worth paying attention to.
Previous models have done this just fine. For the last year, whenever a new model has come out I just point it at some of my repos and say something like "scan this entire codebase, look for bugs, overengineering, security flaws etc" and they always find a few useful things. Obviously each new model does this better than the last, though.
Yes, previous models found vulnerabilities but Mythos is uniquely capable of actually exploiting them: https://red.anthropic.com/2026/mythos-preview/
IMO that's a big deal primarily because the problem with automatically discovered vulnerabilities has long been a high volume of reports and a very bad signal-to-noise ratio. When an LLM is capable of developing PoC exploits, you finally have a tool that enables meaningful triage of reports like this.
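To sketch what that triage step could look like: reports that ship with a reproducing exploit get auto-promoted, everything else stays in the human-review queue. The report format and PoC-as-callable shape below are invented for illustration; a real pipeline would run PoCs in a sandbox, not as in-process callables:

```python
# Sketch: triage machine-generated vulnerability reports by whether their
# proof-of-concept actually reproduces. Report fields are invented.

def triage(reports):
    """Split reports into (confirmed, needs_human_review) by PoC result."""
    confirmed, review = [], []
    for report in reports:
        poc = report.get("poc")
        try:
            reproduced = poc() if callable(poc) else False
        except Exception:
            reproduced = False  # a crashing PoC is not a clean repro
        (confirmed if reproduced else review).append(report["id"])
    return confirmed, review

reports = [
    {"id": "R1", "poc": lambda: True},   # exploit reproduces
    {"id": "R2", "poc": None},           # finding only, no PoC
    {"id": "R3", "poc": lambda: False},  # PoC exists but doesn't fire
]
confirmed, review = triage(reports)  # confirmed: ["R1"], review: ["R2", "R3"]
```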
If you run Opus 4.6 and GPT 5.4 in a loop right now (maybe 100 times) against the top XXXX repos, I guarantee you'll find, at the very least, medium-severity vulnerabilities.
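That loop is easy to sketch. In the sketch below the model call is stubbed out (`ask_model` is a deterministic placeholder, not any real vendor SDK, and the repo/finding names are invented); the point is the repeat-and-dedupe structure, not the API:

```python
import random

# Placeholder standing in for a real LLM API call: a real run would call a
# vendor SDK here and parse structured findings out of the reply.
def ask_model(model, repo, seed):
    rng = random.Random(f"{model}/{repo}/{seed}")  # deterministic stub
    candidates = [f"{repo}:finding-{i}" for i in range(5)]
    return set(rng.sample(candidates, k=rng.randint(0, 2)))

def scan(models, repos, runs=100):
    """Run every model against every repo `runs` times; dedupe findings."""
    findings = {}
    for repo in repos:
        hits = set()
        for model in models:
            for seed in range(runs):
                hits |= ask_model(model, repo, seed)
        findings[repo] = hits
    return findings

results = scan(["opus-4.6", "gpt-5.4"], ["repo-a", "repo-b"], runs=10)
```

Repeating the prompt many times matters because each run samples a different subset of what the model can find, so the union keeps growing.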
> A year ago the LLM's weren't good enough to find these security issues
I know of two F100s that already started using foundation models for SCA in tandem with other products back in 2024. It's noisy, but depending on the environment, a false positive is less harmful than an undetected true positive.
>This initiative probably could have started a few months sooner with Opus and similar models, though.
Evidently they tried, and even the most recent Opus 4.6 models couldn't find much. There's been a step change in capabilities here.
No, Opus has found a lot and 112 vulnerabilities were reported to Firefox alone by Opus [0]. But Mythos is uniquely capable of exploiting vulnerabilities, not just finding them.
[0] https://red.anthropic.com/2026/mythos-preview/
I guess I'm not sure why you frame this as a "rather than". What Anthropic is saying is that the norm of having tons of vulnerabilities lying around historically worked OK, but Mythos shows it will soon become catastrophically not OK, and everyone who's responsible for software security needs to know this so they can take action.
I wonder whether this kind of model release could become the spark that ignites a new digital "cold war" between the US, Europe, India, and China, in which each tries to outwit its rivals and compromise their critical infrastructure using artificial intelligence.
Also I’d like to believe that this really is such a huge step forward compared to Opus, but lately I’ve found it hard to believe when I look at the statements made by the CEOs of AI companies and their associates, who are fuelling the hype surrounding this topic even further. Of course, it is good that large companies and industries that are crucial to the country are the first to have access to this, but until the launch takes place, I will approach this with a degree of scepticism.
This invisible cyberwar is already happening; it's just that the brains powering it are getting smarter.
> ignites a new digital "cold war"
Already been going on for over a decade - export controls on dual use technology like Xeon processors already began being enforced back in the Obama admin.
> until the launch takes place
It's already launched.
Tangentially related, but how does one protect oneself against one's bank account or brokerage being hacked? Can you print out proof of funds/securities owned to take to court to be made whole?
Promoting the model as potentially dangerous might backfire with the government banning it from being released by executive order.
> the government banning it from being released by executive order.
There's no legal mechanism for the president or the government at all to do that.
I'm sure they will find something when it really starts to bother them personally.
I think that would be a good precedent given the current lack of rules around AI Safety. These models don't seem to be plateauing yet and could be much more dangerous than Mythos in 1-2 years.
> A recent leak of Claude’s code prompted the startup to publish a blogpost at the beginning of the month saying that AI models had surpassed “all but the most skilled humans at finding and exploiting software vulnerabilities” [...]
I've seen a bunch of people conflate the Claude Code source-map leak with the Mythos story, though not quite as blatantly as here. I'm confident that they are totally unrelated.
The more I live, the more I believe people at the top operate in some sort of cult mentality. The level of gullibility and temporary lack of critical thinking is matched only by their sociopathy and Machiavellianism.
I'm sure it's a great big model, but the level of hype and dishonesty is something out of Sam Altman's book.
Of course it's because of the upcoming IPO, but that's the endgame. For now, it's critical to get those private equity guys and banking institutions to believe the gospel and hold the bag; only then will the suckers from the secondary markets be allowed to be suckers too.
A good percentage of cybersecurity has always been theater. If their model helps to separate the wheat from the chaff, maybe it'll be an improvement.
> A good percentage of cybersecurity has always been theater
It is great to be in a "best-effort" business where there are no consequences for bad things happening. Cybersecurity is one of those businesses. Web search, feeds and ads are another.
Imagine you are selling locks to secure homes. A thief breaks the lock. The lock-maker is not held liable. In fact, they now start selling stronger locks, and lock sales actually improve with more thefts.
It sounds like it’ll just kill the wheat and the chaff.
Still probably a benefit depending on your philosophy.
I'm definitely optimistic that the long-term trajectory is positive. All important software can undergo extensive penetration testing with cutting-edge vulnerability research techniques before launch? Sounds great. The problem is what goes wrong on the pathway to there.
Need to dump the bag on retail investors and pensions before they implode
There's a serious problem with being very popular/prominent/powerful: you become surrounded by sycophants through a sort of survival of the fittest, and you develop a progressively more distorted view of reality as a result. When everything can be made to appear to work for the person at the center, they start making progressively worse decisions, which are consequence-free because of the sway they already have. (This is a big reason why "disruptor" startups work.)
Or, you're wrong, and the smartest AI research scientists and the top banking officials are both correctly worried about the ramifications. That's what you'd expect if there really were an issue here. Are you aware of the deep-seated bugs in critical software that were already uncovered with Mythos? Are you able to steelman the issue here at all?
> Are you aware of the deep-seated bugs in critical software that were already uncovered with Mythos
This. 100% this.
A large portion of the industry is under NDA right now, but most of the F500 have already deployed, or are deploying, foundation models for AppSec use cases, going all the way back to 2023.
Sev1 vulns have already been detected with older foundation models.
Of course the noise is significant, but that's something you already faced with DAST, SAST, and other products, and it's why most security teams pair the results with experienced security professionals who adjudicate them, treating foundation-model output as just another threat intel feed.
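As a rough sketch of what "another threat intel feed" means in practice: normalize findings from each tool onto a common key so duplicates collapse, and let items corroborated by multiple feeds float to the top of the adjudication queue. The field names below are invented for illustration:

```python
# Sketch: merge findings from multiple sources (SAST, DAST, LLM, ...)
# into one adjudication queue, keyed on (file, line, weakness) so that
# duplicates across feeds collapse into a single item.

def merge_feeds(*feeds):
    """Each feed is a (source_name, findings) pair; returns a ranked queue."""
    queue = {}
    for source, findings in feeds:
        for f in findings:
            key = (f["file"], f["line"], f["cwe"])
            item = queue.setdefault(key, {"sources": set(), **f})
            item["sources"].add(source)
    # Findings corroborated by multiple tools float to the top.
    return sorted(queue.values(), key=lambda item: -len(item["sources"]))

sast = [{"file": "auth.c", "line": 42, "cwe": "CWE-787"}]
llm = [{"file": "auth.c", "line": 42, "cwe": "CWE-787"},
       {"file": "net.c", "line": 7, "cwe": "CWE-476"}]
queue = merge_feeds(("sast", sast), ("llm", llm))  # auth.c item ranks first
```

Humans then work the queue top-down, which is exactly the "adjudicate" step: corroborated items first, single-source noise last.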
Two things can be true.
Historically bad security that people just got by with, now matched with powerful tools that aren't any better than the best people but can be deployed by mediocre people.
Which is exactly what Anthropic understands the situation to be. They state at the beginning of the Glasswing blogpost that Mythos is not better than the best vulnerability researchers. But it doesn't have to be to become a tremendously big deal.
Looks like the marketing worked at least somewhat lol. Such an obvious playbook by now I’m surprised some people here fell for it.
Your cynicism doesn't prove that it's fake, though.
Just like their marketing campaign doesn’t mean those claims are real?
If it’s all a marketing gimmick, then all the companies that have collaborated to patch their bugs are collectively lying. If that’s the case, and they can get both OSS maintainers and the ones on the payrolls of Microsoft et al. to lie… hats off to them, honestly; they deserve all the marketing exposure.