Why does this read like an OpenAI ad?
These plots are terrible. Why are categorical values connected across categories with lines? Why not just use bar plots?
For example, in the "Web Vulns in OSS" plot, white-box data for Opus 4.7 is missing, but the absurd linear interpolation across neighboring categories implies a value near 60.
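To make the complaint concrete: connecting categories with a line implicitly interpolates between neighbors, so a missing category silently gets a fabricated value. A minimal sketch (the category names and scores below are hypothetical, not taken from the article's plots):

```python
# Hypothetical benchmark scores per vulnerability category; "XSS" has no data.
categories = ["SQLi", "XSS", "SSRF", "RCE"]
scores = {"SQLi": 55, "XSS": None, "SSRF": 65, "RCE": 70}

def implied_by_line_plot(cats, vals, missing):
    """What drawing a line across categories visually asserts for a gap:
    the midpoint of the two neighboring categories' values."""
    i = cats.index(missing)
    return (vals[cats[i - 1]] + vals[cats[i + 1]]) / 2

# The line plot "fills in" a score that was never measured.
print(implied_by_line_plot(categories, scores, "XSS"))  # -> 60.0
```

A bar chart would simply render no bar for the missing category, which is the honest presentation: the categories have no inherent order, so there is nothing meaningful to interpolate between.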
It's just an ad thinly disguised as useful data.
I think the x axis is meant to be time but they screwed it up.
Wasn't it already confirmed that small open-weight models were able to detect most of the same headline vulns as mythos? How is this any different?
No, they can detect errors when pointed at them, but they have a lot of false positives, making them functionally useless on a large unknown codebase. They also can't build and run an exploit after identifying a vulnerability. Mythos can (purportedly) find vulnerabilities and actually validate them by building and running exploits. That makes it functional and usable for hacking.
Do you have a source for this? Not doubting it, but I would like to have something concrete the next time the Mythos horse manure is cited.