Astro - Hacker News

23 comments

himata4113 32 minutes ago

These numbers seem pretty low compared to what I was able to achieve, specifically around the Windows kernel, win32k<->win32u to be exact. It honestly wouldn't surprise me anymore if China started surpassing the models that the US makes public, at least in specific categories such as cyber.
GLM 5.2 is already capable enough to assist in self-training, which is similar to what we saw happen with frontier models, and they appear to be getting there at a significantly lower cost than OpenAI/Anthropic.
solenoid0937 an hour ago

GLM export controls incoming? I predict Commerce will force OpenRouter, HuggingFace to take some open models down within the next few months.
Not that it would make any sense.
- rgbrenner 24 minutes ago
  
  If that happens it'll be an absolute disaster. Imagine a scenario where Anthropic and OpenAI prohibit most US companies from using their latest models because of safety, while attackers use equivalent open source models to attack US companies.
  Any prohibition on open source models will do nothing to fix the problem, since attackers will never feel bound by the law. All advanced models must be available for defensive purposes.
  - andy99 10 minutes ago
    
    Right, but is there any evidence of intelligence behind any of these (government) decisions? It’s just regulatory capture + marketing (plus some people living out an imaginary fantasy that they’re in Neuromancer or something), absolutely no reason to think they won’t try and target open models as part of this.
- gruez an hour ago
  
  >GLM export controls incoming?
  US imposing export restrictions on a model from China?
  - fph a minute ago
    
    [delayed]
  - mcintyre1994 29 minutes ago
    
    It’d be restrictions on Americans and American companies, and probably also pressure on America’s allies.
  - manquer an hour ago
    
    While unlikely, it is not without precedent: there are restrictions on ASML, a Dutch company, selling EUV machines.
    
    verdverm 39 minutes ago
    
    ASML complies as an ally; why would China comply?
    The weights are already available and downloaded. Is it going to be a crime to have them, run them, make them available? Constitutional rights still exist (I hope).
    
    matheusmoreira a few seconds ago
    
    > is it going to be a crime to have them, run them, make them available?
    Yeah. Illegal numbers.
    
    solenoid0937 30 minutes ago
    
    > is it going to be a crime to have them, run them, make them available?
    Now you're getting it! Commerce will call it a munition and treat those harboring it as harboring illegal/foreign munitions.
    No business will take the hit, so they will quickly deplatform the models.
    No end user has the GPU capacity to use GLM 5.2 or similar models at full precision, so the government will call the problem "mostly solved." But they might "make examples" of a few people using p2p software to download the weights if they choose to.
    
    verdverm 25 minutes ago
    
    Or we use the models to work on fixing vulns and stop overblowing the doom scenarios. Gotta save the kids and kill the terrorists though!
    I'm for making software better instead of banning it based on what the rich and powerful claim.
    I suspect the real fear is that open weight models undermine the financials and token prices they thought were going to pay off their ludicrous spending, because they have all raced and raised hardware prices.
    
    solenoid0937 18 minutes ago
    
    > making software better instead of banning it
    That would be the rational thing to do.
    > financials and token prices
    I do not think the government thinks this deeply. Market manipulation might be a rational, if unethical, reason to ban open source models.
    But this admin banned Anthropic models to "own the libs." They will continue to ban what they want for whatever reason they want. I don't think those reasons will be particularly coherent.
veselin an hour ago

Here, it appears they compare a single prompt, "find IDOR", against a multi-agent system. However, one can also start far more sophisticated skills that spin up subagents and mostly do the same in Claude Code, Codex, OpenCode, Pi, etc.
Which I guess makes what Semgrep sells obsolete. Unless they have built a Pareto-optimal point in terms of capabilities and token usage, maybe?
- blazespin an hour ago
  
  I think the point is less "how can we throw shade on the OP" and more "a harness can enable a lot of models to do very serious cybersec, and GLM 5.2 is one of them".
  - s3p 40 minutes ago
    
    Are you replying to a response to the original comment? I looked but I didn't see anyone saying he's throwing shade.
admax88qqq 34 minutes ago

> beats Claude in our Cyber Benchmarks
Beats which model in Claude? Whenever a "benchmark" doesn't put precise model numbers in their headlines I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).
It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.
- InsideOutSanta 21 minutes ago
  
  They say "Claude Opus 4.8" in the first paragraph.
- ls612 28 minutes ago
  
  Opus 4.8 according to TFA. Whether or not the safety guardrails were responsible for the difference is an open question, but for a dev who wants to secure their software and doesn’t work at one of the blessed Glasswing companies, it doesn’t really matter why; what matters is the best tool you actually have.
kordlessagain 3 hours ago

You can launch GLM-5.2 in Opencode using Nemesis8: https://github.com/DeepBlueDynamics/nemesis8#nemesis-8
After installing, run `n8 build` to build the image, then `n8 --danger --provider opencode interactive` to launch it in a container.
Sign up for GLM-5.2 here: https://z.ai
danslo an hour ago

It reads like an ad.
Secondly these are "just" IDORs, arguably the easiest class of vulnerabilities.
Thirdly it compares to GPT 5.5 and Opus 4.8.
No, we don't have Mythos at home.
- vlian2088 an hour ago
  
  >Thirdly it compares to GPT 5.5
  Mythos is <10% ahead of GPT 5.5 on all benchmarks, which it gains by being several times the size of Opus. Had it been economical to provide, it would've been released to the public on day one instead of the marketing circus those effective altruism clowns exhibited. Admitting that it costs >1000% to run inference on a <10% better model would've been very damning.
- InsideOutSanta 40 minutes ago
  
  In my experience, GLM 5.2 is extremely good at finding vulnerabilities, and more importantly, unlike Opus, I've never seen it refuse a command. It genuinely is a very strong model for finding and fixing vulnerabilities.