Astro - Hacker News

10 comments

rbbydotdev 2 minutes ago

This is interesting. Anecdotally, I've felt like I was having better luck with raw SQLite queries than with an ORM (Drizzle) in a recent TypeScript project.
jdlshore an hour ago

“Our systematic study exposes a phenomenon of constraint decay in LLM-based coding agents. While current models excel at unconstrained generation, their performance drops when forced to navigate explicit architectural rules. For end-users, this dichotomy implies that agents are reliable for rapid prototyping but remain unreliable for production-grade backend development.”
One major weakness of this study is that they didn’t fully test frontier models for cost reasons, so the specific performance results should be taken with a grain of salt. But the overall conclusion that models degrade when both behavior and architecture must be correct is interesting, and something to keep an eye on.
maxbond an hour ago

Reminds me of the recent paper about delegating document editing tasks to LLMs across different disciplines [1]. That paper found that programming was the only discipline in which most LLMs could perform long-horizon tasks without accumulating errors & corrupting the document.
I've only read the abstract of this one so far, but it seems like this paper has zoomed in on programming with greater fidelity and shown a similar phenomenon. Not about long-horizon tasks, though; more like "long style horizons" of larger sets of structural constraints.
[1] https://arxiv.org/abs/2604.15597
Discussion: https://news.ycombinator.com/item?id=48073246
- emp17344 3 minutes ago
  
  If it’s not easily verifiable, LLMs aren’t good at it.
p0w3n3d 9 minutes ago
```
   tasks spanning eight web frameworks
```
Does anyone else find that LLMs write better plain HTML+CSS+JS than code for existing frameworks?
yomismoaqui 28 minutes ago

Also, they used dynamically typed languages like Python & JS. In my experience a statically typed codebase is easier for humans to maintain, so maybe it is for agents too.
When using Codex/Claude Code with Go, I can't count the times the agent has made a change, run a build to check for errors, found some, and fixed them.
gkfasdfasdf an hour ago

Odd that they used GPT-5.2 and not GPT-5.2-codex, i.e. the one optimized for coding-agent tasks.
leecommamichael 16 minutes ago

These things don’t think. We’re going to have to reiterate this for a long time, I fear.
- sheeshkebab 4 minutes ago
  
  …but they reason well enough given enough context (using their matmuls).
- emp17344 15 minutes ago
  
  There is now a trillion-dollar industry bent to the task of convincing people these things can think. It’s gonna cause some damage.