OK, I'll take the opportunity to be the first non-self-promotional comment on this thread now that concensure and rohan2003 have done their ads.
Based on this post's current position on the front page, it seems to fit a pattern we've all been seeing over the past few months: HN is finally majority onboard with believing in the usefulness of coding agents and is celebrating by rediscovering each and every personal "I improved CC by doing [blank]" from-scratch project.
That's all whatever. Fine. But what I'm really curious about is this: does the HN community actually look at the random LLM-generated statistic-vomit text posted by creators like this and find itself convinced?
I ask because if you're new to random stat vomit, you're going to be dealing with it all the time soon, and I've yet to find good meta-discussions about how we find signal in this noise. I used to use HN or selected Reddit community upvotes as a first-pass "possibly important" signal, but it's been getting worse and worse, as illustrated by posts like this getting upvoted to the top without any genuine discussion.
> random LLM-generated statistic-vomit text

I do not understand why this project in particular has set you off.
Their README looks much better than many I've seen on HN:
- none of the annoying verbosity that is so prevalent in AI-generated text
- not too many buzzwords (they're not saying "agentic" every sentence)
- it is very clear what exactly the project is supposed to do and why it can be useful
Personally, I upvoted this because I wanted to do something similar for a long time but never got around to it.
Their comment reporting stats here:

“Provider: OpenAI (gpt-4o / o1)”

Uh, so is it 4o or o1? Those are very different models. When you read that, how did you interpret it?

And what useful information do you glean from this, vova_hn2? Perhaps I'm just ignorant.
“The ROI of Precision: While the "Enhanced" run used roughly 6.73% more tokens than the baseline per request, it required 27.78% fewer steps to reach a successful solution.”
So it actually takes MORE tokens but fewer “steps”? This could all use actual discussion from the creator: a blog post or a detailed comment. Instead we get this.
What sets me off is projects like this that throw random numbers and technical jargon at you because the author simply asked their LLM to do so. It gives the veneer of “oh, it must be legitimate, look at all the data” to people mentally stuck in 2024, not realizing that anyone can generate junk and pass it off in a way that (used to be) convincing.
I'm not convinced that ASTs are meaningfully helpful in the grand scheme of things or in the long run. LSPs and code intelligence I find useful both as a human and for my coding agent.
The Problem: Most RAG-based coding tools treat code as unstructured text, relying on probabilistic vector search that often misses critical functional dependencies. This leads to the "Edit-Fail-Retry" loop, where the LLM consumes more time and money through repeated failures.
The Solution: Semantic uses a local AST (Abstract Syntax Tree) parser to build a Logical Node Graph of the codebase. Instead of guessing what is relevant, it deterministically retrieves the specific functional skeletons and call-site signatures required for a task.
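The retrieval idea described there (pull function skeletons out of an AST rather than embedding raw text) can be sketched in a few lines with Python's built-in ast module. This is my illustration of the general technique, not the project's actual code; all names are made up:

```python
# Illustrative sketch: extract "functional skeletons" (signatures
# without bodies) from source code via its AST, rather than chunking
# the file as unstructured text. Not the project's real API.
import ast

def extract_skeletons(source: str) -> list[str]:
    """Return a 'def name(args)' line for every function in the source."""
    tree = ast.parse(source)
    skeletons = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            skeletons.append(f"def {node.name}({args})")
    return skeletons

code = """
def save_user(user, db):
    db.insert(user)

def load_user(user_id, db):
    return db.get(user_id)
"""
print(extract_skeletons(code))
# -> ['def save_user(user, db)', 'def load_user(user_id, db)']
```

A real tool would also walk call sites to build the dependency graph the README alludes to, but the deterministic flavor (parse, then look up, rather than embed, then guess) is the same.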
The Shift: From "Token Savings" to "Step Savings"
Earlier versions of this project focused on minimizing tokens per call. However, our latest benchmarks show that investing more tokens into high-precision context leads to significantly fewer developer intervention steps.
Latest A/B Benchmark (2026-03-27)

Run Variant              Token Delta (per call)   Step Savings (vs Baseline)   Task Success
Baseline (2026-03-13)    -18.62%                  —                            11/11
Hardened A               +8.07%                   —                            11/11
Enhanced (2026-03-27)    -6.73%                   +27.78%                      11/11

Key Takeaways:
The ROI of Precision: While the "Enhanced" run used roughly 6.73% more tokens than the baseline per request, it required 27.78% fewer steps to reach a successful solution.
Deterministic Accuracy: By feeding the LLM a "Logical Skeleton" rather than fuzzy similarity-search chunks, we eliminate the "lost in the middle" effect. The agent understands the consequences of an edit before it writes a single line.
Context Density: We are effectively trading cheap input tokens for expensive developer time and agent compute cycles.
Detailed breakdowns of the task suite and methodology are available in docs/AB_TEST_DEV_RESULTS.md.
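For what it's worth, taking the quoted numbers at face value, "more tokens per request" and "fewer steps" aren't contradictory. A quick back-of-envelope check (my arithmetic, not the project's):

```python
# Sanity check of the quoted claim: +6.73% tokens per request,
# -27.78% steps overall, both relative to baseline.
tokens_per_step = 1.0673   # each request costs 6.73% more tokens
steps = 1 - 0.2778         # but 27.78% fewer requests are made
total = tokens_per_step * steps
print(f"{total:.3f}")      # -> 0.771, i.e. roughly 23% fewer total tokens
```

So if both figures are real, total token spend goes down even though each individual request is pricier. Whether the figures are real is exactly the question.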
> Personally, I upvoted this because I wanted to do something similar for a long time but never got around to it.
It's much easier to give your agents the LSP for the language(s) they're working on.
A project that turns them into MCP: https://github.com/isaacphi/mcp-language-server
Easier, sure.
LSP exposes many more "functions" that are useful to agents, such as rename and go-to-definition/find-usages.
Some basic differences described here: https://news.ycombinator.com/item?id=30664671
More in depth: https://lambdaland.org/posts/2026-01-21_tree-sitter_vs_lsp/
Claude Code's instructions for LSP: https://github.com/Piebald-AI/claude-code-system-prompts/blo...