But specs are per feature; it's just an up-front discussion first, like you'd have on many things, rather than jumping from questions straight to code from a model.
Much of the hype around SDD is really about developers never having experienced a real waterfall project.
Of course SDD/Waterfall helps the LLM/outsourced labor implement software in a predictable way. Waterfall was always a method to please managers, and in the case of SDD the manager is the user prompting the coding agent.
The problem with SDD/Waterfall is not the first part of the project. The problems come when you are deep into the project, your spec is a total mess and the tiniest feature you want to add requires extremely complex manipulation of the spec.
The success people are experiencing is the success managers have experienced at the beginning of their software projects. SDD will fail for the same reason Waterfall failed: the constantly increasing complexity required to keep code and spec consistent cannot be managed by LLM or human.
For myself, I found that a work methodology similar to spec-driven development is much better than vibe coding. The agent makes fewer mistakes, stays on the path, and I have fewer issues to fix.
And while I was at it, I found that using TDD also helps.
I, for one, welcome the fact that agile/scrum/daily standup/etc. rituals will be outdated. While they might be somewhat useful in some software development projects, over the past 10 years they turned into a cult of lunatics who want to apply them to any engineering work, not just software, and who think any other approach will result in bad outcomes and less productivity. Can't wait for the "open office" BS to die next too: literally a boomer mindset that came from government offices back in the day, along with the belief that it's more productive that way.
Very valid. In the beginning all this was driven by developers. Then it was LinkedIn-ified, and suddenly we had to deal with agile coaches: essentially people with no tech qualifications using developers as guinea pigs, without understanding the why.
The same is true for UX and DevOps: just create a bunch of positions based on some blog post and congratulate yourself on a job well done. Screwing over the developers (engineers) as usual, even though they might actually be interested in those jobs.
This is the main problem with big tech informing industry decisions: they win because they make sure they understand what all of this means. For all other companies it just creates a mess and the frustration you mention.
You can't code without specifications, period. Specifications can take various forms, but ultimately they define how your program should work.
The problem with what people call "Waterfall" is that there is an assumption that at some point you have a complete and correct spec and you code off of that.
A spec is never complete. Any methodology applied in a way that does not allow you to go back to revise and/or clarify specs will cause trouble. This was possible with waterfall and is more explicitly encouraged with various agile processes. How much it actually happens in practice differs regardless of how you name the methodology that you use.
I vibe coded for months but switched to spec-driven development in the last 6 months.
I'm also old enough to have started my career learning the Rational Unified Process and then progressed through XP, agile, scrum, etc.
My process: I spend 2-3 hours writing a "spec" focused on acceptance criteria, and then by the end of the day I have a working, tested next version of a feature that I push to production.
I don't see how using a spec has made me less agile. My iteration takes 8 hours.
However, I see tons of useless specs. A spec is not a prompt. It's an actual definition of how to tell if something is behaving as intended or not.
People are notoriously bad at thinking about correctness in each scenario, which is why vibe coding is so big.
People defer thinking about what correct and incorrect actually look like across a wide range of scenarios, and instead choose to discover through trial and error.
I get 20x ROI on well-defined, comprehensive, end-to-end acceptance tests that the AI can run. They fix everything from big-picture functionality to minor logic errors.
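To give a flavour of what that can look like, here is a minimal made-up sketch (not one of my real specs; the cart feature and every name in it are hypothetical) of acceptance criteria captured as tests an agent can run:

```typescript
// Hypothetical sketch: acceptance criteria expressed as runnable tests.
// The cart/coupon feature and all names are illustrative, not from a real spec.
import { test } from "node:test";
import assert from "node:assert/strict";

type Cart = { items: { price: number; qty: number }[]; coupon?: string };

// Stand-in for the feature under test: cart total with an optional 10% coupon.
function checkoutTotal(cart: Cart): number {
  const subtotal = cart.items.reduce((sum, i) => sum + i.price * i.qty, 0);
  return cart.coupon === "SAVE10" ? Math.round(subtotal * 0.9) : subtotal;
}

// Acceptance criterion: a valid coupon reduces the total by exactly 10%.
test("SAVE10 applies a 10% discount to the cart total", () => {
  assert.equal(checkoutTotal({ items: [{ price: 50, qty: 2 }], coupon: "SAVE10" }), 90);
});

// Acceptance criterion: an unknown coupon changes nothing.
test("unknown coupons are ignored", () => {
  assert.equal(checkoutTotal({ items: [{ price: 50, qty: 2 }], coupon: "NOPE" }), 100);
});
```

The point is that the agent can run these itself and self-correct, rather than me eyeballing diffs.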
Could I see one of your specs as an example?
I just tried an experiment using Spec-Kit from GitHub to build a CLI tool. Perhaps the scope of the tool doesn't lend itself to Spec-Driven Development, but I found the many, many hours of tweaking, asking, correcting, analyzing, adapting, refining, and reshaping before getting to see any code challenging. As would be the case with Waterfall today, the lack of iterative end-to-end feedback is foreign and frustrating to me.
After Claude finally produced a significant amount of code, and after realizing it hadn't built the right thing, I was back to the drawing board to find out what language in the spec had led it astray. Never mind digging through the code at this point; it would be just as good to start again as to try to onboard myself to the thousands of lines of code it had built... and I suppose the point is to ignore the code as "implementation detail" anyway.
Just to be clear: I love writing code with an LLM, be it for brainstorming, research, or implementation. I often write (and have it output) small markdown notes and plans for it to ground itself. I think I just found this experience with SDD quite heavy-handed and the workflow unwieldy.
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
System specs are non-trivial for current AI agents, and hand-prompting every step is time-consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLMs love of context, and more importantly what I think we all know is you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot but I've discovered that was just me being not so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspace account (and you should restrict logins to my Workspace domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity, let's create a record for each user account when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
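To illustrate the "documented contract which is tested" part, here's a minimal sketch of one such contract test (the endpoint, types, and handler are hypothetical placeholders, not something Spec-Kit prescribes):

```typescript
// Hypothetical contract test for one endpoint from a plan like the above.
import { test } from "node:test";
import assert from "node:assert/strict";

// Documented contract: POST /api/roles assigns a role and echoes the result.
type AssignRoleRequest = { userId: string; role: "Admin" | "Editor" | "Viewer" };
type AssignRoleResponse = { userId: string; roles: string[] };

// Stand-in for the real route handler the agent would generate.
async function assignRole(req: AssignRoleRequest): Promise<AssignRoleResponse> {
  return { userId: req.userId, roles: [req.role] };
}

test("POST /api/roles honours the documented contract", async () => {
  const res = await assignRole({ userId: "u1", role: "Editor" });
  assert.equal(res.userId, "u1");
  assert.ok(res.roles.includes("Editor"));
});
```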
I think the challenge is how to create a small but evolvable spec.
What LLMs bring to the picture is that the "spec" is high-level coding. In normal coding you start by writing small functions and then verify that they work. Similarly, LLMs should perhaps be given small specs to start with, with more functions/features added to the spec incrementally. Would that work?
Thanks! With Spec-Kit and Claude Sonnet 4.5, it wanted to design the whole prod-ready CLI up front. It was hard, if not impossible, to try to scope it to just a single feature or POC. This is what I struggled with most.
Were I to try again, I'd do a lot more manual spec writing or even template rewrites. I expected it to work more-or-less out-of-the-box. Maybe it would've for a standard web app using a popular framework.
It was also difficult to know where one "spec" ended and the next began; should I iterate on the existing one or create a new spec? This might be a solved problem in other SDD frameworks besides Spec-Kit, or else I'm just overthinking it!
I think the problem is you still care about the craft. You need to let go and let the tide take you.
I respect this take. As I understand it, in SDD the code is not the source of truth; it's akin to bytecode: an intermediary between the spec and the observable behavior.
The rip tide
This article is for those who have already made up their mind that "spec-based development" isn't for them.
I believe (and practice) that spec-based development is one of the future methodologies for developing projects with LLMs. At least it will be one of the niches.
The author thinks of specs as waterfall. I think of them as a context entrypoint for LLMs. Given enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc.), the LLM will be able to build sufficient initial context for the solution and expand it by reading files and grepping text. And the most interesting part: you can make the LLM keep the context/spec/project file updated each time it updates the project. Voilà: now you are agile again. Just keep iterating on the context/spec/project file.
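A minimal sketch of what such a context entrypoint could contain, assuming a hypothetical posts feature (none of these names come from a real project):

```typescript
// Hypothetical context-entrypoint file the agent reads first and is told to
// keep updated on every change; all names are illustrative.
//
// User story: an Editor can publish a draft; a Viewer only reads published posts.
// Layout: src/auth/* owns login and roles; src/posts/* owns the Post lifecycle.

/** Core domain model the rest of the codebase must conform to. */
export interface Post {
  id: string;
  status: "draft" | "published";
  authorId: string;
}

/** The only legal draft -> published transition lives here. */
export function publish(post: Post): Post {
  if (post.status !== "draft") throw new Error("only drafts can be published");
  return { ...post, status: "published" };
}
```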
> Given enough info about the project (including user stories, tech design requirements, filesystem structure and meaning, core interfaces/models, functions, etc.)
What's not waterfall about this is lost on me.
Sounds to me like you're arguing waterfall is fine if each full run is fast/cheap enough, which could happen with LLMs and simple enough projects. [0]
Agile was offering incremental spec production, which had the tremendous advantage of accumulating knowledge incrementally as well. It might not be a good fit for LLMs, but revising the definition to make it fit doesn't help IMHO.
[0] Reminds me that reducing project scope into smaller runs was also a well-established way to make waterfall bearable.
Waterfall with short iteration time is not possible by definition.
You might as well say agile is still waterfall: what are sprints if not waterfall with a two-week iteration time? And Kanban is just a collection of independent waterfalls... It's not a useful definition of waterfall.
> What's not waterfall about this is lost on me.
Exactly. There is a spec, but there is no waterfall required to work and maintain it. The article's author dismissed spec-based development precisely because they saw a resemblance to waterfall. But waterfall isn't required for spec-centric development.
> There is a spec, but there is no waterfall required to work and maintain it.
The problem with waterfall is not that you have to maintain the spec, but that a spec is the wrong way to build a solution. So, it doesn't matter if the spec is written by humans or by LLMs.
I don't see the point of maintaining a spec for LLMs to use as context. They should be able to grep and understand the code itself. A simple readme or a design document, which should already exist for humans, ought to be enough.
I see rapid, iterative Waterfall.
The downfall of Waterfall is that there are too many unproven assumptions in too long of a design cycle. You don't get to find out where you were wrong until testing.
If you break a waterfall project into multiple, smaller, iterative Waterfall processes (a sprint-like iteration), and limit the scope of each, you start to realize some of the benefits of Agile while providing a rich context for directing LLM use during development.
Comparing this to agile is missing the point a bit. The goal isn't to replace agile, it's to find a way that brings context and structure to vibe coding to keep the LLM focused.
"rapid, iterative Waterfall" is a contradiction. Waterfall means only one iteration. If you change the spec after implementation has started, then it's not waterfall. You can't change the requirements, you can't iterate.
Then again, Waterfall was never a real methodology; it was a straw man description of early software development. A hyperbole created only to highlight why we should iterate.
This is the key, with test-driven dev sprinkled in.
You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.
I model this as "grounding". Just like you need to ground an electrical system, you need to ground the LLM to reality. The tests do this, so they are REQUIRED for all LLM coding.
Once a framework is established, you require tests for everything. No code is written without tests. These can also be perf tests; the LLM needs solid metrics in order to output quality.
The tests provide context and documentation for future LLM runs.
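As a hedged sketch of that grounding idea (the parseLog function and the 200ms budget are invented for illustration):

```typescript
// Hypothetical grounding tests: one exact functional check, one hard metric.
import { test } from "node:test";
import assert from "node:assert/strict";

// Stand-in for a unit the agent generated.
function parseLog(lines: string[]): number {
  return lines.filter((l) => l.includes("ERROR")).length;
}

// Functional grounding: an exact expected output for a known input.
test("parseLog counts ERROR lines", () => {
  assert.equal(parseLog(["ok", "ERROR x", "ERROR y"]), 2);
});

// Performance grounding: a solid metric the agent must keep satisfying.
test("parseLog stays under 200ms for 100k lines", () => {
  const lines = Array.from({ length: 100_000 }, (_, i) =>
    i % 10 === 0 ? `ERROR ${i}` : `ok ${i}`,
  );
  const start = performance.now();
  parseLog(lines);
  assert.ok(performance.now() - start < 200);
});
```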
This is also the same way I'd handle foreign teams that, through no fault of their own, would often output subpar code. It was mainly because of a lack of cultural context, communication misunderstandings, and no solid metrics to measure against.
Our main job with LLMs now as software engineers is a strange sort of manager: a mix of solutions architect, QA director, and patterns expert. It is actually a lot of work and requires a lot of humans to manage, but the results are real.
I have been experimenting with how meta I can get with this, and the results have been exciting. At one point, I had well over 10 agents working on the same project in parallel, following several design patterns, and they worked so fast I could no longer follow the code. But with layers of tests, layers of agents auditing each other, and isolated domains with well defined interfaces (just as I would expect in a large scale project with multiple human teams), the results speak for themselves.
I write all this to encourage people to take a different approach. Treat the LLMs like they are junior devs or a foreign team speaking a different language. Remember all the design patterns used to get effective use out of people regardless of these barriers. Use them with the LLMs. It works.
You're right that this is the future, but I believe the thread is misdiagnosing the core 'system error'.
The frustration thomascountz describes (tweaking, refining, reshaping) isn't a failure of methodology (SDD vs. Iteration). It's 'cognitive overload' from applying a deterministic mental model to a probabilistic system.
With traditional code, the 'spec' is a blueprint for logic. With an LLM, the 'spec' is a protocol for alignment.
The 'bug' is no longer a logical flaw. It's a statistical deviation. We are no longer debugging the code; we are debugging the spec itself. The LLM is the system executing that spec.
This requires a fundamental shift in our own 'mental OS'—from 'software engineer' to 'cognitive systems architect'.
I could not have said it better. We're on the same page.
I would add that, in my opinion, code production/management used to be a limiting factor in software development; today it's not. The conceptualisation (ontology, methodology) of the framework (spec-centric development) for producing and maintaining the system (code, artifacts, running system) becomes the new limiting factor. But it's a matter of time before we figure out 2-3 methodologies (as happened with agile's scrum/kanban) that will become the new "baseline". We're at the early stage where the new "laws of LLM development" (as in "laws of physics") are still being figured out.
I would maybe argue that there is a sweet spot for how much you feed in (with some variability depending on the task). I tend to keep my initial instructions succinct, then build them up iteratively. Others write small novels of instructions before they start, which I personally don't like as much. I don't always know what I don't know, so speccing ahead in great detail can sometimes be detrimental.
Agree. I don't use the term "spec" the way "spec-based development" used it before LLMs. There, details were required to be defined upfront. With LLMs you can start with a vague spec, missing some sections, and clarify it over iterations.
The sweet spot will be a moving target. LLMs' built-in assumptions and ways of expanding concepts will change as LLMs develop, so best practices will change along with LLM capabilities. The same set of not-too-detailed instructions was handled much better by Sonnet 4 than by Sonnet 3 in my experience. Sonnet 3.5 was the breaking point that showed me context-based LLM development is a feasible strategy.
I would simply replace "LLM" with "agent" in your reasoning, in the sense that you'll need a strong preprocessing step and multiple iterations to exploit such complete specs.
There is sense in your words, especially in the context of modern-day vocabulary.
I thought about the concept of this sort of methodology before "agent" (which I would define as "side effects with LLM integration") was marketed into the community vocabulary. And I'm still rigidly sticking to what I consider the "basics". I hope that does not impede understanding.
I had a small embedded project and I did it >70% using LLMs. This is exactly how I did it. Specs are great for grounding the LLM. Coding with LLMs is going to mean relying more on process, since you can't fully trust them. It means writing specs, writing small models to validate, writing tests, and doing a lot of code review to understand what the heck it's doing.
I have similar feelings. I’m willing to believe there are scenarios where this kind of thing makes sense, maybe – a colleague has had great success on a small, predictable greenfield project with it. I don’t work on many of those. My main objections are that I’ve had plenty of success with LLMs without intermediate detailed specs (and I don’t think my failures would have been helped by them), and I just don’t like the idea of primarily reviewing specs. Some sort of plan or research document is a different matter - that’s fine. But the kind of code-like formalised spec thing? I want to look at code, it’s just easier. Plus, I’m going to be reviewing the code too (not doing so is irresponsible in my opinion), so having spec AND code is now double the text to read.
The part of the process that actually needs improving, in my experience in larger codebases, is the research phase, not the implementation. With good, even quite terse, research, it's easy to iterate on a good implementation and then probably take over to finish it off.
I really think LLMs and their agent systems should be kept in their place as tools, first and foremost. We're still quite early in their development, and they're still fundamentally unreliable enough, that I don't think we should be reworking overarching work practices around them.
Yeah, I think you are right; I am also finding that larger apps built using SDD steadily get harder to extend.
> For large existing codebases, SDD is mostly unusable.
I don't really agree with the overall blog post (my view is that all of these approaches have value, and we are still too early on to find the One True Way), but that point is very true.
And this is absolutely fine, because the problem with waterfall wasn't the detailed spec, it was:
a) the multi-year lead time from starting the spec to getting a finished product
b) no (cheap) way to iterate or deliver outside the spec
Neither of these is a problem with SDD.
It seems to me that most people (myself included) never experienced the actual Waterfall elsewhere than in school curriculum descriptions.
It's a bit funny to see people describe a spec written in days (hours) and iterations lasting multiple weeks as "waterfall".
But these days I've already had people argue that barely stopping to think about a problem before starting to prompt a solution is "too tedious of a process".
To add to this: I've worked on projects that came out of waterfall process, and on projects that came out of too hasty agile iterations.
They both have issues but they are very different. A waterfall project would have inscrutable structure and a large amount of "open doors" just in case a need of an extension at some place would materialize. Paradoxically this makes the code difficult to extend and debug because of overdone abstractions.
Hasty agile code has too many TODOs saying "put this hardcoded value in a parameter". It is usually easier to add small features, but when you hit a major design flaw it can be easier to throw everything out.
For UI code, AI seems to heavily tend towards the latter.
> the problem with waterfall wasn't the detailed spec
The detailed spec is exactly the problem with the waterfall development. The spec presumes that it is the solution, whereas Agile says “Heck, we don't even understand our problem well, let alone understanding a solution to it.”
Beginning with a detailed spec fast with an LLM already puts you into a complex solution space, which is difficult to navigate compared to a simpler one. Regardless of iteration speed, waterfall is the method that puts you into a complex space; agile is the one where you begin with smaller spaces and work toward a solution.
Yeah, but can't we expect bureaucratic companies to adopt such a methodology exactly like that: write a spec for years, run the LLM agent every 6 months, blame the techs for the bad result, iterate, and also forbid coding outside of the spec?
The detailed design spec is an issue, hence Agile's "working code over comprehensive documentation". Your two points are consequences of this.
"Heavy documentation before coding" (article) is essentially a bad practice that Agile identified and proposed a remedy to.
Now, the article is really about AI-driven development, in which the AI agent is a "code monkey" that must be told precisely what to do. I think the interesting thing here will be to find the right balance... IMHO this works best when using LLMs only for small bits at a time instead of trying to specify the whole feature or product.
It's often forgotten that the last part of the agile manifesto states that the items on the right, e.g. 'comprehensive documentation', _are_ still valuable.
The key to Agile isn't documentation - it's in the ability to change at speed (perhaps as markets change). Literally "agile".
This approach allows for that comprehensive documentation without sacrificing agility.
The Agile manifesto states exactly what I wrote. It's not that comprehensive documentation isn't valuable, it's that working software is more valuable.
In addition, the big issue is when the comprehensive documentation is written first (as in waterfall) because it delays working software and feedback on how well the design works. Bluntly, this does not work.
That's why I think it is best to feed LLMs small chunks of work at a time and to keep the human dev in the driving seat, to quickly iterate and experiment, and to be able to easily reason about the AI-generated code (who will do maintenance?).
The article seems to miss many of those points.
IMHO a good start is to have the LLM prompt be a few lines at most and generate about 100 lines of code so you can read it and understand it quickly, tweak it, use it, repeat. Not even convinced you need to keep a record of the prompt at all.
Thank you for writing this. It was also my first impression after seeing not only spec driven dev, but agentic systems that try to mimic human roles and processes 1:1. It feels a bit like putting a saddle on an automobile so that it feels more familiar.
This is a weird article. How many times in your career have you been handed a grossly under-specified feature and had to muddle your way through, asking relevant people along the way and still being told at the end that it’s wrong?
This is exactly the same thing but for AIs. The user might think that the AI got it wrong, except the spec was under-specified and it had to make choices to fill in the gaps, just like a human would.
It’s all well and good if you don’t actually know what you want and you’re using the AI to explore possibilities, but if you already have a firm idea of what you want, just tell it in detail.
Maybe the article is actually about bad specs? It does seem to venture into that territory, but that isn’t the main thrust.
Overall I think this is just a part of the cottage industry that’s sprung up around agile, and an argument for that industry to stay relevant in the age of AI coding, without being well supported by anything.
I sometimes wonder how many comments here are driving a pro-AI narrative. This very much seems like one of those:
The agent here is:
Look on HN for AI skeptical posts. Then write a comment that highlights how the human got it wrong. And command your other AI agents to up vote that reply.
Don't really think the thesis is fair.
SDD as it's presented is a bit heavyweight, but if you experiment with it a bit, there is a lighter version that can work.
For some mini modules, we keep a single page spec as 'source of truth' instead of the code.
It's nice and has its caveats, but they are less of a concern over time.
There’s nothing keeping you from scoping the spec to an agile package of work. To the contrary: even if you start with a full spec for a multi-day-AI-coding-session, you are free to instruct it to follow agile principles. Just ask it to add checkpoints at which you want to be able to let users test a prototype, or where you want to revisit and discuss the plan.
Replace the product people with LLMs and keep the engineers. You'll get better results that way.
Bit of a tangent, but this reminds me of a video[1] I watched a while ago, in which someone interviewed 20 or so people who were both engineers and programmers and asked them what the two fields could learn from each other. One of the things mentioned, from the perspective of a physical engineer, is that a little more up-front planning can make a big difference, and that's stuck with me ever since.
[1] (pretty sure this is the right one): https://youtu.be/CmIGPGPdxTI
It's a nice observation that Spec-Driven Development essentially implements the waterfall model.
Personally, I tried SDD, consciously trying to like it, but gave up. I find writing specs much harder than writing code, especially when trying to express the finer points of a project. And of course, there is also that personal preference: I like writing code, much more than text. Yes, there are times where I shout "Do What I Mean, not what I say!", but these are mostly learning opportunities.
I have no idea how to reconcile SDD and waterfall. With SDD you're working per feature, right? Waterfall is speccing the entire project upfront, with a strong force against any changes as you go, before a final delivery.
> Agile methodologies killed the specification document long ago. Do we really need to bring it back from the dead?
it didn't really kill it - it just made the spec massively disjoint, split across hundreds to thousands of randomly filled Jira tickets.
A point I like to make in discussions like this is that software and hardware specifications are very different. We think of software as the thing we're building. But it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Compared to what an architect does when they create a blueprint for a building, creating blueprints for software source code is not a thing.
What in waterfall is considered the design phase is the equivalent of an architect doing sketches, prototypes, and other stuff very early in the project. It's not creating the actual blueprint. The building blueprint is the equivalent of source code here: a complete plan for actually constructing the building down to every nut and bolt.
The big difference here is that building construction is not automated; it is costly and risky. So architects try to get their blueprint to a level where they can minimize all of that cost and risk. And you only build the bridge once, so iterating is not really a thing either.
Software is very different; compiling and deploying is relatively cheap, risk-free, and typically fully automated. All the effort and risk is contained in the specification process itself, which is why iteration works.
Architects abandon their sketches and drafts after they've served their purpose. The same is true in waterfall development. The early designs (whiteboard, napkin, UML, brainfart on a wiki, etc.) don't matter once the development kicks off. As iterations happen, they fall behind and just don't matter. Many projects don't have a design phase at all.
The fallacy that software is imperfect as an engineering discipline because we are sloppy with our designs doesn't hold up once you realize that essentially all the effort goes into creating hyper detailed specifications, i.e. the source code.
Having design specifications for your specifications just isn't a thing. Not for buildings, not for software.
We could just stop calling it an engineering discipline. You've laid out plenty of reasons why it is nothing like an engineering discipline in most contexts where people write software.
Real software engineering does exist. It does so precisely in places where you can't risk trying it and seeing it fail, like control systems for things which could kill someone if they failed.
People get offended when you claim most software engineering isn't engineering. I am pretty certain I would quickly get bored if I was actually an engineer. Most real world non-software engineers don't even really get to build anything, they're just there to check designs/implementations for potential future problems.
Maybe there are also people in the software world who _do_ want to do real engineering and they are offended because of that. Who knows.
Extend your reasoning.
> it's really just a spec that gets turned into the thing we actually run. It's just that the building process is fully automated. What we do when we create software is creating a specification in source code form.
Agree. My favourite description of software development is specification and translation - done iteratively.
Today, there are two primary phases:
1. Specification by a non-developer and the translation of that into code. The former is led by BAs/PMs etc. and the output is feature specs/user stories/acceptance tests etc. The latter is done by developers: they translate the specs into code.
2. The resulting code is also, as you say, a spec. It gets translated into something the machine can run. This is automated by a compiler/interpreter (perhaps in multiple steps, e.g. when a VM is involved).
There have been several attempts over the years to automate the first step. COBOL was probably the first; since then we've had 4GLs, CASE tools, and UML, among others. They were all trying to close the gap: to move the phase-1 specification closer to something non-developers can write, with the result automatically translated to working code.
Spec-driven development is another attempt at this. The translator (LLM) is quite different to previous efforts because it's non-deterministic. That brings some challenges but also offers opportunities to use input language that isn't constrained to be interpretable by conventional means (parsers implementing formal grammars).
We're in the early days of spec-driven. It may fail like its predecessors or it may not. But to first order, there's nothing sacrosanct about using 3rd-generation languages as the means to represent the specification. The pivotal challenge is whether the starting specification can be reliably translated into working software.
If it can (big if) then economics will win out.
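To make that translation idea concrete, here's a toy sketch (the names and the spec are hypothetical, not from any tool mentioned here): a one-sentence natural-language spec and the executable form a developer, or an LLM, might translate it into.

    # Spec, in natural language: "A withdrawal succeeds only if the
    # balance covers the amount; the balance must never go negative."

    def withdraw(balance: int, amount: int) -> int:
        """The spec translated into code: refuse overdrafts."""
        if amount > balance:
            raise ValueError("insufficient funds")
        return balance - amount

    # An acceptance check is the same spec restated a third time.
    assert withdraw(100, 30) == 70

Whether that first translation has to start from code-shaped text at all is exactly the question spec-driven tools are probing.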
Where I work we do (for high assurance software) systems specifications, systems design, software specifications and software design and ultimately source code.
That said, there is a bit of redundancy between software design and source code. We tend to drop the hand-development of the latter rather than the former, though, i.e. by having the source code be generated by some modelling tool.
I think I've seen enough of a trend: all these LLM ideas eventually get absorbed by the LLM provider and integrated. The OSS projects or companies with products eventually become irrelevant.
So they're more like third-party innovations that lobby LLM providers into integrating the functionality.
X prompting method/coding behaviors? Integrated. Media? Integrated. RAG? Integrated. Coding environment? Integrated. Agents? Integrated. Spec-driven development? It's definitely present, perhaps not as formal yet.
Sounds like the makings of a healthy software ecosystem
But specs are per feature; it's just an up-front discussion first, like you'd have on many things, rather than questions -> immediate code writing from a model.
>> You can see my instructions in the coding session logs
such a rare (but valued!) occurrence in these posts. Thanks for sharing
I'm still not convinced there's anything wrong with waterfall for some projects.
It is/was fine if you were willing to bet on your base specs being near perfect.
What LLM tools are folks seeing that do the most or the best to integrate specs?
Amazon's Kiro is incredibly spec driven. Haven't tried it but I'm interested. Amplifier also has a strong document-driven development loop built in. https://github.com/microsoft/amplifier?tab=readme-ov-file#-d...
That's Event-B (minus the formal side) with an LLM.
Much of the hype around SDD is really about developers never having experienced a real waterfall project.
Of course SDD/Waterfall helps the LLM/outsourced labour implement software in a predictable way. Waterfall was always a method to please managers, and in the case of SDD the manager is the user prompting the coding agent.
The problem with SDD/Waterfall is not the first part of the project. The problems come when you are deep into the project, your spec is a total mess and the tiniest feature you want to add requires extremely complex manipulation of the spec.
The success people are experiencing is the success managers experienced at the beginning of their software projects. SDD will fail for the same reason Waterfall failed: the constantly increasing complexity required to keep code and spec consistent cannot be managed by LLM or human.
For myself, I found that having a work methodology similar to spec-driven development is much better than vibe coding. The agent makes fewer mistakes, it stays on the path, and I have fewer issues to fix.
And while I was at it, I found that using TDD also helps.
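For anyone who hasn't tried it with an agent, a minimal sketch of the test-first loop (slugify is a hypothetical example, not from any project discussed here): you write the failing test yourself, then ask the agent to make it pass.

    # Step 1: write the test before the implementation exists,
    # and hand it to the agent as the definition of "done".
    def test_slugify():
        assert slugify("Hello World") == "hello-world"

    # Step 2: the implementation the agent produces to make it pass.
    def slugify(text: str) -> str:
        return "-".join(text.lower().split())

    test_slugify()  # passes once both halves exist

The test doubles as the spec fragment the agent can actually run, which is what keeps it from deviating.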
I, for one, welcome the fact that agile/scrum/daily standup/etc. rituals will become outdated. While they might be somewhat useful in some software development projects, in the past 10 years it has turned into a cult of lunatics who want to apply it to any engineering work, not just software, and who think any other approach will result in bad outcomes and less productivity. Can't wait for the "open office" BS to die next too; it's literally a boomer mindset that came from government offices back in the day, yet they think it's more productive that way.
Very valid. In the beginning all this was driven by developers. Then it was LinkedIn-ified and suddenly we had to deal with agile coaches: essentially people with no tech qualifications using developers as guinea pigs, without understanding the why.
Same is true for UX and DevOps: just create a bunch of positions based on some blog post and congratulate yourself on a job well done. Screwing over the developers (engineers) as usual, even though they might actually be interested in those jobs.
This is the main problem with big tech informing industry decisions: they win because they make sure they understand what all of this means. For all other companies this just creates a mess and the frustration you mention.
You can't code without specifications, period. Specifications can take various forms, but they ultimately define how your program should work.
The problem with what people call "Waterfall" is that there is an assumption that at some point you have a complete and correct spec and you code off of that.
A spec is never complete. Any methodology applied in a way that does not allow you to go back to revise and/or clarify specs will cause trouble. This was possible with waterfall and is more explicitly encouraged with various agile processes. How much it actually happens in practice differs regardless of how you name the methodology that you use.
Of course you can code without specifications. Most software projects don't have them these days.
In contrast they're still the standard in the hardware design world.