DevOps only failed in that so many don't know what it is.
DevOps isn't a tool, but there are lots of tools that make it easier to implement.
DevOps isn't how management can eliminate half the org and have one person do two roles; specialization is still valuable.
DevOps isn't an organization structure, though the wrong org structure can make it fail.
DevOps is collaboration. It's getting two distinct roles to better interoperate. The dev team that wants to push features fast. And the ops team that wants stability and uptime.
From the management side, if you aren't focused on building teams that work well together, eliminating conflicts, rewarding the team collectively for features and uptime, and giving them the resources to deliver, that's not a DevOps failure, that's a management failure.
It failed because there is an ongoing denial that development and operations are two distinct skillsets.
If you think 10x devs are unicorns, consider how much harder it is to get someone 10x at the intersection of both domains. (Personally, I have never met one.) You are far better off with people who can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.
From someone who has managed both Development teams and Operations teams for decades: trust me, they are different beasts and have to be handled differently.
Expecting Devs or Ops to do both types of work is usually asking for trouble, unless the organization is geared from the ground up for such seamless work. It is more of a corporate problem than a problem of team working style, work expectations, or behavior.
The same goes for Agile vs Waterfall. Agile works well if the organization is inherently (or overhauled to be) agile, otherwise it doesn't.
You don't need 10x developers. You just need to avoid the 1/10 multiplier of pitting separate development and operations teams against each other.
DevOps is dead because it's run by a bunch of ops people who don't know how to do dev and a bunch of dev people who don't know how to do ops. The only tooling problem is that a bunch of companies created "DevOps tools" that teams are then dictated to use: K8s, Terraform, etc. The only way this works is if you build the application to fit within those frameworks, e.g. writing an indexer that is massively parallel and mainly constrained by CPU/memory. Instead, you have devs building something that gets thrown over the fence to a devops team that then containerizes it and throws it on K8s. What happens if the application requires lots of IOPS or network bandwidth? K8s doesn't schedule applications that way. "Oh, you can customize the scheduler to take that into account." Two years later, it's still not "customized" because they are ops people who don't know how to code. If you do customize it, the API is going to change in a few months, which will break things when you upgrade.
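To make the scheduling point concrete, here is a toy sketch in Python. It is not the Kubernetes scheduler or its API: `Node`, `Workload`, `fits`, and `pick_node` are hypothetical names made up for illustration, and the numbers are invented. It assumes a simplified filter-then-score placement and only shows how a placement that looks at CPU/memory alone can pick a node that cannot serve an I/O-heavy workload.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float       # cores
    free_mem_gib: float
    free_iops: int        # hypothetical capacity figure, invented for this sketch


@dataclass
class Workload:
    cpu: float
    mem_gib: float
    iops: int


def fits(node: Node, w: Workload, consider_iops: bool) -> bool:
    """Filter step: can the node hold the workload at all?"""
    ok = node.free_cpu >= w.cpu and node.free_mem_gib >= w.mem_gib
    if consider_iops:
        ok = ok and node.free_iops >= w.iops
    return ok


def pick_node(nodes, w, consider_iops):
    """Score step: among feasible nodes, prefer the one with the most free CPU."""
    feasible = [n for n in nodes if fits(n, w, consider_iops)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: n.free_cpu).name


nodes = [
    Node("node-a", free_cpu=16, free_mem_gib=64, free_iops=500),
    Node("node-b", free_cpu=8, free_mem_gib=32, free_iops=20000),
]
indexer = Workload(cpu=4, mem_gib=16, iops=10000)

cpu_mem_only = pick_node(nodes, indexer, consider_iops=False)
io_aware = pick_node(nodes, indexer, consider_iops=True)
print(cpu_mem_only, io_aware)  # node-a node-b
```

With only CPU and memory in the picture, the workload lands on `node-a`, which has plenty of cores but nowhere near the disk throughput it needs; adding the I/O dimension to the filter moves it to `node-b`.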
Would you say it's truly dead or that it fails to meet the performance bar you've described?
The reality is that most devs do not consider a holistic picture that includes the infrastructure they will be deploying to. In many cases, it's certainly a skill issue; good devs are hard to find. And to flip the coin, it's hard to find good ops people too.
The reason DevOps continues to linger, however vague a discipline it is, is that it allows the business to differentiate between revenue-generating roles and cost-center roles. You want your dev resources to prioritize feature work, at the beck and call of PMs or upper management, and let your "DevOps" resources be responsible for actually getting the product deployed.
In essence, it's a ploy to further commoditize engineering roles, because finding unicorns that understand the picture top-to-bottom is difficult (finding /top/ talent is difficult!). In this way, DevOps is alive and well, as a Romero zombie.
I would say that pre-CC (pre-Claude-Code), this might have seemed like a daunting task for average DevOps engineers. But post-CC, there is just no excuse to shy away from such challenges.
EDIT: lol - I am getting downvoted for suggesting some DevOps engineers will actually be ready to take on tasks that were previously more intimidating. I really hope those folks are from the never-coding-agent camp. When I refer to reliance on CC or Codex, I mean being engaged at a holistic level with AI -- not blindly one-shotting solutions. This means having the patience to understand the complexity of the system and the criticality of its downtime in the overall architecture (in this case it's the k8s controller), the ability to learn the codebase, using the right MCPs to delve into all the details needed for testing changes locally, etc. These are system-level skills and barely overlap with just coding skills.
Spoken like someone who has never had to deal with business critical production environments.
It’s like saying that in a post-Viagra world there shouldn’t be men who have trouble getting laid.
Don't want to get too deep into your analogy. I was addressing the "DevOps cannot code" part. To me it is a leadership failure if a DevOps team is still afraid of tackling bigger challenges (like the example given by the OP). That, of course, depends on whether DevOps teams will exist in the long run.
The very fact that we are talking about "DevOps" teams (that do not include dev) is wrong from the very start.
DevOps is a methodology, not a role.
I've always felt that DevOps became a function/team partly because companies and especially SWEs started complaining that they were spending too much time "doing Ops work" while product/business started demanding more features, for which they were running out of cycles. Add to that the burnout from being on-call (especially if the dev team is relatively small and you have to go on-call every 2-3 weekends).
> most orgs are used to responding to a daytime alert by calling out, “Who just shipped that change?” assuming that whoever merged the diff surely understands how it works and can fix it post-haste. What happens when nobody wrote the code you just deployed, and nobody really understands it?
I assume the first time this happens at any given company will be the moment they realize that fully autonomous code changes made on production systems by agents are a terrible idea, and that every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM.
I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage".
Teams will figure out how to mitigate such situations in the future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g., invest more in a production-like env for test coverage).
Software engineering purists have to get out of some of these religious beliefs.
What happens if the person who wrote the code went on vacation? What happens if the code is many years old and no current team member has touched the code?
Understanding code you didn't personally write is part of the job.
If companies were generally capable of that level of awareness they would not operate the way that they do.
I don't understand these graphs. Why do the lines go back in time?
Am I the only one who remembers when DevOps meant "developers are responsible for dealing with the operational part of their software too, so that they don't just throw stuff over the wall for another team to deal with the 3AM pages"?
It seems to have become: "we turned ops into coding too, so now the ops team needs to be good at software engineering"
DevOps was (and is) merely an excuse for companies to replace Developers with cheaper Ops resources, and yet expecting better services and better products from them.
My personal experience says that the best way is not to repurpose the Ops team as Developers, but rather to put experienced Developers into Production Support (incident management, which is intense Ops: working in shifts, weekends, etc.) and rotate them whenever needed. Over a period of time, you'll invariably see fewer defects and issues percolating down from the Devs. Then, once both sides are stable and working well together with less friction and fewer open tickets, some of the more tech-savvy Ops members can be rotated into Development teams as rookie devs to help reduce costs a bit. There'll invariably be some natural attrition among the Devs and Ops, so this gives an alternative career path to the Ops team (who are usually paid less and more stressed), and pushes the Devs not to become complacent. Such an approach is doable and productive.
I am with you.
DevOps is a methodology. DevOps as a role or team name is a fantasy from people who do not understand the methodology.
If you want DevOps to work, your Ops must be members of the development team, take part in the sprints, etc. But many companies do not want to do that because they want to separate ops and dev budget/accounting and do not want to hire enough people with ops skills.
Because the idea that you can have all aspects of a complex piece of technology maintained by a single cross-skilled team of interchangeable cogs is utopian and unworkable past any reasonable level of scale.
DevOps, shift left, full stack dev: they all remind me of the Futurama episode where Hermes Conrad successfully reorgs the slave camp he's sent to, so that all physical labour is done by a single Australian man.
Speaking darker, there is a kind of - well, perhaps not misanthropy, but certainly a not-so-well-meaning dismissiveness, to the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple, I don't know why you're making all these siloes, man" - assuming that ops specialists, sysadmins, programmers, DBAs, frontend devs, mobile devs, data engineers and testers have just invented the breadth and depth and subtleties of their entire fields, only as a way of keeping everybody else out
But modern systems are complex, they are only getting more so, and the further you buy into the shift-left everyone-is-everything computer-jobs-are-all-the-same philosophy the harder and harder it will get to find employees who can straddle the exhausting range of knowledge to master
My message to the CTO of Honeycomb.io (who apparently wrote this post): please avoid getting philosophical and controversial to gin up curiosity about your AI platform. If you want to highlight the benefits of your platform then do so earnestly and objectively. Please don't mask marketing with an excoriation of a profession that has never been well-defined (or has always been defined to fit into an organization's political landscape for the most part). And you guys (like every other SRE/Ops platform) capitalized on that structural divide and deservedly got rich by selling licenses to these teams. I don't think you can come in now with this holier-than-thou best practice messaging just because platforms like yours have zero moat in this post-CC/Codex world.
Hence my vitriol: https://news.ycombinator.com/item?id=46662287.
> please avoid getting philosophical and controversial to gin up curiosity about your AI platform
Also: could he please avoid doing it by illustrating his nonsense with graphs that are both childish and nonsensical?
The CTO is a she.
If your developers weren't looking at dashboards before, they won't use a chat interface to interrogate it either. That doesn't really bring it to them any more than their existing capabilities. There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.
> There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.
I saw firsthand, at AWS devDays, an AI giving SIGWINCH as the "root cause" of an Apache error in a containerized process in EKS for a backend FCGI process connection error. It has been extremely hard since that demo to trust any AI for system-level debugging.
If we were smart we'd use AI to grok a system in order to help us reduce its complexity. I don't think we're anywhere close to even being able to provide all the necessary context to solve problems like this.
In my experience DevOps has little interest in doing actual DevOps - they just want to run ops. They want to advise (or tell us we’re holding it wrong) but not actually get their hands dirty. On the flip side, devs don’t want to spend a ton of time learning k8s or how to manage servers, cloud services, etc.
DevOps is a mess of our own making - embracing K8s created complexity for little gain for nearly all companies.
"I think the entire DevOps movement was a mighty, ... it failed."
I'm so sick of this nonsense. "Devops" isn't failing, isn't an issue, you can rename it whatever you want, but throughout my career the devops engineers (the ones you don't skimp on) are the best, highest paid professionals at the company.
I don't know why I keep reading these completely crazy think-pieces hemming and hawing about a system (having a few engineers who master performance/backups/deployments/oncall/retros) that seems to be wildly successful. It would be nice if more engineers understood what's under the hood, but most companies choose not to exclusively hire at that caliber.
Scratching neck: come on... just one more vendor, bro
In my company, instead of relying on an ops team.. we rely on a devops team.
I can't wait for indie developers to build super-agents that commoditize providers like Honeycomb.io and more importantly clone all their features and offer them up for free as OSS.
DevOps only works when the developers are always right. What usually happens is the DevOps team thinks they know best (they are developers too, just not the ones using the tools), and they build a lot of garbage that no one wants to use, often making things more complicated than they were before.
Eventually a bureaucrat becomes the manager of the team, and seeks to expand the set of things under DevOps' control. This makes the team a single point of failure for more and more things, while driving more and more developer processes towards mediocrity. Velocity slows, while the DevOps bottlenecks are used as a reason to hire.
It's an organizational problem, not a talent or knowledge problem. Allowing a group to hire and grow within an organization, which is not directly accountable for the success of the other parts of the organization that it was intended to support, is creating a cancer, definitionally.