Google has been using its own TPU silicon for machine learning since 2015.
I think they do all deep learning for Gemini on their own silicon.
But they also invented AI as we know it when they introduced the transformer architecture, and they've been more invested in machine learning than most companies for a very long time.
Not that it matters, but Microsoft has been doing AI accelerators for a bit too - project Brainwave has been around since 2018 - https://blogs.microsoft.com/ai/build-2018-project-brainwave/
Yeah, I worked in the hardware org around this time. We got moved from under the Xbox org to Azure, and our main work became AI-related accelerators.
Very cool! Catapult/Brainwave is what got me into hardware & ML stuff :)
The first revisions were made by Qualcomm, right? I don't think we have much data on how much customization they do and where they get their IP from, but given how much of the Tensor cores come from Samsung, I think it's safe to assume there is a decent amount coming from some of the big vendors.
For TPUs I believe it is Broadcom: https://www.theregister.com/2023/09/22/google_broadcom_tpus/
Not sure about the mobile SoCs
It's "made by" TSMC as usual. Their customization comes from identifying which compute operations they want optimized in hardware and doing that themselves. Then they buy non-compute IP (like HBM interfaces) from Broadcom, and Broadcom also does things like physical design.
To be fair that’s a pretty good approach if you look at Apple’s progression from assembled IPs in the first iPhone CPU to the A and M series.
Apple generally tries to erase info about acquisitions from their official company story; they want it to look like internal Apple innovation.
When it comes to CPUs, they bought P.A. Semi back in 2008 and got a lot of smart people with decades of relevant experience who were doing cutting-edge stuff at the time.
This was immensely important in enabling them to deliver current Apple CPUs.
Yeah but at least when it comes to mobile CPUs Apple seemed vastly more competent in how they approached it.
I thought they use GPUs for training and TPUs for inference; I'm open to being corrected.
The first TPU they made was inference-only. Every generation since has also been used for training. I think that means they weren't using it for training in 2015, but rather from 2017, based on Wikipedia.
The first TPU they *announced* was for inference.
Some details here: https://news.ycombinator.com/item?id=42392310
No. For internal training, most work is done on TPUs, which have been explicitly designed for high-performance training.
I've heard it's a mixture because they can't source enough in-house compute.
I'm 99.999% sure that the claim of "all deep learning for Gemini on their own silicon" is not true.
Maybe if you restrict it, similarly to the DeepSeek paper, to "Gemini uses TPUs for the final successful training run and for scaled inference," you might be correct, but there's no way GPUs aren't involved, at minimum for comparability and more rapid iteration during the extremely buggy and error-prone phase of getting to the final training run. And the theoretical and algorithmic work done at Google that makes its way into Gemini is certainly sometimes done on Nvidia GPUs.
GCP has a lot of GPUs, likely on the order of at least a million in its fleet today (and I'm probably underestimating). Some of that is used internally and is made available to engineering staff. What constitutes "deep learning for Gemini" is very much open to interpretation.
That's a strange position to take with such high certainty. Google has been talking about training on TPUs for a long time, and many former and current employees have been on record about how much nicer Google's internal TPU training infra is. GPUs are an afterthought for Google's inference and non-existent in its training.
Internally, TPUs are much cheaper than GPUs for the same amount of compute, so I don't see many reasons why they would need GPUs. Probably >99% of the compute budget is spent on TPUs. You could argue the remaining <1% still counts, but it's pretty safe to say all of the meaningful production workloads run on TPUs; it is simply too expensive to run a meaningful amount of compute on anything else.
Just to clarify, the TPU has been in development for a decade and is quite mature these days. Years ago internal consumers had to accept the CPU/GPU and TPU duality, but I think that case is getting rarer. I'd guess this is even more true for DeepMind, since it owns an ML infra team of its own and can likely get most issues fixed with high priority.
You seem to think GPUs are better than TPUs for rapid iteration. Why is that? There's no inherent reason why one is more suited to rapid iteration than another; it's entirely a matter of developer tooling and infrastructure. And Google famously has excellent tooling. And furthermore, the tooling Google exposes to the outside world is usually poorer than the tooling used internally by Googlers.
It's been a few years since I last played with Google's hardware, but IIRC TPUs were inflexible and very fast. They worked well for linear and convolutional layers but couldn't accelerate certain LSTM configurations; for those networks GPUs were faster. It wouldn't surprise me in the least if TPU hardware support lagged behind what the latest and greatest LLMs require for training.
Google is the creator of JAX and XLA. Maybe the developer laptops have Nvidia GPUs and they do some testing there, but for Google there is literally no point in bothering with CUDA, pytorch or any other ecosystem strongly focused on Nvidia GPUs.
In my experience JAX is way more flexible than PyTorch the moment you want to do things that aren't training ML models, e.g. building an optimizer that uses the derivative of your model with respect to the input.
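A minimal sketch of that case in JAX, with a made-up toy model just to illustrate the point (the model and shapes are hypothetical, not from any real codebase):

    import jax
    import jax.numpy as jnp

    # Hypothetical toy "model": a weight matrix applied to an input vector.
    def model(params, x):
        return jnp.tanh(params @ x).sum()

    # Differentiate w.r.t. the input (argnums=1), not the params, which is the
    # case described above, e.g. optimizing an input against a fixed model.
    grad_wrt_input = jax.grad(model, argnums=1)

    params = jnp.ones((4, 3))
    x = jnp.array([0.1, 0.2, 0.3])
    print(grad_wrt_input(params, x))  # gradient with the same shape as x

The gradient is just a function you get back and can compose further, rather than state attached to tensors.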
Honestly, PyTorch is weird IMO; I'm surprised people love it so much.
loss.backward()? tensor.grad? optimizer.zero_grad()? with torch.no_grad()?
What is with all these objects holding pointers to stuff? An ndarray is a pointer to memory and a shape, my dudes. A gradient is the change in a scalar function w.r.t. some inputs.
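For anyone who hasn't used it, this is the stateful idiom being complained about; a generic training-step sketch under made-up data, not any particular codebase:

    import torch

    model = torch.nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 3), torch.randn(8, 1)

    opt.zero_grad()                    # clear the .grad buffers stashed on each parameter
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                    # autograd writes gradients into each parameter's .grad
    opt.step()                         # the optimizer reads those .grad fields and updates in place

    with torch.no_grad():              # disable graph recording for plain evaluation
        print(model(x).shape)

The gradients live as mutable state on the tensors themselves instead of being returned by a function, which is exactly the contrast with the jax.grad style above.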
I guess Microsoft’s investment into Graphcore didn’t pay off. Not sure what they’re planning but more of that isn’t going to cut it. At the time (late 2019) I was arguing for either a GPU approach or specialized architecture targeting transformers.
There was a split at MS where the "next gen" Bayesian work was being done in the US and the frequentist work was being shipped off to China. Chris Bishop being promoted to head of MSR Cambridge didn't help.
Microsoft really is an institutionally stupid organization so I have no idea on which direction they actually go. My best guess is that it’s all talk.
Microsoft lacks the credibility and track record for this to be anything but talk. Hardware doesn’t simply go from zero to gigawatts of infrastructure on talk. Even Apple is better positioned for such a thing.
Microsoft has plenty of home grown hardware on Azure, some of which even has firmware written in Rust nowadays.
Yep, tons of decent homegrown stuff in Azure. And for a long time - see the Catapult FPGA cards.
They do have such a dedicated chip: the Maia 100, an in-house design from the transformer era, and it's what is being discussed in the interview.
I missed that; it's been a few years since I paid attention to MS hardware, and it's very possible my thoughts are out of date. I left MS with a rather bad taste in my mouth. I'm checking out the info on that chip, and what I'm seeing is a little light on details: just TPU-like cores and fast interconnects.
What I've found: Maia 200, the next version, is having issues due to brain drain, and Maia 300 is to be an entirely new architecture, so its status is rather uncertain.
I think a big reason MS invested so heavily in OpenAI was to have a marquee customer push cultural change through the org, which was a necessary decision. If that eventually yields a useful chip I will be impressed; I hope it does.
I don't like this trend where all the big tech companies are bringing hardware in-house, because it makes it unavailable to everyone else. I'd rather not have it be the case that everyone who isn't big tech either pays the Nvidia tax or deals with comparatively worse hardware, while each big tech has their own. If these big tech companies also sold their in-house chips, then that would address this problem. I like what Ampere is doing in this space.
On a slightly different tangent, is anyone working on analog machine-learning ASICs? Sub-threshold CMOS or something? I mean even at the research level? Using a handful of transistors for an analog multiplier, and getting all of the crazy, fascinating translinear stuff of Barrie Gilbert fame.
https://www.electronicdesign.com/technologies/analog/article...
https://www.analog.com/en/resources/analog-dialogue/articles...
http://madvlsi.olin.edu/bminch/talks/090402_atact.pdf
For large models, the bottlenecks are memory bandwidth, network, and power consumption by the DAC/ADC arrays
It’s never come even close to penciling out in practice.
For small models there are people working on this implemented in flash memory, e.g. Mythic.
A bunch of people. Just type these terms into DuckDuckGo:
analog neural network hardware
physical neural network hardware
Put "this paper" after each one to get academic research. Try it with and without that phrase. Also, add "survey" to the next iteration.
The papers that pop up will have the internal jargon the researchers use to describe their work. You can further search with it.
The "this paper," "survey," and internal jargon in various combinations are how I find most CompSci things I share.
Thanks for these helpful search terms!
For those who don't know, Microsoft is working on https://azure.microsoft.com/en-us/blog/azure-maia-for-the-er...
Weird use of "homemade"! I guess they mean "in-house"?
Just like how mama used to make 'em!
It's a delightful coincidence of history that the "What's Jensen been cooking" pandemic gag happened on the generation that would wake up the AIs.
https://www.youtube.com/watch?v=So7TNRhIYJ8
Mom's secret ingredient to her AI was Nvidia!
Made where? Isn’t foundry capacity the limiting factor on chips for AI right now?
TSMC manufactures the MAIA 100:
https://azure.microsoft.com/en-us/blog/azure-maia-for-the-er...
By cutting out the middleman, MS could pay TSMC more per wafer than Nvidia does and still save money.
This is the whole game right here.
Not surprising that the hyperscalers will make this decision for inference, and maybe even for a large chunk of training. I wonder if it will spur Nvidia to work on an inference-only accelerator.
> I wonder if it will spur nvidia to work on an inference only accelerator.
Arguably that's a GPU? Other than (currently) exotic ways to run LLMs like photonics or giant SRAM tiles, there isn't a device that's better at inference than a GPU, and GPUs have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.
> Arguably that's a GPU?
Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...
"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."
Yeah, I'm probably splitting hairs here but as far as I understand (and honestly maybe I don't understand) - Rubin CPX is "just" a normal GPU with GDDR instead of HBM.
In fact - I'd say we're looking at this backwards - GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but now sometimes, you don't include the hardware to put the pixels on a screen.
So maybe - CPX is "just" a GPU but with more generic naming that aligns with its use cases.
There are some inference chips that are fundamentally different from GPUs. For example, one of the people who designed Google's original TPU left and started a company (with some other engineers) called Groq (not to be confused with Grok). They make a chip that is quite different from a GPU and provides several advantages for inference over traditional GPUs:
https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...
I would submit Google's TPUs are not GPUs.
Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar that the goal is to be open.
They're already optimizing GPU die area for LLM inference over other pursuits: the FP64 units in the latest Blackwell GPUs were greatly cut back, and FP4 was added.
I'm not very well versed, but I believe training requires more memory because you have to store intermediate activations so that you can calculate gradients for each layer.
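Right, and there's also the optimizer state. A rough back-of-envelope sketch using the common mixed-precision Adam accounting; the model size is hypothetical and the per-layer activations mentioned above come on top of this:

    # Why training needs far more memory per parameter than inference.
    params = 7e9                      # hypothetical 7B-parameter model

    inference_bytes = params * 2      # fp16/bf16 weights only (ignoring KV cache)

    weights  = params * 2             # fp16 weights
    grads    = params * 2             # fp16 gradients
    adam_m_v = params * (4 + 4)       # fp32 Adam first and second moments
    master_w = params * 4             # fp32 master copy for mixed precision
    training_bytes = weights + grads + adam_m_v + master_w

    print(f"inference ~{inference_bytes / 1e9:.0f} GB of weights")
    print(f"training  ~{training_bytes / 1e9:.0f} GB of state, before activations")

That's roughly 16 bytes of training state per parameter versus 2 bytes of weights for inference, before counting activations at all.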
The AMD NPU has more than 2x the performance per watt versus basically any Nvidia GPU. Nvidia isn't leading because they are power efficient.
And no, the NPU isn't a GPU.
For many years now, every few months Microsoft and Meta have said they are going to do AI hardware, but nothing tangible gets delivered.
Yeah, they could just be negotiating a better deal with Nvidia.
"Microsoft Silicon", coming up.
Is it practical for them to buy an existing chip maker? Or would they just go home-grown?
- As of today, Nvidia's market cap is a whopping 4.51 trillion USD compared to Microsoft's 3.85 trillion USD, so that might not work.
- AMD's market cap is 266.49 billion USD, which is more in reach.
Intel is almost a bargain at $175 billion
The most important note is:
> The software titan is rather late to the custom silicon party. While Amazon and Google have been building custom CPUs and AI accelerators for years, Microsoft only revealed its Maia AI accelerators in late 2023.
They are too late for now; realistically, hardware takes a couple of generations to become a serious contender, and by the time Microsoft has a chance to learn from its hardware mistakes the "AI" bubble will have popped.
But there will probably be some LLM tools that do end up having practical value; maybe there will be a happy line-crossing point for MS where they have cheap in-house compute right when the models actually need to turn a profit.
At this point it will take a lot of investment to catch up. Google relies heavily on specialized interconnects to build massive TPU clusters; it's more than just designing a chip these days, and folks who work on interconnects are a lot rarer than engineers who can design chips.
Most of the big players started working on hardware for this stuff in 2018/2019. I worked in MSFT's silicon org during that time, and Meta was also hiring my coworkers for similar projects. I left a few years ago and don't know the current state, but they already have some generations under their belt.
> hardware takes a couple generations to become a serious contender
Not really, and for the same reason Chinese players like Biren are leapfrogging: much of the workload profile in AI/ML is "embarrassingly parallel," which reduces the need for individual ASICs to be bleeding-edge performant.
If you are able to negotiate competitive fabrication and energy-supply deals, you can mass-produce your way into providing "good enough" performance.
Finally, the persona who cares about hardware performance in training isn't in the market for cloud-offered services.
As I understood it the main bottleneck is interconnects, anyhow. It's more difficult to keep the ALUs fed than it is to make them fast enough, especially once your model can't fit in one die/PCB. And that's in principle a much trickier part of the design, so I don't really know how that shakes out (is there a good enough design that you can just buy as a block?)
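A toy roofline-style calculation of that "keeping the ALUs fed" problem; all numbers are illustrative, not any specific chip:

    # How much arithmetic intensity is needed to keep the math units busy.
    peak_flops = 1000e12      # 1 PFLOP/s of matrix math (illustrative)
    mem_bw     = 4e12         # 4 TB/s of memory bandwidth (illustrative)

    needed = peak_flops / mem_bw      # FLOPs you must do per byte moved to stay compute-bound
    print(f"need >= {needed:.0f} FLOPs per byte fetched")

    # A decode-time matrix-vector product does ~2 FLOPs per 2 bytes of fp16 weights read,
    # i.e. about 1 FLOP/byte, far below the threshold, so the ALUs wait on memory.
    gemv_intensity = 1.0
    print(f"utilization in that regime: ~{gemv_intensity / needed:.1%} of peak")

Making the ALUs twice as fast doesn't help at all in that regime; only more bandwidth or better data reuse (across dies, via the interconnect) does.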
The bet I'm seeing is to try and invest in custom ASICs to become integrated as part of an SoC to solve that interconnect bottleneck.
It's largely a solved problem based on Google/Broadcom's TPU work - almost everyone is working with Broadcom to design their own custom ASIC and SoC.
And current LLM architectures affinitize to hardware differently than the DNNs of even a decade ago. If you have the money and technical expertise (both of which I assume MS has access to), then a late start might actually be beneficial.
Doesn't come as a surprise; I imagine they would also build on top of DirectCompute, or something else they come up with.
Honestly, this would be great for competition. Would love to see them push in that direction.
I've mentioned this before on HN [0][1].
The name of the game has been custom SoCs and ASICs for a couple years now, because inference and model training is an "embarrassingly parallel" problem, and models that are optimized for older hardware can provide similar gains to models that are run on unoptimized but more performant hardware.
Same reason H100s remain a mainstay in the industry today, as their performance profile is well understood now.
[0] - https://news.ycombinator.com/item?id=45275413
[1] - https://news.ycombinator.com/item?id=43383418
> The name of the game has been custom SoCs and ASICs for a couple years now, because inference and model training is an "embarrassingly parallel" problem, and models that are optimized for older hardware can provide similar gains to models that are run on unoptimized but more performant hardware.
Is anyone else getting crypto flashbacks?
The difference is that crypto mining wasn't memory- and throughput-dependent. That's why a small ASIC on a USB stick could outperform a GPU.
One thing to point out - "ASIC" is more of a business term than a technical term.
The teams that work on custom ASIC design at (eg.) Broadcom for Microsoft are basically designing custom GPUs for MS, but these will only meet the requirements that Microsoft lays out, and Microsoft would have full insight and visibility into the entire architecture.
For GPUs at least this is pretty obvious. For CPUs it is less clear to me that they can do it more efficiently.
When Microsoft talks about “making their own CPUs,” they just mean putting together a large number of off-the-shelf Arm Neoverse cores into their own SoC, not designing a fully custom CPU. This is the same thing that Google and Amazon are doing as well.
Even just saying this applies downward pressure on pricing: NVIDIA has an enormous amount of market power (~"excess" profit) right now and there aren't enough near competitors to drive that down. The only thing that will work is their biggest _consumers_ investing, or threatening to invest, if their prices are too high.
Long term, I wonder if we're exiting the "platform compute" era, for want of a better term. By that I mean compute which can run more or less any operating system, software, etc. If everyone is siloed into their own vertically integrated hardware+operating system stack, the results will be awful for free software.
In that case, it's great that Microsoft is building their silicon. Keeps NVIDIA in check, otherwise these profits would evaporate into nonsense and NVIDIA would lose the AI industry to competition from China. Which, depending if AGI/ASI is possible or not, may or may not be a great move.
Well yeah, I can't imagine that sending all your shareholders' money to Nvidia to produce slop no one is willing to pay for is going down too well.
It always falls back on the software. AMD is behind, not because the hardware is bad, but because their software historically has played second fiddle to their hardware. The CUDA moat is real.
So, unless they also solve that issue with their own hardware, then it will be like the TPU, which is limited to usage primarily at Google, or within very specific use cases.
There are only so many super talented software engineers to go around. If you're going to become an expert in something, you're going to pick what everyone else is using first.
> The CUDA moat is real.
I don't know. The transformer architecture uses only a limited number of primitives. Once you have ported those to your new architecture, you're good to go.
Also, Google has been using TPUs for a long time now, and __they__ never hit a brick wall for a lack of CUDA.
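To make the "limited number of primitives" point above concrete, here is a rough NumPy sketch of the core ops a decoder block reduces to; shapes are illustrative, and real kernels add fused variants, KV caching, and so on:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def layer_norm(x, gamma, beta, eps=1e-5):
        mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
        return gamma * (x - mu) / np.sqrt(var + eps) + beta

    def attention(q, k, v):                      # q, k, v: (seq_len, head_dim)
        scores = q @ k.T / np.sqrt(q.shape[-1])  # matmul + scale
        return softmax(scores) @ v               # softmax + matmul

    def mlp(x, w1, w2):
        return np.maximum(x @ w1, 0) @ w2        # matmul + elementwise nonlinearity

    # Matmul, softmax, normalization, and elementwise ops: the short list of
    # kernels a new accelerator has to get right (plus the data movement around them).

The hard part is making those kernels fast and keeping them fed, not the breadth of the API surface.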
It is beyond porting; it is the mentality of developers. Change is expensive, and I'm not just talking about the dollar cost.
> Also, Google has been using TPUs for a long time now, and __they__ never hit a brick wall for a lack of CUDA.
That's exactly what I'm saying. __they__ is the keyword.
Not sure what you mean. Google is a big company. Their TPUs have many users internally.
Very few developers outside of Google have ever written code for a TPU. In a similar way, far fewer have written code for AMD, compared to NVIDIA.
If you're going to design a custom chip and deploy it in your data centers, you're also committing to hiring and training developers to build for it.
That's a kind of moat, but with private chips. While you solve one problem (getting the compute you want), you create another: supporting and maintaining that ecosystem long term.
NVIDIA was successful because they got their hardware into developers' hands, which created a feedback loop: developers asked for fixes and features, NVIDIA built them, the software stack improved, and the hardware evolved alongside it. That developer flywheel is what made CUDA dominant, and it is extremely hard to replicate because the shortage of talented developers is real.
I mean it's all true to some extent. But that doesn't mean implementing the few primitives to get transformers running requires CUDA, or that it's an impossible task. Remember, we're talking about >$1B companies here who can easily assemble teams of 10s-100s of developers.
You can compare CUDA to the first PC OS, DOS 1.0. Sure, DOS was viewed as a moat at the time, but it didn't keep others from kicking its ass.
> You can compare CUDA to the first PC OS, DOS 1.0.
Sorry, I don't understand this comparison at all. CUDA isn't some first version of an OS, not even close. It's been developed for almost 20 years now; bucketloads of documentation, software, and utilities have been created around it. It won't have its ass kicked by any stretch of the imagination.
Yes, CUDA has a history. And it shows. CUDA has very bad integration with the OS for example. It's time some other company (Microsoft sounds like a good contender) showed them how you do this the right way.
Anyway, this all distracts from the fact that you don't need an entire "OS" just to run some arithmetic primitives to get transformers running.
> CUDA has very bad integration with the OS for example.
If you want to cherry pick anything, you can. But in my eyes, you're just solidifying my point. Software is critical. Minimizing the surface is obviously a good thing (tinygrad for example), but you're still going to need people who are willing and able to write the code.
OK, but Microsoft is a software company ...
The CUDA moat is real for general purpose computing and for researchers that want a swiss army knife, but when it comes to well known deployments, for either training or inference, the amount of stuff that you need from a chip is quite limited.
You do not need most of CUDA, or most of the GPU functionality, so dedicated chips make sense. It was great to see this theory put to the test in the original llama.cpp stack, which showed just what you needed; in the tiny llama.c, which shows how little was actually needed; and more recently in how a small team of engineers at Apple put together MLX.
Absolutely agreed on the need for just specific parts of the chip and tailoring to that. My point is bigger than that. Even if you build a specific chip, you still need engineers who understand the full picture.
Internal ASICs are a completely different market. You know your workloads and there is a finite number of them. It's as if you had to build a web browser, normally an impossible task, except it only needs to work with your company website, which only uses 1% of all of the features a browser offers.
Very true indeed. I'm not arguing against that at all.
Url changed from https://www.theregister.com/2025/10/02/microsoft_maia_dc/, which points to this.
Submitters: "Please submit the original source. If a post reports on something found on another site, submit the latter." - https://news.ycombinator.com/newsguidelines.html
somebody wants to buy NVDA cheaper... ;)
plus someone has no leverage whatsoever other than talking
Microsoft, famously resource-poor.
They have so much money it is harmful to their ability to execute.
Just look at the implosion of the Xbox business.
Granted, if everyone had done what the highly paid executives told them to do, Xbox would never have existed.
And I'm guessing that the decline is due to executive meddling.
What is it that executives do again? Beyond collecting many millions of dollars a year, that is.
They sit around fantasizing about buying Nintendo because that would be the top thing they could achieve in their careers.
Microsoft just can't stop following apple's lead.
better late than never to get into to game... right? right....?
Just like mobile!
The current M$ sure is doing a great job at making people move to alternatives.
People, sure, but that's not their target demographic. It's businesses and they aren't moving away from MS anytime soon.
Homemade chips are probably a lot of fun, but buying regular Lay's is so much easier.
Guess MSFT needs somewhere else AI adjacent to funnel money into to produce the illusion of growth and future cash flow in this bubblified environment.
> produce the illusion of growth and future cash flow in this bubblified environment.
I was ranting about this to my friends: Wall Street is now banking on tech firms to produce the illusion of growth and returns, rather than repackaging and selling subprime mortgages.
The tech sector seems to have a never ending supply of things to spur investment and growth: cloud computing, saas, mobile, social media, IoT, crypto, Metaverse, and now AI.
Some useful, some not so much.
Tech firms face a lot of pressure to produce growth, the sector is filled with very smart people, and it wields influence on public policy. The flip side is that the mortgage crisis, at least before it collapsed, got more Americans into home ownership (even if they weren't ready for it). I'm not sure the tech sector's meteoric rise has been as helpful (the sentiment of locals in US tech hubs suggests an overall feeling of dissatisfaction with tech).
So similar to Apple Silicon. If this means they'll be on par with Apple Silicon I'm okay with this, I'm surprised they didn't do this sooner for their Surface devices.
Oh right, for their data centers. I could see this being useful there too; it brings costs down.
> So similar to Apple Silicon.
Yes, in the sense that this is at least partially inspired by Apple's vertical integration playbook, which has now been extended to their own data centers based on custom Apple Silicon¹ and a built-for-purpose, hardened edition of Darwin².
¹ https://security.apple.com/blog/private-cloud-compute/ ² https://en.wikipedia.org/wiki/Darwin_(operating_system)
Vertical integration only works if your internal teams can stay in the race at each level well enough to keep the stack competitive as a whole. Microsoft can't attract the same level of talent as Apple because their pay is close to the industry median.
Yeah, it's interesting. Years ago I never thought Apple or Microsoft would do this, but Google has done it in their cloud as well, so it makes sense.