I appreciate that there are people in academia working on this problem, but it seems like something AMD would have to fix internally if they were serious.
Except that this same team built a similarly named software package for Nvidia GPUs as well. It’s bright researchers doing what they do best if you ask me.
From this writeup it does sound like the architecture of the AMD GPU makes it a bit harder to optimize. It also seems like the AMD approach may scale better in the long run: 8 chiplets rather than 2 for the Nvidia offering, along with all the associated cache and memory locality woes.
The future will probably see more chiplets rather than fewer, so I wonder if dealing with the complexity here will pay more dividends in the long run.
AMD doesn't need warp specialisation for high performance while Nvidia does, which simplifies programming on AMD (rough sketch of the pattern below).
see https://news.ycombinator.com/item?id=45923188 for HipKittens discussion
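For anyone who hasn't run into the term: warp specialization means giving different warps within a thread block different roles, typically some warps stage data while others compute. Below is a minimal, hypothetical CUDA sketch of that producer/consumer pattern, just to illustrate what's being referred to; it isn't taken from HipKittens or any vendor library, and real high-performance kernels would use async copies and pipelines rather than plain __syncthreads():

    // Warp specialization sketch: warp 0 of each block is a "producer" that
    // stages a tile from global into shared memory, the remaining warps are
    // "consumers" that do the arithmetic on the staged tile.
    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr int TILE = 256;          // elements staged per iteration
    constexpr int BLOCK_THREADS = 256; // 8 warps of 32 threads

    __global__ void scale_with_producer_warp(const float* in, float* out,
                                             int n, float alpha) {
        __shared__ float tile[TILE];
        const int warp_id = threadIdx.x / 32;
        const int lane    = threadIdx.x % 32;

        for (int base = blockIdx.x * TILE; base < n; base += gridDim.x * TILE) {
            if (warp_id == 0) {
                // Producer warp: lane-strided copy of one tile into shared memory.
                for (int i = lane; i < TILE && base + i < n; i += 32)
                    tile[i] = in[base + i];
            }
            __syncthreads(); // hand the tile over to the consumer warps

            if (warp_id != 0) {
                // Consumer warps: each covers a strided slice of the tile.
                const int consumer  = warp_id - 1;             // 0..6
                const int consumers = BLOCK_THREADS / 32 - 1;  // 7 consumer warps
                for (int i = consumer * 32 + lane; i < TILE && base + i < n;
                     i += consumers * 32)
                    out[base + i] = alpha * tile[i];
            }
            __syncthreads(); // ensure the tile is consumed before it is overwritten
        }
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = float(i);

        scale_with_producer_warp<<<64, BLOCK_THREADS>>>(in, out, n, 2.0f);
        cudaDeviceSynchronize();

        printf("out[123] = %f (expect %f)\n", out[123], 2.0f * 123);
        cudaFree(in); cudaFree(out);
    }

The claim above is that on AMD you can usually skip this split and keep every warp doing the same load-then-compute work, which is a simpler mental model.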
I think many people have tried making AMD GPUs go brrr for the mass of developers, but no one has succeeded.
I don't get why AMD doesn't solve their own software issues. Now they have a lot of money so not having money to pay for developers is not an excuse.
And data center GPUs are not the worst of it. Using GPU compute for things like running inference at home is a much, much better experience with Nvidia. My five-year-old RTX 3090 is better than any consumer GPU AMD has released to date, at least for experimenting with ML and AI.
I just saw that Nvidia even maintains their own fork of Unreal Engine. AMD isn't even competing.
I recently switched from an NVidia card (5090) to a couple of AMD cards (R9700 32GB) for my inference server.
I must say it's been a completely positive experience. The mainline Fedora kernel just worked, without any need to mess with DKMS. I just forwarded the /dev/dri/* devices to my containers, and everything worked fine with ROCm.
I needed to grab a different image (-rocm instead of -cuda) for Ollama and change the whisper build type for Storyteller. And that was it! On the host, nvtop works fine for visualizing the GPU state, and VAAPI provides accelerated encoding for ffmpeg.
Honestly, it's been an absolutely pleasant experience compared to getting NVidia CUDA to work.
And the developer experience is horrible when working with AMD. They don’t even accept driver crash bug reports.
> Now they have a lot of money so not having money to pay for developers is not an excuse.
NVidia is the exception to the rule when it comes to hardware companies paying competitive salaries for software engineers. I imagine AMD is still permeated by the attitude that software "isn't real work" and doesn't deserve more compensation, and that kind of inertia is very hard to overcome.
It's insane to me that AMD is not spending billions and billions trying to fix their software. Nvidia is the most valuable company in the world and AMD is the only one poised to compete.
They are, but the problem is that shifting an organization whose lifeblood is yearly hardware refreshes and chip innovation towards a ship-daily software culture is challenging. And software doesn’t “make money” the way hardware does so it can get deprioritized by executives. And vendors are lining up to write and even open source lots of software for your platform in exchange for pricing, preference, priority (great on paper but bad for long term quality). And your competitors will get ahead of you if you miss even a single hardware trend/innovation.
I worked at a number of GPU vendors, and it felt like Nvidia was the only one that treated software as an asset worth investing in rather than as a cost center. Massively different culture.
It's not my favorite internet meme, but I'm tickled to see "go brr" on a website from a university like Stanford.
Usually a sign that it is not cool anymore and the kids need to make something new up.
They already "went brr" when they announced ThunderKittens a year ago: https://hazyresearch.stanford.edu/blog/2024-05-12-tk
This meme is tired. Let it rest boss.
I quite like that AMD cards aren't so popular with the AI bubble. It means I can play games without getting a mortgage.
How about those AMD crashes, though? All my friends on AMD CPUs have had a hell of a time the last two years with constant crashes in Unreal Engine games. Meanwhile, I made fun of myself for buying an ancient 11 series, which is a decade-old arch at this point, but it's rock solid.
Just to point out that those crashes are specific to Windows: the current generation of consoles runs the same UE games with no crashes.
Their Linux driver support isn't so great, though. I really considered an AMD GPU for my most recent build, and based on the driver support for just the integrated graphics on my new AMD CPU (7900X), I opted for an NVidia card instead.
I'm running a 6900XT on Arch and have had no problems so far. Steam, the Heroic launcher, and every game I've tried so far have worked like a charm. You can even OC with LACT [1] if you want to.
[1] https://github.com/ilya-zlobintsev/LACT
I have a 9060 in one PC and a 9070 in another, on Fedora 43.
They run great. I run all my Steam stuff through them. Those days you mention have been long gone for quite a while.