Someone really needs to learn to use `git commit --amend`. Almost 100 commits with pointless commit messages like "wip" or "x"? Be kinder to your reviewers...
Why do you care? Small commits are great for git bisect, and having to come up with a fancy message can break your flow. Code reviewers generally review a whole PR diff, not the individual commits. Fussing about commit messages smacks of prioritising aesthetics over functionality.
You have the right idea but, I believe, the wrong reasoning with your first two arguments.
git-bisect works best when every commit works, contains a single idea, and stacks in a linear history. These features are of most use in a publicly visible branch, which is why it is helpful to squash an entire pull-request into a single, atomic commit — one which clearly defines the change from before- to after-this-feature.
You’re welcome to do whatever you like in your private branch of course, but once you are presenting work for someone else to review then it’s consistent with “I believe this is now complete, correct, working, and ready for review” to squash everything into a single commit. (The fact that code review tools show the sum of all the minor commits is a workaround for people that don’t do this, not a feature to support them!)
In terms of ‘git commit -m wip’: no one is saying you should wear a suit and tie around the house, but when you show up for your senate hearing, presenting yourself formally is as necessary as it is to leave the slides, sweat pants, and tee shirt at home.
Yes, commit early and often while in the flow of putting together a new idea or piece of work. When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
Or to use a different analogy: they don’t want every draft of your master’s thesis from start to finish and they’ll be annoyed if they have to fix basic typos for you that should’ve been caught before the final draft. They don’t care about the typos you already found either, nor how you fixed them. They just want the final draft and to discuss the ideas of the final draft!
Conversely if your master’s thesis or git branch contains multiple semantically meaningful changes — invent calculus then invent gravity / add foo-interface to lib_bar then add foo-login to homepage — then it probably ought to be two code reviews.
> git-bisect works best when every commit works, contains a single idea, and stacks in a linear history. That works best in a publicly visible branch, and is why it is helpful to squash an entire pull-request into a single, atomic commit — one which clearly defines the change from before- to after-this-feature.
Disagree; git-bisect works best when every commit is small and most commits work. More precisely, it works as long as any broken commit is likely to have a working neighbour: isolated bad commits aren't a problem (that's what skip is for, and it's easy enough to include that in your script - you do automate your bisects, right?), but long chains of bad commits are. Squashing means your bisect will land on a squashed commit, when it's only really done half the job. (In particular, the very worst case, where every single one of your intermediate commits was broken, is the same as the case you get when you squash.)
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
And that's what the PR description is for! You don't have to destroy all your history to make one.
Thanks for responding. Everything you say I agree with. I think our differences lie in the scope of how much of my private activity I want to share in public.
You’re right that GitHub, GitLab et al let you use their tooling to write the final commit message (for the merge commit or squash commit). My preference has always been to do that in git itself.
In both cases you end up with a single atomic commit that represents the approved change and its description. For me, the commit is created the moment a review is requested, instead of from the moment it is approved and landed. One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
> One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
> Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
You can do that even earlier if you simply never squash or otherwise edit history, which is my approach - any pushed feature branch is public as far as I'm concerned, and my colleagues are encouraged to pull them if they're e.g. working on the same area at the same time. It comes at the cost of having to actually revert when you need to undo a pushed commit (and, since cherry-pick is not an option, if you're making e.g. a standalone fix and want your colleagues to be able to pull it in unrelated branches, you have to think ahead a little and make it on a new branch based from master rather than sticking it in the middle of your incomplete feature branch), but it's very much worth it IME.
You can also use just plain git and make sure the merge-commit has a useful message, while leaving the work-branch unsquashed and available to explore and bisect when necessary. The main branch looks as neat as when using squashes, and with something like `git log --merges --first-parent` all the small commits on the work-branches are hidden anyway. It looks just like when using atomic commits, but the extra details are still there when someone needs them.
Not sure it really has huge benefits, but I guess something like this should work:
```
#!/bin/sh
# For use with `git bisect run`: skip commits whose message is shorter than
# N characters, on the assumption that "wip"-style commits may not even build.
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
    exit 125   # special exit code telling git bisect to skip this commit
fi
# Here you would run your actual test and report 0 (good), 1 (bad) as needed
exit 0
```
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
This is an extremely opinionated and time consuming way of working. Maybe in this context it makes sense (nvidia driver kernel somethings), but I don't think it's universally the best way to write code together.
I agree that it’s time consuming but the complexity is constant, in my personal experience and with helping others, in that once you start writing long form commit messages (a) you only ever get faster at it, as a skill; and (b) it’s hard to stop!
One of the best things about git, and the reason it won, is that as a tool, it's extremely unopinionated on this matter, and is supportive of however you want to do it. Of course, one of the worst things about git is how unopinionated it is. If you want 300 commit messages in every branch with the commit message of "poop" and no squashing, and none of them even compile, the tool isn't going to stop you. If every commit is fully functional and rebased on top of master so the graph isn't an octopus, you can. If you'd rather use the name main as the primary branch, also totally fine. Git, the tool, leaves all that up to the user and the culture they operate in.
Naturally, I have Opinions on the right way to use git, having used it since inception within various different contexts at various places, along with other VCSs. What works at one place won't be right for another place, and vice versa. Especially given different skill levels of individuals and of teams, the tools involved, and how much weight I have to review code and commits before it gets accepted. What's important is it should work for you, not the other way around. Regardless of where I'm working though, my local commit messages are total crap. "wip" being the most common, but I commit frequently. Importantly though, before I do eg a slightly involved refactor, going back to see what it was before I started is trivial. Being a skilled operator of git is important to make it easy to run newly written tests against the old code. Being efficient at rebase -i and sorting commits into understandable chunks and squashing minor commits to keep things clean is key.
I don't think every patch in a series has to work totally independently for every git repo, but what it comes down to is maintenance. There's nothing worse than digging around in git history, trying to figure out why things are how they are, only to dead end at a 3000 line commit from 5 years ago with the message "poop, lol". It's even worse when the person who did that was you!
Universally, what it comes down to is maintenance. That totally rushed prototype that was just for a demo has now been in production for years, and there's this weird bug with the new database. If you hate yourself, your job, future you, your colleagues, and everybody that comes after you, and you're no good at git, by all means, shit out 300 commits, don't squash, and have the PR message be totally useless. Also believe you're hot shit after one semester of boot camp and that no one else cares just because you want to go home, get high, and play xbox. (Not remotely saying that's you, but those people are out there.)
We could get all philosophical and try and answer the question of if there are any universal truths, nevermind universally best git commit practices.
I don't work where you work, don't know your team, or anybody's skill levels on it, so I'll just close with a couple thoughts. The tool is there to work for you, so learn to work with it, not against it. Git bisect is your friend. And it really sucks 4 years later to be confronted by totally useless commit messages on inappropriately sized commits (too big or too small) and have to guess at things in order to ship fixes to prod (on the prototype that was supposed to get thrown away but never did) and just hope and pray that you've guessed correctly.
Small commits where each commit represents nothing of value and doesn't compile are terrible for git bisect. And for code review.
A code reviewer doesn't care that you spent 10 days across 100 commits tweaking some small piece of code, the code reviewer cares about what you ended up with.
The bisecter probably wants to compile the code at every commit to try and see if they can reproduce a bug. Commits which don't compile force the bisecter to step through commits one by one until they find one which compiles, linearizing a process which should be logarithmic.
Message doesn’t need to be fancy, but it should describe what you did. Being unable to articulate your actions is a thought smell. It’s often seen when the developer is trying stuff until it sticks and needs a commit because the only way to test the fix is in a testing environment (two bad practices).
Do you frequently dig in to new codebases? I do, and commits that contain a functional, complete idea with a descriptive commit message are immensely useful to me for understanding why the code is the way it is.
The only sane thing a maintainer can do with something like this is squash it into one commit. So if you care about `git bisect` then you don't want this.
Why do I care? Interesting question. I'm generally a person who cares, I guess. In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
When I was a kid and my family first moved to the US my dad used to take me for walks. He didn't know anything about the country (couldn't speak English yet) so the only thing he could comment on was what he imagined the prices of all the houses were. Some people just feel the need to comment on something even when they don't understand anything.
The proper way to work with git: Commit like a madman on your private branch. Short messages, written in seconds, just to be able to remember what you were doing if you are interrupted and have to get back into your work later. If you have a CI pipeline, often you have to make small changes until it works, so no reason to bother with smart commit messages.
At some point, you will have something working that makes sense to clean up. Then use interactive rebase to create one or a few commits that "make sense". What makes sense is one of these topics that could create a whole bike garage, but you and your team will have some agreement on it. One thing that I like is to keep pure refactorings by themselves. No one cares to review that you've changed typos in old variable names and things like that. If it's a separate commit, you can just skip over it.
Depending on if you are completely done or not, the resulting branch can be sent as a PR/MR. Make sure that all commits have a reason why the change was made. No reason to repeat what the code says or generate some AI slop message. Your knowledge of why a change was done in a certain way is the most valuable part.
Of course, this way of working fits my work, that is not cloud based in any way and with lots of legacy code. It creates git history that I would like to have if I have to take over old projects or if I have to run git bisect on an unfamiliar code base and figure out some obscure bug. You might have a completely different technology stack and business, where it makes sense to work in some other way with git.
There is no "proper" way to use git, there is only a proper way to interact with your fellow developers.
You can do whatever you like locally. That's the luxury of git: it's distributed so you always get your own playground. You can make commits with short names if you find that useful. Personally I prefer to track my progress using a todo system, so I don't need commits to tell me where I am. I use `stash` instead of committing broken stuff if I need to switch branches.
I've found the idea of "rebase later" always easier said than done. In my experience if you work like that you'll more often than not end up just squashing everything into one commit. I prefer to rebase as I go. I'll rebase multiple times a day. Rearranging commits, amending the most recent commit etc. Keeping on top of it is the best way to achieve success at the end. It's like spending that bit of extra time putting your tools away or sweeping the floor. It pays off in the long run.
? I assume you want to replace your gibberish messages with something more useful before pushing? It is only "destroying" https://xkcd.com/1296/ style crap. Code changes stay the same.
Literally no one looks through the individual commits in a PR that's gonna be squashed. I don't care if it's 10 or 10,000 - I'm always gonna review the full thing.
Isn't your interpretation backwards in some cases? What I mean, is that _because_ you see the intermediate commits are garbage, you _then_ decide not to review the individual commits (because you are interested in the contribution anyway).
I certainly do care for the hobby FOSS projects I maintain, and bad commit messages + mega-commits won't fly at my day job.
Squash-merging has the advantage of making 1 PR == one commit with the PR ID in the commit message, sure, but it unfortunately promotes bad Git hygiene (and works around it).
You might be surprised. Yours sounds like the attitude of someone who has not had the luxury of reviewing well-constructed commits. PRs with intentional commits permit both faster and deeper reviews—but alas, not everyone is so respectful of their reviewers’ time and energy.
If it's intentionally slowing non-CUTLASS shaders, sure, pitchfork time.
If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.
That can be for all kinds of reasons - straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract so can have a faster path, or even bugs in apps that need workarounds.
Though piggybacking into these without understanding can be extremely fragile - you don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or something like a crash. And possibly in rather unexpected, unpredictable situations.
Intel disabled optimisations when they detected they were running on their competitors' hardware. The motivation was to make competitors compare badly in benchmarks.
Nvidia are disabling optimisations on their own hardware. The motivation appears to be related to these optimisations being unsafe to apply to general code.
And again in 2010, although as far as I'm aware this was just based on speculation and it was never proved that it was intentional, or that the optimisation would have netted the gains the author said: https://web.archive.org/web/20250325144612/https://www.realw...
https://github.com/triton-lang/triton/pull/7298#discussion_r...
> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").
> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
Often not elusive bugs, but elusive performance. GPU compilers are hard: once you've done the basics, trying to do further transforms in a mature compiler will almost always produce mixed results. Some kernels will go faster, some will go slower, and you're hoping to move the balance and not hit any critical kernel too hard in your efforts to make another go faster.
An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that has great outcomes for one set of kernels, terrible outcomes for another, and no known reliable heuristic or modeling they could use to automatically choose.
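To make the quoted check concrete: a name-keyed gate like the one described amounts to little more than a substring test in front of the risky pass. A hypothetical sketch, where only the strstr call is taken from the thread and the helper name and surrounding logic are mine, not disassembled from ptxas:
```
#include <string.h>

/* Hypothetical sketch of a name-keyed compiler gate like the one described
   in the quoted comment. The helper name and the idea of an "aggressive
   pass" are illustrative only. */
static int enable_aggressive_pass(const char *kernel_name) {
    /* Opt in only kernels whose (mangled) name mentions "cutlass". */
    return kernel_name != NULL && strstr(kernel_name, "cutlass") != NULL;
}
```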
A saner design would turn this optimization into a documented flag that anyone can opt into.
Speaking from a place of long-term frustration with Java, some compiler authors just absolutely hate exposing the ability to hint/force optimizations. Never mind that it might improve performance for N-5 and N+5 major releases, it might be meaningless or unhelpful or difficult to maintain in a release ten years from now, so it must not be exposed today.
That seems valid for customers expecting a warranty or support. But they should allow it if customers waive all such in writing.
Warranty and support specifically for that flag? Because I don't see how general warranty and support requires keeping any hint flags forever.
Thanks for a little context, this is not my wheelhouse at all (never even heard of this project) and I could not make heads or tails of the title or the linked PR.
Heh. Does anyone remember when almost 25 years ago ATI (AMD) was caught manipulating the Quake III benchmarks by renaming the executables to ‘quack’?
https://web.archive.org/web/20230929180112/https://techrepor...
https://web.archive.org/web/20011108190056/https://hardocp.c...
https://web.archive.org/web/20011118183932/www.3dcenter.de/a...
Just in case anyone else parsed that sentence the same way as me, ati detected "quake" as the executable and changed things like texture quality etc to increase benchmark performance. Some people discovered this after they renamed the executable to "quack" and the image quality improved but the benchmarks were lower, proving that the ati drivers "optimised" by reducing quality.
Ati did not rename quake to quack as I originally thought from this! :)
The story was that they used a lower mipmap level (blurrier textures) when the process was named Quake, but used the normal mipmap level (standard textures) when the process was named Quack.
Thank you for explaining. I was so confused at how AMD was improving Quake performance with duck-like monikers.
Well, if it _looks_ like a high-performance texture renderer, and it _walks_ like a high-performance texture renderer...
It's probably been duck typed
shocked quack
If it looks like a benchmark and it quacks like a benchmark… duck?
So the additional performance came with a large bill?
Or Intel checking for "GenuineIntel" in ICC's output: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Support...
Or Win 3.1 looking for whatever shibboleth was in MS-DOS and popping up a scary-looking message if it found another DOS? https://en.wikipedia.org/wiki/AARD_code
I don’t think anybody remembers this since that code never shipped in retail.
It didn't ship (in the final retail version) only after the tech press of the day exposed what Microsoft had done.
It did ship in the final retail version, in a way. It was disabled, but the code was still there, and a flag was all that was needed to enable it.
Every vendor does this to this day - and it's a morally grey practice: drivers hijack and modify the rendering loops of popular games, fixing bugs, replacing shaders with more optimized versions, enabling faster codepaths in the driver etc.
These changes are supposed to have minimal to no impact on the actual output, but sometimes vendors are really aggressive, and significantly degrade the outputs so that the game can run faster on their hardware.
Sadly it's built into the vulkan protocol. Even a fully userspace driver arrangement with a microkernel ends up giving the driver access to the client's information. Of course it's forgeable the way it's done though so you could opt out if you really wanted to.
[1]: https://github.com/KhronosGroup/Vulkan-Headers/blob/main/inc...
I mean Khronos put that in for a reason. If the drivers didn't get explicit information about the application being run, they would do silly heuristics like quake3 to squeeze out performance.
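For context, the application identity a driver can key on is self-reported through VkApplicationInfo (declared in the header linked above). A minimal sketch with illustrative names; the fact that the application fills these fields in itself is also why the mechanism is forgeable, as noted above:
```
#include <vulkan/vulkan.h>

/* Sketch: the strings below are what a driver can match on when choosing
   per-title tweaks. "MyGame" / "MyEngine" are illustrative placeholders. */
VkApplicationInfo make_app_info() {
    VkApplicationInfo info{};
    info.sType              = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    info.pApplicationName   = "MyGame";
    info.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
    info.pEngineName        = "MyEngine";
    info.engineVersion      = VK_MAKE_VERSION(1, 0, 0);
    info.apiVersion         = VK_API_VERSION_1_3;
    return info;  /* passed to vkCreateInstance via VkInstanceCreateInfo::pApplicationInfo */
}
```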
> but sometimes vendors are really aggressive, and significantly degrade the outputs so that the game can run faster on their hardware.
Do you have a source for this? I’d like to see some examples
Nvidia has a control panel with its drivers. Open it up -> Manage 3D settings -> Program Settings. Scroll through and see how every single program/game you have installed openly has different defaults in it based on application name. As someone noted above, others do the same thing.
E.g. Frostpunk has Antialiasing for transparency layers on. Slay the Spire does not. I never set these settings. Nvidia literally does a lookup on first run for what they judge as best defaults and sets these appropriately.
Every single game/program you install has different options from a huge list of possible optimizations.
Applying different standard settings is pretty different from "hijacking and modifying the rendering loop", though.
For more context and deeper discussion on the subject, see https://news.ycombinator.com/item?id=44531107
Funnily, it's under an older submission of the same cutlass optimizations.
This is weirdly common; phone chipset manufacturers did it with phone benchmarks [0], VW with emissions [1], nVidia did it with 3DMark [2], Intel with the SPEC benchmark for its Xeon processors [3], etc.
When it comes to computer graphics, iirc it's pretty normalized now - graphics drivers all seem to have tweaks, settings, optimizations and workarounds for every game.
(As an aside, I hate that I have to link to archive.org, there's a lot of dead links nowadays but these are important things to remember).
[0] https://web.archive.org/web/20250306120819/https://www.anand...
[1] https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
[2] https://web.archive.org/web/20051218120547/http://techreport...
[3] https://www.servethehome.com/impact-of-intel-compiler-optimi...
> When it comes to computer graphics, iirc it's pretty normalized now - graphics drivers all seem to have tweaks, settings, optimizations and workarounds for every game.
Even Mesa has them: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/uti...
> graphics drivers all seem to have tweaks, settings, optimizations and workarounds for every game.
Maybe hyperbole, but I think obviously they can't do this for literally every game, that would require huge personnel resources. At least looking at mesa (linked elsewhere), only ~200 games are patched, out of what 100k PC games? So <1%.
Well, pretty much every large AAA game launch is accompanied by a GPU driver upgrade that adds support for that game. It's in the patch notes.
Goodhart's law: when a measure becomes a target, it ceases to be a good measure.
Are there any archives of that techreport article with images intact?
Ah yes they changed the site and URL system after some years, here is the OG one with screenshots
Page 1 https://web.archive.org/web/20071028172853/http://techreport...
Page 2 https://web.archive.org/web/20111130162817/http://techreport...
Page 3 https://web.archive.org/web/20080213212637/http://techreport...
Page 4 https://web.archive.org/web/20101110031431/http://techreport...
Page 5 https://web.archive.org/web/20101108144857/http://techreport...
I work with compilers.
And despite it not being nice, some optimizations rely on type or function name schemas/substrings/etc.
It sucks, but that's how it works.
It doesn't have to be malicious; sometimes it is just safer to deploy an optimization only for your libs than to risk breaking stuff.
Or your frontend is not giving you more data which you can rely on.
It is probably not malicious, but it certainly does create new barriers, which is not a good thing.
On function types or schema, I can understand that. But names?
Something like:

    if (AskLLM("Does function signature+name look like error handling code")) {
        TurnOffInliner();
    }

is actually probably a lot more effective than you'd think (generating PGO traces with a machine learning tool is apparently a thing that sort of works)

> than risk breaking stuff
... until somebody randomly chooses the same name for some reason and gets hosed.
You're not helping.
nil novi sub sole.
- Intel faced a "cheating compiler" controversy when SPEC, the benchmark standard-setter, invalidated over 2,600 benchmark results for Intel Xeon processors in early 2024. ( https://www.tomshardware.com/pc-components/cpus/spec-invalid... )
- microsoft doing similar things (java benchmarks, C compiler benchmarks)
- and everybody cheating on AI benchmarks (https://www.thestack.technology/ai-benchmarking-scandal-were...)
Reminds me of when ~10 years ago with a particular version of Webpack the build would fail if I had one SVG called add.svg so I had to rename it plus.svg
or breaking numpy by importing your api key from a little file you wrote called "secret.py" :P
Keeping it real with the commit msgs
Some criticism of the author here regarding how they structure their diffs.
They "made something ~100 tflops faster" and peoples' comments are "their commit messages are bad"? You guys would hate how John Carmack worked, too
https://github.com/oliverbenns/john-carmack-plan you can read carmacks old .plan files
They're mostly not exactly prose but remember this was almost 40 years ago when the dominant style of writing code in some places was still ye olde K&R C with one letter variable names and goto everywhere
I was appreciative/shitposting.
Would love to see Carmack's commit messages. Just the other day I unsuccessfully tried to look for pictures of his office newer than the Quake III era. Want to figure out his ergonomics for working (presumed) 10h days well into middle age.
Looks pretty normal: https://playcanv.as/p/apIKHp7a
I much prefer this over those AI generated commit messages that just say "refactored X" every single commit.
what kind of AI are you using that generates shitty commit messages? This is a common kind of message from Claude / Augment:

Fix dynamic channel list by passing auth via metadata

- Pass userId and userEmail in metadata when calling HTTP transport
- AuthenticatedToolsProviderFactory now reads from context.metadata
- Each tools/list request creates a fresh ToolsProvider with authentication
- Execute command description now correctly shows currently online machines
- Tested locally and working correctly
> - Tested locally and working correctly
This is completely meaningless and just pollutes the log.
"ready for production", "fully working" and other Claude-isms come to mind
You're totally right!
God I can't stand it when I get this kind of output from Claude, they really need to train it out for Claude 5.
"[Tangentially related emoji] I have completed this fully functional addition to the project that is now working perfectly! There are now zero bugs and the system is ready for deployment to production! [Rocketship emoji]"
Then of course you test it out and it doesn't work at all! It's very grating. It would be more bearable if it hedged its claims a bit more (maybe that will negatively affect the quality of the results though - if training a model to output insecure code also makes it a murderous Hitler admirer then, since when humans hedge their output is less likely to be perfect, it may mean it pushes the model to output code that is less than perfect).
> "[Tangentially related emoji] I have completed this fully functional addition to the project that is now working perfectly! There are now zero bugs and the system is ready for deployment to production! [Rocketship emoji]"
This made me laugh so hard. Never trust an AI model saying “There are now zero bugs”! Weaponized incompetence? :)
As a side note, I absolutely am in love with GPT-5 and GPT-5-codex. When I talk to it, it feels like talking to a peer and not an over enthusiastic (but talented) junior with potential. GPT-5-codex on high has been exceptional at debugging insidious bugs.
And there's at least an 80% chance one of those items is, in fact, not in the commit.
It is missing the (to me) most important part. The reason why these changes are made.
Set a PR template up that demands those sections are filled in. Could probably do that down to the commit level with pre-commit, but realistically you'd want that level of detail in the PR. Also add the issue id to the commits too; that way you can pull them up easily and get more context.
I hate it when I look at some code, wondering why I added a refresh call at that point, I do a git blame to find the commit message, and it says "add refresh call".
But... I keep being told that commit messages are useless because the code is the documentation, so code diffs are self-explanatory...
That only works if the code is good enough to be the documentation. In DayJob I prefer to cover all the bases:
∞ Try to make the code sensible & readable so it can be the documentation.
∞ Comment well anyway, just in case it isn't as obvious to the reader (which might be me in a few months time) as it is to me when making the change. Excess comments can always be removed later (and, unless some idiot rewrites history, can potentially be referred to after removal if you have a “why t f” moment), comments you never write can't be found later.
∞ Either a directly meaningful commit message, or at very least ticket references to where more details can be found.
For personal tinkering, I'm a lot less fastidious.
> That only works if the code is good enough to be the documentation.
It never actually is at any non-minimal scale (and not even the code authored by the people who claim code is self documenting).
My comment was rhetorical and sarcastic.
The code is the "how", sometimes it's necessary to explain the "why".
Sarcasm aside, yes, I 100% agree with you.
When the "why" isn't explained, you end up with things like someone refactoring code and spending time (at best) trying to figure out why some tests now fail or (at worst) breaking something in production.
I'd argue that even the "how" sometimes is better explained in plain words than in code (even if that opens the door for outdated comments when code is changed).
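A tiny illustration of the difference, with made-up names and a made-up ticket id: the code already shows the "how", so the comment (or commit message) earns its keep by recording the "why".
```
#include <vector>

struct Invoice {};
struct InvoiceCache {
    void refresh() {}                                    // stand-ins for the real calls
    std::vector<Invoice> read_invoices() { return {}; }
};

std::vector<Invoice> load_invoices(InvoiceCache& cache) {
    // Why, not how: the upstream cache is only invalidated lazily, so reading
    // without a refresh here returned stale rows and caused the duplicate-invoice
    // bug (ticket id made up for illustration: BILL-1234).
    cache.refresh();
    return cache.read_invoices();
}
```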
Sometimes I wonder if I really do just need to add /s every time I'm being sarcastic.
Sometimes I think that people who can't write well enough to convey sarcasm when they mean it should just avoid using it and say what they mean.
Now that looks like sarcasm.
Unfortunately HN does not support use of the Sarcasm font.
/s
True, you need to instruct the AI agents to include this.
In our case the agent has access to Jira and has wider knowledge. For commit messages I don’t bother that much anymore (I realise, typing this), but for the MRs I do. Here I have to instruct it to remove implementation details.
> you need to instruct the AI agents to include this.
The agent can't do that if you told Claudepilotemini directly to make some change without telling it why you were prompting it to make such a change. LLMs might appear magic, but they aren't (yet) psychic.
I think you're missing context.
He's saying that he likely has an MCP connected to jira on the LLM he's developing with.
Hence the prompt will have already referenced the jira ticket, which will include the why - and if not, you've got a different issue. Now the LLM will only need something like "before committing, check the jira ticket we're working on and create a commit message ...
But whether you actually want that is a different story. You're of the opinion it's useful; I'd say it's rarely going to be valuable, because requirements change, making this point-in-time rationale mostly interesting in an academic sense, but not actually valuable for the development you're doing.
It depends on a ton of factors, and at least for me I'd put so little stock in the validity of the commit message that it might as well not exist. (And this is from the perspective of human-written ones, not AI.)
Isn't "Fix dynamic channel list" the reason?
> - Tested locally and working correctly
If a human puts that, I doubt it. If I know they are using “AI” to fill in commit message I'll just assume it is a complete hallucination.
GitHub Copilot for one, and I'm pretty sure JetBrains' offering does the same.
JetBrains does a good job on them as well. Copilot is shit.
Every time I’ve tried to use AI for commit messages its designers couldn’t be bothered to get it to take into account previous commit messages.
I use conventional commit formats for a reason, and the AI can’t even attempt it. I’m not even sure I’d trust it to get the right designation, like “fix(foo)!: increase container size”.
So AI really does learn from humans...
I think it's fine if you squash it. I have no idea why they didn't squash it before pushing to GitHub though.
They probably didn’t care. And having many small commits instead of a big squashed one can be useful when using git bisect for example.
Yeah. I have never in my entire career thought "there are too many commit messages" when doing code archeology, but I have sometimes thought "damn, this commit is huge"
I have all the time. When git blame says "fix a typo" or when you look at a commit graph and see spaghetti.
I like `tig blame` for this. It has a key (comma, maybe?) that pops back to the file just before the change on the highlighted line so you can quickly work your way backwards through non-changes like you describe. It doesn’t deal with renames well, though.
Not really because CI only needs to pass for the final commit so it's super unlikely that the intermediate ones work.
They squashed it before pushing to main.
Reminds me of when I was working on NVIDIA Jetson systems, and learning how to use them, that you can run 1 command to make everything go faster... (https://jetsonhacks.com/2019/04/10/jetson-nano-use-more-powe...)
Do you mean the power mode? The article says there are 2 power modes, 5W and 10W, and that 10W is the default. From that I would assume that you can make everything go faster by using 10W mode, but only if you had already made things go slower by switching away from the default to the 5W mode? Did I miss something?
Sometimes you write some heavily tuned code in a high level language like C++ that you know could be translated into very specific GPU assembly, then find that the compiler isn't producing the exact assembly that you had in mind.
When you talk to the compiler team about it they may offer a range of solutions, some of which may not be applicable to open source code. Picture proprietary #pragmas, intrinsics, or whatnot. What do you do? You can't ship a high performance library that doesn't deliver high performance. It is then that you rely on things like function names to enable specific code transformations that can't be used in general because they would sometimes break third party code.
I never worked on Cutlass, but this is the sort of thing that is done in the real world.
There is nothing nefarious about this sort of optimization. People comparing this to cheating on benchmarks by rendering lower quality images are not on the right track.
Why wouldn’t you just use inline assembly in that case?
There is no way to write inline SASS (assembly equivalent) for CUDA code. You can inline PTX, but PTX is a high level bytecode designed to be portable.
PTX is sometimes referred to as assembly, and it is an ISA, much lower level than C++. When people talk about writing inline assembly for CUDA, they mean PTX, and the C++ compiler’s inline assembly “asm” statement assumes PTX. For the most part you have much more control and ability to produce exactly the SASS you want when using PTX compared to when using C/C++.
https://developer.nvidia.com/blog/understanding-ptx-the-asse...
https://eunomia.dev/others/cuda-tutorial/02-ptx-assembly/
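As a minimal example of that asm statement (a standard inline-PTX idiom, not taken from the PR): reading the warp lane index from the %laneid special register in CUDA C++.
```
// Inline PTX via the asm statement described above; the wrapper name is mine.
__device__ __forceinline__ unsigned int lane_id() {
    unsigned int lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}
```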
This was discussed before at the time the PR was created and there's nothing new that I can see.
https://news.ycombinator.com/item?id=44530581
Thought this looked familiar...
It would be nice if we could find economics that allowed us to share code instead of all the bullshit with the binary blob drivers. Same for basebands and everything else. How many collective hours and months of our society’s finest minds have been wasted reverse engineering binary blobs, controllers through IO pins, trying to reverse engineer circuit schematics, when all of this is already sitting on someone’s computer somewhere and they could just GIVE you the docs. CUDA and NVIDIA can go to hell.
The problem is as follows: You have a fixed cost investment to produce a software code base, then you have fixed ongoing maintenance costs, at a minimum one developer who knows the codebase. Preferably two for a commercial product. On top of that you have small distribution costs over time. E.g. servers that host the software downloads.
The marginal costs per user are very small or even zero for desktop applications. This means that software needs a funding structure with periodic payments, but at the same time the payments shouldn't grow with the number of users. There also needs to be a way for the initial investors who pay for the creation of new features or entire code bases to get their money back as the product becomes popular.
This in itself is not problematic, but it is not covered by traditional crowdfunding. The problem is that the funding goal needs to be met no matter what, and the contribution per user shrinks as more users contribute. You can't expect everyone to chip in 100%, 10% or even 1% of the funding cost, since that could be thousands of dollars even at the minimum. You need some sort of auctioning process where people can pledge a fixed quantity and if the user count is low enough, their pledge counts, otherwise it doesn't.
This has one problem though. What's problematic is the transition from the exclusive to non-exclusive mode.
There will be freeloaders who might pitch in five dollars, but they know five big corporations have chipped in and this covered the full development cost, leading to open sourcing the entire codebase. Everyone else is a freeloader. Including cheapskate corporations.
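One way to make that auction idea concrete (my own formalization of the comment above, not an established scheme): sort pledges from largest to smallest and take the largest group that can split the goal evenly without anyone paying more than they pledged; each member then pays goal/k and everyone else pays nothing.
```
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of the thresholded pledge auction described above (illustrative only).
// Returns the size k of the largest group whose k-th largest pledge still covers
// an equal share goal/k, or 0 if the goal cannot be met.
std::size_t funded_group_size(std::vector<double> pledges, double goal) {
    std::sort(pledges.begin(), pledges.end(), std::greater<double>());
    std::size_t best = 0;
    for (std::size_t k = 1; k <= pledges.size(); ++k) {
        if (pledges[k - 1] * static_cast<double>(k) >= goal) best = k;
    }
    return best;
}
```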
Someone really needs to learn to use `git commit --amend`. Almost 100 commits with pointless commit messages like "wip" or "x"? Be kinder to your reviewers...
Why do you care? Small commits are great for git bisect, and having to come up with a fancy message can break your flow. Code reviewers generally review a whole PR diff, not the individual commits. Fussing about commit messages smacks of prioritising aesthetics over functionality.
You have the right idea but, I believe, the wrong reasoning with your first two arguments.
git-bisect works best when every commit works, contains a single idea, and stacks in a linear history. These features are of most use in a publicly visible branch, and is why it is helpful to squash an entire pull-request into a single, atomic commit — one which clearly defines the change from before- to after-this-feature.
You’re welcome to do whatever you like in your private branch of course, but once you are presenting work for someone else to review then it’s consistent with “I believe this is now complete, correct, working, and ready for review” to squash everything into a single commit. (The fact that code review tools show the sum of all the minor commits is a workaround for people that don’t do this, not a feature to support them!)
In terms of ‘git commit -m wip’: no one is saying you should wear a suit and tie around the house, but when you show up for your senate hearing, presenting yourself formally is as necessary as it is to leave the slides, sweat pants, and tee shirt at home.
Yes, commit early and often while in the flow of putting together a new idea or piece of work. When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
Or to use a different analogy: they don’t want every draft of your master’s thesis from start to finish and they’ll be annoyed if they have to fix basic typos for you that should’ve been caught before the final draft. They don’t care about the typos you already found either, nor how you fixed them. They just want the final draft and to discuss the ideas of the final draft!
Conversely if your master’s thesis or git branch contains multiple semantically meaningful changes — invent calculus then invent gravity / add foo-interface to lib_bar then add foo-login to homepage — then it probably ought to be two code reviews.
> git-bisect works best when every commit works, contains a single idea, and stacks in a linear history. These properties matter most on a publicly visible branch, and that is why it is helpful to squash an entire pull request into a single, atomic commit, one which clearly defines the change from before-this-feature to after-it.
Disagree; git-bisect works best when every commit is small and most commits work. Isolated bad commits aren't a problem as long as any broken commit is likely to have a working neighbour: that's what skip is for, and it's easy enough to include that in your script (you do automate your bisects, right?). Long chains of bad commits are a problem. Squashing means your bisect will land on a squashed commit, when it's only really done half the job. In particular, the very worst case, where every single one of your intermediate commits was broken, is the same as the case you get when you squash.
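To make the automation point concrete, a rough sketch of a skip-aware bisect script (the script name, build command, and repro script are all placeholders):
```
#!/bin/sh
# check.sh: exit 125 = skip this commit (e.g. it doesn't build);
#           exit 0   = good commit; exit 1 = bad commit
make >/dev/null 2>&1 || exit 125   # can't build? let bisect skip it instead of guessing
if ./reproduce_bug.sh; then        # hypothetical repro script for the bug being hunted
    exit 1                         # bug present -> bad
else
    exit 0                         # bug absent  -> good
fi
```
Then `git bisect run ./check.sh` walks the history unattended, skipping over broken-build commits on its own.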
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
And that's what the PR description is for! You don't have to destroy all your history to make one.
Thanks for responding. Everything you say I agree with. I think our differences lie in the scope of how much of my private activity do I want to share in public.
You’re right that GitHub, GitLab et al let you use their tooling to write the final commit message (for the merge commit or squash commit). My preference has always been to do that in git itself.
In both cases you end up with a single atomic commit that represents the approved change and its description. For me, the commit is created the moment a review is requested, instead of from the moment it is approved and landed. One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
> One reason this is particularly useful is that you can now treat the commit as if it had already landed on the main branch. (It is easier to share, cherry-pick, rebase, etc. — easier than doing so with a branch of many commits, in my experience.)
> Prospective changes do not change type (from branch to squashed commit or merge commit) either, when they are approved, which simplifies these workflows.
You can do that even earlier if you simply never squash or otherwise edit history, which is my approach: any pushed feature branch is public as far as I'm concerned, and my colleagues are encouraged to pull them if they're e.g. working on the same area at the same time. It comes at the cost of having to actually revert when you need to undo a pushed commit, and, since cherry-picking is not an option, if you're making e.g. a standalone fix that you want colleagues to be able to pull into unrelated branches, you have to think ahead a little and make it on a new branch based on master rather than sticking it in the middle of your incomplete feature branch. But it's very much worth it IME.
You can also use just plain git and make sure the merge commit has a useful message, while leaving the work branch unsquashed and available to explore and bisect when necessary. The main branch looks as neat as when using squashes, and with something like `git log --merges --first-parent` all the small commits on the work branches are hidden anyway. It looks just like when using atomic commits, but the extra details are still there when someone needs them.
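A rough sketch of that workflow, with `widget-cache` standing in for the work branch:
```
git checkout main
git merge --no-ff widget-cache     # --no-ff forces a merge commit; write the real description here

git log --first-parent --oneline   # mainline view: one entry per merged branch
git log --oneline main^2           # the merge's second parent: the detailed work-branch commits
```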
I've followed your same approach but switched to having just "most of the commits work" after I found out about git bisect --skip.
Can't git-bisect simply ignore commits whose commit message is shorter than N characters?
Yes - or, being more principled, you can tell it to only use mainline commits (first parent) if landing on a whole PR is what you want.
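For reference, recent Git (2.29 and later) supports this directly; a sketch, with placeholder commits and a test script like the one above:
```
git bisect start --first-parent <bad-commit> <good-commit>
git bisect run ./check.sh          # only mainline (first-parent) commits get tested
```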
That might sound facile, but it's actually a great idea. Being able to ignore commits based on regexes would be even more powerful.
Not sure it really has huge benefits, but I guess something like this should work:
```
#!/bin/sh
N=20
msg_len=$(git log -1 --pretty=%B | wc -c)
if [ "$msg_len" -lt "$N" ]; then
    exit 125
fi
# Here you would run your actual test and report 0 (good), 1 (bad) as needed
exit 0
```
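Saved as, say, `msg-len-check.sh` (the name is arbitrary) and made executable, it plugs into the usual automation:
```
chmod +x msg-len-check.sh
git bisect start <bad> <good>
git bisect run ./msg-len-check.sh
```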
> When it’s ready for the attention of your peers then you absolutely ought to dress it up as smartly as possible. It’s at that point that you write a cover letter for your change: what was the situation before, why that was bad, what this patch does instead, and how you proved in practice that it made things better (tests!)
This is an extremely opinionated and time consuming way of working. Maybe in this context it makes sense (nvidia driver kernel somethings), but I don't think it's universally the best way to write code together.
I agree that it's time consuming, but the overhead stays constant, in my personal experience and from helping others: once you start writing long-form commit messages, (a) you only ever get faster at it as a skill, and (b) it's hard to stop!
One of the best things about git, and the reason it won, is that as a tool it's extremely unopinionated on this matter, and it supports however you want to do it. Of course, one of the worst things about git is how unopinionated it is. If you want 300 commits in every branch, all with the message "poop", no squashing, and none of them even compiling, the tool isn't going to stop you. If every commit is fully functional and rebased on top of master so the graph isn't an octopus, you can do that. If you'd rather use the name main as the primary branch, also totally fine. Git, the tool, leaves all that up to the user and the culture they operate in.
Naturally, I have Opinions on the right way to use git, having used it since inception within various different contexts at various places, along with other VCSs. What works at one place won't be right for another place, and vice versa. Especially given different skill levels of individuals and of teams, the tools involved, and how much weight I have to review code and commits before it gets accepted. What's important is it should work for you, not the other way around. Regardless of where I'm working though, my local commit messages are total crap. "wip" being the most common, but I commit frequently. Importantly though, before I do eg a slightly involved refactor, going back to see what it was before I started is trivial. Being a skilled operator of git is important to make it easy to run newly written tests against the old code. Being efficient at rebase -i and sorting commits into understandable chunks and squashing minor commits to keep things clean is key.
I don't think every patch in a series has to work totally independently for every git repo, but what it comes down to is maintenance. There's nothing worse than digging around in git history, trying to figure out why things are how they are, only to dead end at a 3000 line commit from 5 years ago with the message "poop, lol". It's even worse when the person who did that was you!
Universally, what it comes down to is maintenance. That totally rushed prototype that was just for a demo has now been in production for years, and there's this weird bug with the new database. If you hate yourself, your job, future you, your colleagues, and everybody that comes after you, and you're no good at git, by all means, shit out 300 commits, don't squash, and have the PR message be totally useless. Also believe you're hot shit after one semester of boot camp and that no one else cares just because you want to go home, get high, and play xbox. (Not remotely saying that's you, but those people are out there.)
We could get all philosophical and try to answer the question of whether there are any universal truths, never mind universally best git commit practices.
I don't work where you work, and I don't know your team or anybody's skill level on it, so I'll just close with a couple of thoughts. The tool is there to work for you, so learn to work with it, not against it. Git bisect is your friend. And it really sucks, 4 years later, to be confronted by totally useless commit messages on inappropriately sized commits (too big or too small) and to have to guess at things in order to ship fixes to prod (on the prototype that was supposed to get thrown away but never did), just hoping and praying that you've guessed correctly.
Small commits where each commit represents nothing of value and doesn't compile are terrible for git bisect. And for code review.
A code reviewer doesn't care that you spent 10 days across 100 commits tweaking some small piece of code, the code reviewer cares about what you ended up with.
The bisector probably wants to compile the code at every commit to see if they can reproduce a bug. Commits which don't compile force the bisector to step through commits one by one until they find one which compiles, linearizing a process which should be logarithmic.
> having to come up with a fancy message
Message doesn't need to be fancy, but it should describe what you did. Being unable to articulate your actions is a thought smell. It's often seen when the developer is trying stuff until something sticks and needs a commit for every attempt because the only way to test the fix is in a testing environment: two bad practices.
Do you frequently dig in to new codebases? I do, and commits that contain a functional, complete idea with a descriptive commit message are immensely useful to me for understanding why the code is the way it is.
It makes it a bitch to know why a given change was made.
If you change 2 lines 8 times to check something, just squash the commits; it saves everyone the hassle.
Plus they might have a tool to rewrite the commit history where the commit message is "wip" and the commit is older than $DATE.
The only sane thing a maintainer can do with something like this is squash it into one commit. So if you care about `git bisect` then you don't want this.
Why do I care? Interesting question. I'm generally a person who cares, I guess. In this specific case it seems analogous to a mechanic having a tidy workshop. Would you leave your bicycle with someone head to toe in grease and tools strewn all over the place? I wouldn't.
When I was a kid and my family first moved to the US my dad used to take me for walks. He didn't know anything about the country (couldn't speak English yet) so the only thing he could comment on was what he imagined the prices of all the houses were. Some people just feel the need to comment on something even when they don't understand anything.
The proper way to work with git: Commit like a madman on your private branch. Short messages, written in seconds, just to be able to remember what you were doing if you are interrupted and have to get back into your work later. If you have a CI pipeline, often you have to make small changes until it works, so no reason to bother with smart commit messages.
At some point, you will have something working that makes sense to clean up. Then use interactive rebase to create one or a few commits that "make sense". What makes sense is one of those topics that could fill a whole bike garage, but you and your team will have some agreement on it. One thing that I like is to keep pure refactorings by themselves. No one cares to review that you've fixed typos in old variable names and things like that; if it's a separate commit, you can just skip over it.
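A rough sketch of that cleanup step (the base branch, hashes, and messages below are invented for illustration):
```
git rebase -i origin/main
# example todo list after reordering:
#   pick   a1b2c3d  refactor: rename widget helpers       (pure refactoring, kept by itself)
#   pick   0c1d2e3  widget cache: add eviction policy
#   squash d4e5f6a  wip                                   (folded into the commit above)
#   squash 9f8e7d6  fix CI
```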
Depending on whether you are completely done or not, the resulting branch can be sent as a PR/MR. Make sure that every commit says why the change was made. No reason to repeat what the code says or to generate some AI slop message. Your knowledge of why a change was done in a certain way is the most valuable part.
Of course, this way of working fits my work, that is not cloud based in any way and with lots of legacy code. It creates git history that I would like to have if I have to take over old projects or if I have to run git bisect on an unfamiliar code base and figure out some obscure bug. You might have a completely different technology stack and business, where it makes sense to work in some other way with git.
There is no "proper" way to use git, there is only a proper way to interact with your fellow developers.
You can do whatever you like locally. That's the luxury of git: it's distributed so you always get your own playground. You can make commits with short names if you find that useful. Personally I prefer to track my progress using a todo system, so I don't need commits to tell me where I am. I use `stash` instead of committing broken stuff if I need to switch branches.
I've found the idea of "rebase later" always easier said than done. In my experience if you work like that you'll more often than not end up just squashing everything into one commit. I prefer to rebase as I go. I'll rebase multiple times a day. Rearranging commits, amending the most recent commit etc. Keeping on top of it is the best way to achieve success at the end. It's like spending that bit of extra time putting your tools away or sweeping the floor. It pays off in the long run.
And `git commit --fixup` plus `git rebase -i --autosquash`.
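Roughly, with a made-up target hash:
```
git commit --fixup=a1b2c3d               # record a fix destined for an earlier commit
git rebase -i --autosquash origin/main   # folds the fixup! commits into their targets
```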
Claude, look at this git history, analyse diffs and create an intelligent commit message to replace each commit message. Do a rebase to fix it all up.
Would you actually do that? It's information destruction. You can machine-generate at any time, but you can only delete the human input once.
> you can only delete the human input once
You can't lose anything as long as you have a pointer to it (and the pointer also makes it easy to find). No need to make a "backup" branch. Learn to trust the reflog.
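For example, recovering a commit that seems gone after a botched rebase (the hash and branch name here are invented):
```
git reflog                     # lists every recent position of HEAD, e.g. "a1b2c3d HEAD@{5}: commit: wip"
git branch rescue a1b2c3d      # pin the "lost" commit with a branch so it stays reachable
```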
That is like learning to trust the indestructibility of matter. I can still lose my keys (not be able to locate them...) even though they still exist!
My human inputs are usually commit messages like "awdjhwahdwadga" until I do the rebase at the end
? I assume you want to replace your gibberish messages with something more useful before pushing? It is only "destroying" https://xkcd.com/1296/ style crap? The code changes stay the same.
cringe
Literally no one looks through the individual commits in a PR that's gonna be squashed. I don't care if it's 10 or 10,000 - I'm always gonna review the full thing.
Plenty of people do. At least at my work (and yes we squash PRs too). For some changes it's an easy way to make review way more sane.
For an illustration of the scale of this, search GitHub for 'commit by commit': https://github.com/search?q=%22commit+by+commit%22&type=pull... (2M results)
> that's gonna be squashed
Isn't your interpretation backwards in some cases? What I mean is that _because_ you see the intermediate commits are garbage, you _then_ decide not to review the individual commits (because you are interested in the contribution anyway).
I certainly do care for the hobby FOSS projects I maintain, and bad commit messages + mega-commits won't fly at my day job.
Squash-merging has the advantage of making one PR == one commit with the PR ID in the commit message, sure, but it unfortunately promotes bad Git hygiene (and works around it).
You might be surprised. Yours sounds like the attitude of someone who has not had the luxury of reviewing well-constructed commits. PRs with intentional commits permit both faster and deeper reviews—but alas, not everyone is so respectful of their reviewers’ time and energy.
When Intel did it, the pitchforks came out.
Nvidia seems to get a pass. Why's that?
It really depends on details.
If they're intentionally slowing non-CUTLASS shaders, sure, pitchfork time.
If it's an option that /technically/ breaks the CUDA shader compatibility contract, then enabling it in specific "known good" situations is just business as usual for GPU drivers.
That can be for all kinds of reasons: straightforward bugs or incomplete paths in the optimization implementation, the app not actually needing the stricter parts of the contract and so being able to take a faster path, or even bugs in apps that need workarounds.
Though piggybacking on these without understanding them can be extremely fragile: you don't know why they've limited it, and you run the risk of tripping over some situation that will simply fail, either with incorrect results or something like a crash. And possibly in rather unexpected, unpredictable situations.
Intel disabled optimisations when they detected they were running on their competitors hardware. The motivation was to make competitors compare badly in benchmarks.
Nvidia are disabling optimisations on their own hardware. The motivation appears to be related to these optimisations being unsafe to apply to general code.
Let's be clear here. Intel's compiler checked for "GenuineIntel" and only ran the optimized code paths on their own hardware.
nVidia got their pitchforks back in 2003: https://web.archive.org/web/20051218120547/http://techreport...
And again in 2010, although as far as I'm aware this was just based on speculation and it was never proved that it was intentional, or that the optimisation would have netted the gains the author said: https://web.archive.org/web/20250325144612/https://www.realw...
This isn't the same thing
Intel did this to consumers; Nvidia does this to enterprises.