This is how many registers the ISA exposes, but not the number of registers actually in the CPU. Typical CPUs have hundreds of registers. For example, Zen 4's integer register file has 224 registers, and the FP/vector register file has 192 registers (per Wikipedia). This is useful to know because it can affect behavior. E.g. I've seen results where doing register allocation first as if a large number of registers were available, and then allocating again with only the number of registers the ISA exposes, leads to better performance.
Good post! Stuff I didn't know x64 has. Sadly it doesn't answer the "how many registers are behind rax" question I was hoping for; I'd love to know how many outstanding writes one can have to the various architectural registers before the renaming machinery runs out and things stall. Not really for immediate application to life, just a missing part of my mental cost model for x64.
Conservatively though, another answer could be the following, if we don't count subset registers as distinct (the total is tallied below):
16 GP
2 state (flags + IP)
6 seg
4 TRs
11 control
32 ZMM0-31 (repurposes 8 FPU GP regs)
1 MXCSR
6 FPU state
28 important MSRs
7 bounds
6 debug
8 masks
8 CET
10 FRED
=========
145 total
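For anyone who wants to check the arithmetic, here's a throwaway Python tally of the list above (the category names are just my shorthand for the rows):

    # Tally of the register counts listed above, not counting
    # sub-registers as distinct (per the parent comment).
    counts = {
        "GP": 16, "state (flags + IP)": 2, "seg": 6, "TRs": 4, "control": 11,
        "ZMM": 32, "MXCSR": 1, "FPU state": 6, "important MSRs": 28,
        "bounds": 7, "debug": 6, "masks": 8, "CET": 8, "FRED": 10,
    }
    print(sum(counts.values()))  # 145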
And don't forget another 10-20 for the local APIC.
"The answer" depends upon the purpose and a specific set of optional extensions. Function call, task switching between processes in an OS, and emulation virtual machine process state have different requirements and expectations. YMMV.
Here's a good list for reference: https://sandpile.org/x86/initial.htm
x86-64 ISA general-purpose register containers: the lower 8 to 16 bits of the 64-bit GPR (e.g. AL and AX within RAX).
Intel's next gen will add 16 more general purpose registers. Can't wait for the benchmarks.
Those general purpose registers will also need to grow to twice their size, once we get our first 128-bit CPU architecture. I hope Intel is thinking this through.
So every function call will need to spill even more call-clobbered registers to the stack!
Like, I get that leaf functions with truly huge computational cores are a thing that would benefit from more ISA-visible registers, but... don't we have GPUs for that now? And TPUs? NPUs? Whatever those things are called?
With an increase in available registers, any value that a compiler might newly choose to keep in a register is a value that would previously have lived in the local stack frame anyway.
It's up to the compiler to decide how many registers it needs to preserve at a call. It's also up to the compiler to decide which registers shall be the call-clobbered ones. "None" is a valid choice here, if you wish.
Most function calls are aggressively inlined by the compiler such that they are no longer "function calls". More registers will make that even more effective.
That depends on whether something like LTO is possible and whether the function is declared to use one of the plethora of calling conventions. What it means is that new calling conventions will be needed, and that this new platform will be able to pass arguments in registers for higher-arity functions.
Why does having more registers lead to spilling? I would assume, probably incorrectly, that more registers means less spilling. Are you talking about calls inside other calls, which cause the outer scope's arguments to be preemptively spilled so the inner scope's data can be pre-placed in registers?
More registers leads to less spilling not more, unless the compiler is making some really bad choices.
An easy way to see that is that the system with more registers can always use the same register allocation as the one with fewer, ignoring the extra registers, if that's profitable (i.e. it's not forced into using extra caller-saved registers if it doesn't want to).
So, let's take a function with 40 live temporaries at a point where it needs to call a helper function of, say, two arguments.
On a 16-register machine with 9 call-clobbered registers and 7 call-invariant ones (one of which is the stack pointer), we put 6 temporaries into call-invariant registers (so there are 6 spills in the prologue of this big function) and another 9 into the call-clobbered registers; 2 of those 9 are the helper function's arguments, but the 7 other temporaries have to be spilled to survive the call. And the remaining 25 temporaries live on the stack in the first place.
If we instead take a machine with 31 registers, 19 being call-clobbered and 12 call-invariant (one of which is the stack pointer), we can put 11 temporaries into call-invariant registers (so there are 11 spills in the prologue of this big function) and another 19 into the call-clobbered registers; 2 of those 19 are the helper function's arguments, so 17 other temporaries have to be spilled to survive the call. And the remaining 10 temporaries live on the stack in the first place.
So, there seems to be more spilling/reloading whether you count pre-emptive spills or the on-demand-at-the-call-site spills, at least to me.
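For what it's worth, here is that arithmetic written out as a small Python sketch, assuming the strategy described above (fill the call-invariant registers first and save them in the prologue, put what else fits into call-clobbered registers, leave the remainder on the stack):

    # Spill arithmetic from the two scenarios above (a sketch, not a real allocator).
    def spill_counts(live_temps, clobbered, invariant_usable, n_args):
        prologue = min(invariant_usable, live_temps)          # callee-saved saves
        in_clobbered = min(clobbered, live_temps - prologue)  # held across the call
        call_site = max(in_clobbered - n_args, 0)             # must survive the call
        on_stack = live_temps - prologue - in_clobbered       # never got a register
        return prologue, call_site, on_stack

    print(spill_counts(40, 9, 6, 2))    # 16-register machine -> (6, 7, 25)
    print(spill_counts(40, 19, 11, 2))  # 31-register machine -> (11, 17, 10)

So 13 call-related spills vs. 28, but also 25 permanently stack-resident temporaries vs. 10.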
op is probably referring to the push all/pop all approach.
No, I don't. I use the common "spill definitely-reused call-invariant registers in the prologue, spill call-clobbered registers that need to survive a call at precisely the call site" approach; see the sibling comment for the arithmetic.
Most modern compilers for modern languages do an insane amount of inlining, so the problem you're mentioning isn't a big issue. And, basically, GPUs and TPUs can't handle branchy code well; CPUs can.
How are they adding GPRs? Won’t that utterly break how instructions are encoded?
That would be a major headache — even if current instruction encodings were somehow preserved.
It’s not just about compilers and assemblers. Every single system implementing virtualization has a software emulation of the instruction set - easily 10k lines of very dense code/tables.
The same way AMD added 8 new GPRs, I imagine: by introducing a new instruction prefix.
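To sketch how that prefix trick works: the REX prefix (0100WRXB) supplies a fourth bit for each 3-bit register field in ModRM, which is how r8-r15 became encodable; APX apparently does something similar with a new, larger prefix to reach 32 GPRs. If I remember the encodings right, "mov r/m64, r64" is opcode 0x89, and the bytes in the comments below follow:

    # How REX.B widens the ModRM r/m field from 3 to 4 bits (sketch).
    def modrm_rm_reg(rex, modrm):
        return ((rex & 0x01) << 3) | (modrm & 0x07)  # REX.B is bit 0 of the prefix

    GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
            "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

    print(GPRS[modrm_rm_reg(0x48, 0xC3)])  # 48 89 C3 = mov rbx, rax -> "rbx"
    print(GPRS[modrm_rm_reg(0x49, 0xC3)])  # 49 89 C3 = mov r11, rax -> "r11"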
Even though this post is from 2020, it’s still a classic reference. It’s especially relevant now to revisit this baseline considering Intel’s APX which aims to double the GPRs to 32. Understanding how we got here is key to appreciating where the architecture is headed next.
One thing that has happened since 2020 is that recent AMD CPUs support AVX-512, so that raises the number of registers by 16+16+32.
Don't forget that x86_64, like ARM, is IP-locked; RISC-V is not.
Fun fact: the AMD64 patents have expired, with AMD-V patents expiring this year, so there really isn't a need for an x86 license to do anything useful. All that's still protected is various AVX instruction sets, but those are generally used in heavily optimized software, like emulators and video encoders, that tend to be compiled to the specific processor instruction set anyway.