I'm not quite seeing the real benefit of this. Is the idea that warps will now be able to do work-stealing and continuation-stealing when running heterogeneous parallel workloads? But that requires keeping the async function's state in GPU-wide shared memory, which is generally a scarce resource.
A ton of GPU workloads require leaving large amounts of data resident in GPU RAM and running computation as new data arrives from the CPU.
Yes, that's the idea.
GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and place them in a faster address space.
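To make the `!Send` idea concrete, here is a minimal CPU-side sketch (this is not the project's API; the `futures` crate's `LocalPool` stands in for whatever a block- or warp-local GPU executor would look like). The `Rc` held across an await point makes the future `!Send`, so only a local executor can drive it, and that executor could in principle keep its state in a faster, local address space.

```rust
// CPU-side sketch only: a `!Send` future driven by a local, single-threaded
// executor. A work-stealing executor would reject this future; a local one
// accepts it, which is the property the comment is pointing at.
use std::rc::Rc;

use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

async fn local_work(shared: Rc<Vec<f32>>) -> f32 {
    // Holding an Rc across an await point makes this future `!Send`.
    futures::future::ready(()).await;
    shared.iter().sum()
}

fn main() {
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();
    let data = Rc::new(vec![1.0, 2.0, 3.0]);

    // `spawn_local` accepts futures that are not `Send`.
    spawner
        .spawn_local(async move {
            let total = local_work(data).await;
            println!("sum = {total}");
        })
        .expect("spawn failed");

    pool.run();
}
```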
Really cool experiment (the whole company).
Training pipelines are full of data-preparation steps that are first written on the CPU and then moved to the GPU, and you're constantly thinking about what to keep on the CPU and what to put on the GPU, when it's worth creating a tensor, or whether it should be tiled instead. I guess your company is betting on solving problems like this (and async/await is needed for serving inference requests directly on the GPU, for example).
My question is a little bit different: how do you want to handle the SIMD question: should a Rust function run on the warp as a single machine with 32-wide arrays as its data types, or should we always "hope" for autovectorization to work (especially with Rust's iterator helpers)?
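For reference, here is a rough illustration of the two styles the question contrasts, written as plain CPU Rust with an assumed warp width of 32 (neither is how the project actually handles it):

```rust
// Rough CPU-side illustration of the two styles; warp width of 32 is an
// assumption, and none of this is the project's API.

/// Style 1: the "warp as one machine" view -- the data type is explicitly
/// 32 lanes wide and one logical function operates on all lanes at once.
#[derive(Clone, Copy)]
struct WarpF32([f32; 32]);

fn axpy_warp(a: f32, x: WarpF32, y: WarpF32) -> WarpF32 {
    let mut out = [0.0f32; 32];
    for lane in 0..32 {
        out[lane] = a * x.0[lane] + y.0[lane];
    }
    WarpF32(out)
}

/// Style 2: scalar-looking iterator code, hoping the compiler autovectorizes
/// (or, on a GPU, that each element maps onto one SIMT lane).
fn axpy_iter(a: f32, x: &[f32], y: &[f32], out: &mut [f32]) {
    for ((o, &xi), &yi) in out.iter_mut().zip(x).zip(y) {
        *o = a * xi + yi;
    }
}

fn main() {
    let x = WarpF32([1.0; 32]);
    let y = WarpF32([2.0; 32]);
    let z = axpy_warp(3.0, x, y);
    assert!(z.0.iter().all(|&v| (v - 5.0).abs() < 1e-6));

    let xs = vec![1.0f32; 32];
    let ys = vec![2.0f32; 32];
    let mut zs = vec![0.0f32; 32];
    axpy_iter(3.0, &xs, &ys, &mut zs);
    assert!(zs.iter().all(|&v| (v - 5.0).abs() < 1e-6));
}
```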
Very cool to see this; it's something I've been curious about myself and have been exploring as well. I'd be curious about the parallels and differences between this and NVIDIA's stdexec (outside of it being in Rust and using Future, which is also cool).
What's the performance like? What would the benefits be of converting a streaming multiprocessor programming model to this?
We aren't focused on performance yet (it is often workload- and executor-dependent, and as the post says we currently do some inefficient polling), but Rust futures compile down to state machines, so they are a zero-cost abstraction.
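To illustrate the "futures are state machines" point, here is a hand-sketched, simplified equivalent of what the compiler generates for a small async fn (the real lowering also handles pinning and self-references, which are omitted here). This uses the `futures` crate's `block_on` only to drive the example.

```rust
// Hand-sketched illustration: the compiler turns an async fn into an enum-like
// state machine whose variants correspond to its suspension points.
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

async fn double_after(step: impl Future<Output = i32>) -> i32 {
    let v = step.await;
    v * 2
}

// A hand-written equivalent of the state machine above.
enum DoubleAfter<F> {
    Awaiting(F),
    Done,
}

impl<F: Future<Output = i32> + Unpin> Future for DoubleAfter<F> {
    type Output = i32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        match &mut *self {
            DoubleAfter::Awaiting(step) => match Pin::new(step).poll(cx) {
                Poll::Ready(v) => {
                    *self = DoubleAfter::Done;
                    Poll::Ready(v * 2)
                }
                Poll::Pending => Poll::Pending,
            },
            DoubleAfter::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    // Both versions resolve to the same value; the async fn's state lives
    // inline in the state machine, with no heap allocation required by the
    // abstraction itself.
    assert_eq!(futures::executor::block_on(double_after(async { 21 })), 42);
    assert_eq!(
        futures::executor::block_on(DoubleAfter::Awaiting(Box::pin(async { 21 }))),
        42
    );
}
```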
The anticipated benefits are similar to the benefits of async/await on CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, fewer concurrency bugs.
Warp divergence is expensive - essentially the hardware runs 'don't run this code' masking to maintain SIMT.
GPUs are still not practically Turing-complete in the sense that there are strict restrictions on loops/goto/IO/waiting (there are a bunch of band-aids to make them pretend it's not a functional programming model).
So I am not sure retrofitting a Ferrari to cosplay as an Amazon delivery van is useful for anything other than a tech showcase?
Good tech showcase though :)
Very cool!
Is the goal with this project (generally, not specifically async) to have an equivalent to e.g. CUDA, but in Rust? Or is there another intended use-case that I'm missing?
Et tu, GPU?
I am, bluntly, sick of async taking over Rust ecosystems. Embedded and web/HTTP have already fallen. I'm optimistic this won't take hold on the GPU; we'll see. Async splits the ecosystem. I see it as the biggest threat to Rust staying a useful tool.
I use Rust on the GPU for the following: 3D graphics via WGPU, cuFFT via FFI, custom kernels via Cudarc, and ML via Burn and Candle. Thankfully these are all async-free.
Is this Nvidia-only or does it work on other architectures?
Currently NVIDIA-only; we're cooking up some Vulkan stuff in rust-gpu, though.
I don't have anything to offer but my encouragement, but there are _dozens_ of ROCm enjoyers out there.
In years prior I wouldn't have even bothered, but it's 2026 and AMD's drivers actually come with a recent version of torch that 'just works' on Windows. Anything is possible :)
Does the lack of forward-progress guarantees (i.e., no equivalent of NVIDIA's independent thread scheduling, ITS) on other architectures pose challenges for async/await?