Bunnie also wrote up a post about it on his blog: https://www.bunniestudios.com/blog/2026/bio-the-bao-i-o-copr...
It's a very nice write-up, but this part makes me uneasy:
> So long as all the computation in the loop finishes before the next quantum, the timing requirements [...] are met.
Seems like we are back to cycle counting then? But instead of just 32 1-IPC instructions, we have up to 4K instructions with varying latency, and there is a C compiler too, so even if you have enough cycles in the budget now, things might break when the compiler is upgraded.
I am wondering if the original PIO approach was still salvageable if binary compatibility is not a goal. While co-processors are useful, people have done some amazing things with PIO, like fully software DVI.
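To make the concern concrete, here is a minimal sketch of the budget check that the quoted sentence implies. Everything in it is an assumption for illustration: the quantum period, the per-path cycle counts, and the `quantum_slack` helper are made up and are not part of any BIO tooling.

```python
def quantum_slack(quantum_cycles: int, path_cycles: dict[str, int]) -> int:
    """Return the spare cycles on the slowest path; negative means a missed quantum."""
    worst = max(path_cycles.values())
    return quantum_cycles - worst

# Hypothetical numbers: one quantum every 64 core clocks, three code paths in the loop.
paths = {"idle": 12, "shift_bit": 31, "reload_word": 58}
print(quantum_slack(64, paths))        # 6

# Same loop after a hypothetical compiler upgrade adds 8 cycles to the slowest path.
paths_after = dict(paths, reload_word=66)
print(quantum_slack(64, paths_after))  # -2
```

The check only needs the worst-case path, not an exact count for every instruction, but that worst case still has to be re-measured whenever the toolchain changes.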
The previous sentence already answers this:
> Here, we leverage the “quantum” feature to get exact pulse timings without resorting to cycle-counting
This is just a hard-real-time constraint that already exists in today’s computers and other devices.
For example, audio playback and processing are day-to-day operations where hard-real-time guarantees are necessary for uninterrupted playback, and every digital audio device already conforms to them. If the buffer is refilled too slowly, you get playback errors.
PIOs might be heavier on hardware resources
> The BIO uses 14597 cells, while the PIO uses 39087 cells
and BIO might reach higher clock speeds
> when ported to an ASIC flow, the clock rate achieved by the BIO is over 4x that of a PIO implemented in the same process node.
but BIO is ~15x less efficient per clock. The RP2350 is capable of reading IOs at 400 Mbps (https://github.com/gusmanb/logicanalyzer) and bit-banging at 800 Mbps (HSTX). According to Bunnie's write-up, BIO needs 700 MHz to do pedestrian 25 MHz SPI.
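A rough version of the arithmetic behind that comparison, as a sketch. The 700 MHz and 25 MHz figures come from the comment above; the 2-clocks-per-bit figure for a PIO SPI loop is my own assumption (one instruction per clock edge), not something stated in the thread.

```python
def clocks_per_bit(core_hz: float, bit_rate_hz: float) -> float:
    """Core clocks spent per transferred bit."""
    return core_hz / bit_rate_hz

# Figures quoted in the thread: 700 MHz core clock for 25 MHz SPI.
bio = clocks_per_bit(700e6, 25e6)

# Assumption: a PIO SPI loop spends about 2 clocks per bit.
PIO_CLOCKS_PER_BIT = 2.0

print(bio)                       # 28.0
print(bio / PIO_CLOCKS_PER_BIT)  # 14.0
```

Under that assumption the ratio lands in the same ballpark as the ~15x estimate.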
dupe of https://news.ycombinator.com/item?id=47459363 ?
Really glad to get this write-up, adds a very nice broad picture & does a good job introducing the queue too.
I'm an unranked unwashed neophyte at hardware design, but I did spend some time looking at BIO. One particular thing that caught my eye a while ago was Stream Semantic Registers, an instruction set extension for RISC-V where loads and stores are implicit, with data pointers that automatically advance on each instruction. This greatly increases code density, allowing for DSP-like capabilities on RISC-V. https://arxiv.org/abs/1911.08356
I forget how exactly I was convinced, but after spending a while chatting with an LLM, I became somewhat convinced that the FIFO queues here give a lot of similar capabilities, with an additional interesting use in decoupling multiple systems: register-mapped data arrays that can be used without having to load/store each word. I felt then, and still feel now, that I have a good bit to learn about how exactly each of the FIFO registers works, but it was cool to see, and I love this idea of code that can run without having to issue endless load/stores all the time.
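A toy model of what I mean, not the BIO's actual semantics: `FifoRegister` is a hypothetical class, the depth is arbitrary, and raising an error on empty or full is a stand-in for whatever the real hardware does (presumably a stall).

```python
from collections import deque

class FifoRegister:
    """Toy stand-in for a register backed by a FIFO (hypothetical model)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self._q = deque()

    def write(self, value: int) -> None:
        if len(self._q) >= self.depth:
            raise BufferError("FIFO full; real hardware would presumably stall")
        self._q.append(value)

    def read(self) -> int:
        if not self._q:
            raise BufferError("FIFO empty; real hardware would presumably stall")
        return self._q.popleft()

def accumulate(src: FifoRegister, count: int) -> int:
    """Sum `count` words; each read yields the next word with no address or index."""
    total = 0
    for _ in range(count):
        total += src.read()
    return total

fifo = FifoRegister()
for word in (3, 5, 7):   # producer side
    fifo.write(word)
print(accumulate(fifo, 3))  # 15
```

The consumer loop never computes an address or bumps a pointer, which is the part that reminded me of the streaming-register idea.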