Minkowsky, cool design! Question - the ASIC designers I've worked with over the years have been fairly adamant that integrating memory on package interspersed with logic is very difficult; the general statements run like "those designs always look great on paper, but never tape out properly".
Have you done any hardware tests of this plan? Is this still considered quality advice?
Second q, why start with 28nm? Is the idea that you want to stick with TSMC and be able to shrink? If this does in fact work well, I can imagine wanting to shoot for a smaller process node pretty quickly. Is there some sort of tech / design gap you'll need to figure out as you go?
Because of the thermal budget, most silicon designs are constrained to a 2D layout, so memory competes with logic for area. Now we stack logic in the back end, between the metal layers.
We fabricated 2T0C DRAM arrays with a 3D monolithic structure. That's a must-do.
Why 28nm? Because it's cheap, widely available, and already gives us enough performance to beat Nvidia Vera Rubin. We have a roadmap for scaling it down: https://www.phantafield.com/whitepaper#6-scaling-roadmap
Isn't Cerebras the proof in the pudding for this design? It seems like AI chips galore are coming out of the woodwork, but Cerebras is 10 years down this rabbit hole and poised to dominate.
Since when are we doing 32-layer planar transistor logic on a single chip? Even ignoring the use of FETs for eDRAM… I didn’t realize we had decent logic density possible on BEOL.
This design is absolutely wild. It probably won't work but I admire the dream.
Author here. The economics are more realistic than those of Cerebras's wafer-scale ASIC.
Can you explain why?
I have a detailed comparison with Cerebras in the economic analysis section: https://www.phantafield.com/whitepaper#7-economic-analysis
Hello, kudos for the tremendous work. Could you explain the difference between your design and Cerebras?
Bests
Author here. Thanks! Short version: Cerebras and we are attacking the same memory wall from opposite axes — they scale out in 2D, we scale up in 3D.
Cerebras WSE-3 is a brilliant packaging play: one wafer-scale chip (~46,000 mm², ~900k cores) with ~44 GB of SRAM spread across the plane, so compute and memory sit side by side with enormous bandwidth. The catch is density — SRAM is a 6T cell, so even a whole wafer only holds ~44 GB. An 80B model doesn't fit on-wafer, so weights stream in from external MemoryX (off-wafer DRAM). It's fast, but it's a ~23 kW, multi-million-dollar system, and large models are still memory-streamed.
Sophon is a single ~750 mm² die. Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic — 64 monolithic 3D tiers of 2D-TMD compute-in-memory and capacitor-less gain-cell DRAM. The gain cell is denser than SRAM per layer, and we stack 32 memory tiers of it, so we get 330 GB on one normal-size die — enough that an 80B model is fully resident, no streaming, no off-chip memory at all. ~1 kW, not 23 kW.
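The "fits or doesn't fit" claim comes down to simple arithmetic, sketched below. Only the 80B parameter count and the 44 GB and 330 GB capacities come from the thread; the weight precisions (16, 8, and 4 bits) and the helper name `weight_footprint_gb` are assumptions made for illustration, and the estimate covers weights only, with no KV cache or activations.

```python
# Rough weights-only footprint check. Capacities are the figures quoted in the
# thread; the precisions are assumptions, since the thread does not state one.
GB = 1e9  # decimal gigabytes, assumed to match the quoted capacities


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB."""
    return params * bits_per_weight / 8 / GB


PARAMS = 80e9  # the 80B model mentioned in the thread
CAPACITY_GB = {
    "WSE-3 on-wafer SRAM": 44,
    "Sophon on-die DRAM (claimed, pre-silicon)": 330,
}

for bits in (16, 8, 4):
    need = weight_footprint_gb(PARAMS, bits)
    for name, cap in CAPACITY_GB.items():
        verdict = "fits" if need <= cap else "does not fit"
        print(f"{bits:>2}-bit weights: {need:6.1f} GB, {verdict} in {name} ({cap} GB)")
```

Under these assumptions, 16-bit and 8-bit weights exceed 44 GB but stay well under 330 GB. At 4 bits the weights alone would nominally squeeze under 44 GB, so the comparison depends on the precision you assume and on how much room is left for everything else.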
So the real difference is SRAM-in-2D vs DRAM-in-3D: Cerebras maxes out planar SRAM area; we trade to denser DRAM and stack it vertically, which is what buys hundreds of GB of on-die capacity.
Honest caveat: Cerebras ships real silicon today and is genuinely fast — they proved wafer-scale integration works. We're pre-silicon, betting on a harder materials path (2D-TMD monolithic 3D). The upside, if it yields, is capacity-per-watt and per-dollar that planar SRAM can't reach.
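To put rough numbers on "capacity-per-watt", here is a minimal sketch using only the approximate figures quoted in this comment. The `Chip` class and its field names are made up for illustration. Note the comparison is not like-for-like: the ~23 kW figure is described as a system number, the Sophon values are pre-silicon claims, and dividing by 32 memory tiers assumes capacity is spread evenly across them.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    """Hypothetical container for the approximate figures quoted in the thread."""
    name: str
    capacity_gb: float
    area_mm2: float
    power_kw: float

    @property
    def mb_per_mm2(self) -> float:
        # On-chip memory per unit of die (or wafer) footprint, in decimal MB.
        return self.capacity_gb * 1000 / self.area_mm2

    @property
    def gb_per_kw(self) -> float:
        return self.capacity_gb / self.power_kw


wse3 = Chip("Cerebras WSE-3", capacity_gb=44, area_mm2=46_000, power_kw=23)
sophon = Chip("Sophon (claimed)", capacity_gb=330, area_mm2=750, power_kw=1)

for chip in (wse3, sophon):
    print(f"{chip.name}: {chip.mb_per_mm2:.2f} MB/mm^2, {chip.gb_per_kw:.1f} GB/kW")

# Assumption: the 330 GB is split evenly over the 32 memory tiers.
MEMORY_TIERS = 32
print(f"Sophon per memory tier: {sophon.mb_per_mm2 / MEMORY_TIERS:.2f} MB/mm^2")
print(f"Capacity-per-kW ratio: {sophon.gb_per_kw / wse3.gb_per_kw:.1f}x")
```

The WSE-3 area includes its compute cores, so its MB/mm² figure is a whole-wafer average, not an SRAM macro density.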
> they scale out in 2D, we scale up in 3D.
This actually helps a lot, thanks.
> Instead of spreading SRAM across a wafer, we stack DRAM on top of the logic
Is this done with current manufacturing technologies? Does it require a special process?
> no streaming, no off-chip memory at all. ~1 kW, not 23 kW
Is this for an individual compute unit? Compared to Cerebras, what's the ratio of power used vs compute output?
I suspect you are being downvoted because your answer is AI-generated, but I found it very clear and will upvote.
What makes you think his reply was AI generated?
Edit: I can see a bunch of hints, most definitely. Still a good comment though.
I've been wondering how long before RAM is fabbed on die to get around supply issues. This is one of the first designs I've read about so far. How long before Apple releases a CPU with RAM on die?
They're typically manufactured with very different processes so one has to wonder what compromises are being made here to get both on the same die.
What is this? AI generated company?
MoS2 lattice construction?