FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it

(github.com)

317 points | by mmastrac 16 hours ago

139 comments