Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

(github.com)

36 points | by yu3zhou4 2 hours ago

3 comments