3 points | by deevus 17 hours ago
2 comments
I also use self-hosted LLMs. You can make three GTX 1080s run a 7B model competently at limited context through Ollama. Get a little bolder with LM Studio and you can actually get a coherent and fairly reliable model.
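For reference, here's roughly what that looks like from the Ollama side. This is just a sketch, not necessarily the parent's exact setup: the model tag and context size are illustrative assumptions, and it assumes the official `ollama` Python package talking to a local Ollama server.

    # Sketch only: model tag and num_ctx are illustrative assumptions.
    # Requires `pip install ollama` and a running local Ollama server
    # with the model already pulled (e.g. `ollama pull mistral:7b`).
    import ollama

    response = ollama.chat(
        model="mistral:7b",  # any 7B model you've pulled locally
        messages=[{"role": "user", "content": "Explain RAID 5 briefly."}],
        # Cap the context window so the weights plus KV cache fit
        # across three 8GB GTX 1080s instead of spilling to CPU.
        options={"num_ctx": 4096},
    )
    print(response["message"]["content"])

The `num_ctx` cap is the "limited context" part: a smaller window keeps VRAM usage down, which is what makes older 8GB cards workable.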
On macOS, if you opted for 32GB of RAM, you can run a gpt-oss model with LM Studio really easily.
It's "good enough" for a lot of questions, and it doesn't go up and down like a yo-yo the way hosted services do (the OpenAI status dashboard lies).
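If anyone wants to script against that LM Studio setup: LM Studio can expose an OpenAI-compatible server locally (default port 1234), so the stock OpenAI Python client works against it. Sketch below; the model identifier is an assumption and depends on what you've actually loaded.

    # Sketch only: assumes LM Studio's local server is running
    # (Developer tab -> Start Server, default http://localhost:1234)
    # with a gpt-oss build loaded. The model name is illustrative.
    from openai import OpenAI

    # The api_key value is ignored by the local server, but the
    # client constructor requires something to be passed.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[{"role": "user", "content": "Summarize mmap in one paragraph."}],
    )
    print(resp.choices[0].message.content)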