Hey, Tommaso here, I'm one of the founders of the Robotopia studio. I didn't expect to see this here! Ask me anything :)
Do you have a per-player budget for cloud usage? What happens if people really like the game and play it so much that it starts getting expensive to keep running? I guess at $0.79/Mtok, Llama 70B is pretty affordable, but per-player opex seems hard to handle without a subscription model.
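For a rough sense of scale (completely made-up usage numbers, just to sanity-check the question):

```python
# Back-of-envelope per-player cost at the $0.79/Mtok price quoted above.
# Tokens per interaction and interactions per hour are assumptions.
price_per_mtok = 0.79           # USD per million tokens
tokens_per_interaction = 3_000  # prompt + completion, assumed
interactions_per_hour = 60      # assumed chatty player

tokens_per_hour = tokens_per_interaction * interactions_per_hour
cost_per_hour = tokens_per_hour / 1_000_000 * price_per_mtok
print(f"~${cost_per_hour:.2f} per player-hour")  # ~$0.14
```

Cheap per hour, but it never stops accruing for a one-time purchase.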
This is fantastic. I think the Substack post nails what was missing from a lot of these LLM-driven NPCs that didn't feel authentic. I have a couple of follow-up questions on specifics relating to analysing behaviour with LLMs (I'm in game dev myself). Would it be possible to speak to you directly about them?
Thanks :) If you want, I'm on the Discord linked from our landing page; it's fun stuff to talk about!
Amazing! Thanks, will join.
Hey! Robotopia looks awesome, I'm excited to try it out when it launches. How do you convert the LLM output to actions? Are there broad actions (i.e. creating any object, moving anything anywhere) exposed to the LLM, or is it more specific tools it can call?
Thanks :) It may sound insane, but we convert actions to Python functions, then ask the LLM to write a Python script that actually runs in IronPython inside the game. Then we have a visual Behavior Tree system to let our designer define the actions. So yeah, the LLM gets a bunch of general actions like walk, talk, follow, interact, etc.
PS: I think MCP/Tool Calls are a boondoggle and LLMs yearn to just run code. It's crazy how much better this works than JSON schema etc.
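To make that concrete, here's a minimal sketch of the pattern (not our actual API; all the names here are made up). The engine exposes whitelisted action functions, asks the LLM for a script, and executes it in a restricted scope; IronPython's hosting API lets you do the same from C#.

```python
# Hypothetical action functions; in the real game each would enqueue
# a Behavior Tree node instead of printing.
def walk_to(target):
    print(f"[BT] walk_to {target}")

def say(text):
    print(f"[BT] say {text!r}")

def interact(obj):
    print(f"[BT] interact {obj}")

# The kind of script the LLM might write back:
llm_script = """
walk_to("charging_station")
say("Low battery! Heading over to recharge.")
interact("charging_station")
"""

# Only whitelisted actions are visible to the script.
sandbox = {"walk_to": walk_to, "say": say, "interact": interact,
           "__builtins__": {}}
exec(llm_script, sandbox)
```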
Are the LLMs run on-device, or does this use cloud compute?
(Off-topic AMA question: Did you see my voxel grid visibility post?)
The "big" one is Llama3.3-70b on the cloud, right now. On GroqCloud in fact, but we have a cloud router that gives us several backups if Groq abandoned us.
We use a ton of smaller models (embeddings, vibe checks, TTS, ASR, etc.), and if we get enough scale we'll try to run those locally for users with big enough GPUs.
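The router is conceptually just a prioritized failover loop over OpenAI-compatible endpoints; something like this sketch (illustrative, not our production code, and the provider list and keys are placeholders):

```python
import requests

# Hypothetical priority list; any OpenAI-compatible endpoint works.
PROVIDERS = [
    {"name": "groq", "url": "https://api.groq.com/openai/v1/chat/completions", "key": "GROQ_KEY"},
    {"name": "backup", "url": "https://backup.example.com/v1/chat/completions", "key": "BACKUP_KEY"},
]

def chat(messages, model="llama-3.3-70b-versatile"):
    last_err = None
    for p in PROVIDERS:  # try each provider in priority order
        try:
            r = requests.post(
                p["url"],
                headers={"Authorization": f"Bearer {p['key']}"},
                json={"model": model, "messages": messages},
                timeout=10,
            )
            r.raise_for_status()
            return r.json()["choices"][0]["message"]["content"]
        except Exception as e:
            last_err = e  # fall through to the next provider
    raise RuntimeError(f"all providers failed: {last_err}")
```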
(You mean the voxel grid visibility from 2014?! I'm sure I did at the time... but I left MC in 2020, so I don't even remember my own algorithm right now)
Shipping GPU-accelerated ML models in games looks difficult; are there any major examples other than vendor-locked upscaling like DLSS or FSR?
(Yep! https://cod.ifies.com/voxel-visibility/ )
Yeah, it's extremely difficult right now, especially for a Windows game that can't have players install PyTorch and the CUDA Toolkit!
ONNX and DirectML seem sort of promising right now, but it's all super raw.

Even if that worked, local models are bottlenecked by VRAM, and that's never been more expensive. And we need to fit 6 GB of game in there as well.

Even if _that_ worked, we'd need to timeslice the compute inside the frame so that the game doesn't hang for a second.

And then we'd get to fight every driver in existence :)

Basically it's just not possible unless you have a full-time expert dedicated to this, IMO. Maybe it'll change!
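For the timeslicing point, the shape of the problem looks like this (a toy sketch, not shipped code): run inference as a coroutine that yields between chunks of work, and pump it with a per-frame budget.

```python
import time

def inference_steps(n_layers=80):
    """Stand-in for chunked model execution; yields after each layer."""
    for layer in range(n_layers):
        time.sleep(0.002)  # pretend this is one layer of compute
        yield layer

def pump(job, budget_ms=4.0):
    """Advance `job` until this frame's ML budget is spent."""
    deadline = time.perf_counter() + budget_ms / 1000.0
    for _ in job:
        if time.perf_counter() >= deadline:
            return False   # out of budget; resume next frame
    return True            # job finished

job = inference_steps()
frames = 1
while not pump(job):
    frames += 1            # ...render the rest of the frame here...
print(f"inference spread over {frames} frames")
```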
About the voxel visibility: yeah, that was awesome, I remember :) Long story short, MC is CPU-bound and the frustum clipping's CPU cost wasn't paid off by the reduced overdraw, so it wasn't worth it. Then a guy called Jonathan Hoof rewrote the entire thing, splitting it into a 360° scan done on another thread whenever you changed chunk, plus an in-frustum walk that worked completely differently. I don't remember the details, but it did fix the ravine issue entirely!
GGML is another neat ML abstraction layer, but I don't think much work has been dedicated to the Windows port.
llama.cpp still runs on GGML, and GPU acceleration there still requires CUDA to be installed, unfortunately. I saw a PR for DirectML, but I'm not really holding my breath.
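For reference, the happy path through llama.cpp looks roughly like this via the llama-cpp-python bindings; whether the GPU offload actually works on an arbitrary Windows box is exactly the problem (the model filename is hypothetical):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.3-70b-q4.gguf",  # hypothetical quantized weights
    n_gpu_layers=-1,                     # offload all layers to the GPU
)
out = llm("You are a cheerful robot. Greet the player.", max_tokens=48)
print(out["choices"][0]["text"])
```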
I like the concept. Though, couldn't they have found better text-to-speech voices? Or is it meant to be humorous how bad they are?
It's a stylistic choice for sure. A little better than that lands straight in the uncanny valley, and human-level is too high-latency and too expensive for us. We found that this level of crappy works great in practice, plus it runs on-device! We use Rhasspy Piper to generate them.
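Piper is a small CLI that reads text on stdin, so generating a line of dialogue is roughly this (the voice filename is just an example; any Piper .onnx voice works):

```python
import subprocess

def speak(text, voice="en_US-lessac-medium.onnx", out="line.wav"):
    """Render `text` to a wav file with the Piper CLI."""
    subprocess.run(
        ["piper", "--model", voice, "--output_file", out],
        input=text.encode("utf-8"),
        check=True,
    )

speak("Unit 7 reporting for duty!")
```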
I would personally avoid voices that skew too close to the common TikTok TTS AI. Currently the heavy robots with the lower, bassier voices sell that clunky robot-voice vibe much better, but some of the more generic voices immediately take me out.
Unfortunately, they're close because some of them ARE the TikTok AI voices you've heard! I'm working on hiring VAs to make custom datasets, though. We'll have our own unique voices by 1.0 for sure.
I’m so excited to see LLMs used more creatively in video games. So many new mechanics can be unlocked with LLMs as judges
Agreed!
Some other cool ones I've seen: https://store.steampowered.com/app/2542850/1001_Nights/ https://www.playsuckup.com/
Robotopia was very inspired by Suck Up! It was the first LLM game that kinda cracked the 3D world.
I'm imagining a version of this where you have to use various prompt- or data-centric attacks to navigate scenarios
We want to gamify prompt hacking and give people a UI to add/remove chunks of the system prompt. It'll be unlocked by collecting widgets around the place.
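Something like this, conceptually (a toy sketch of the idea, not shipped code): each widget unlocks a chunk, and the system prompt is just the enabled chunks joined together.

```python
BASE = "You are a service robot in Robotopia."

CHUNKS = {  # widget id -> prompt fragment (all hypothetical)
    "sarcasm_module":  "You answer with dry sarcasm.",
    "poet_chip":       "You occasionally speak in rhyme.",
    "paranoia_washer": "You suspect the player is a spy.",
}

def build_system_prompt(unlocked, enabled):
    parts = [BASE]
    for widget in enabled:
        if widget in unlocked and widget in CHUNKS:
            parts.append(CHUNKS[widget])
    return "\n".join(parts)

print(build_system_prompt(
    unlocked={"sarcasm_module", "poet_chip"},
    enabled=["sarcasm_module"],
))
```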
Another game with LLM-powered NPCs is the F2P action game from China called "Where Winds Meet"; players came up with all sorts of hilarious ways to cheat quests and pull other fun stuff via prompt injection.
https://www.dexerto.com/gaming/where-winds-meet-players-are-...
https://www.rockpapershotgun.com/where-winds-meet-player-con...
I had no idea this game had LLM NPCs. Interesting