Blurring is never the solution, as it can be unblurred in most cases (look up Mr Whirlwind). Also, Gemini sounds like overkill for the task of blurring in general. Inkscape and GIMP can do it for free (provided you have a computer and not, for example, an iPad).
As someone who has implemented some RL algorithms (including all the ones mentioned in the article) and applied them to a real-world game, I would be surprised if the implementation is not buggy. That is one of the most striking things about RL: how hard it is to find bugs, since they generally only degrade performance instead of causing a crash or obviously wrong behavior. The fact that he doesn't mention a massive amount of time spent debugging, or the longish list of things that were tried that really should have worked but didn't, suggests to me it's probably still buggy. I suppose it is possible that LLMs could be particularly good at RL code since they've seen it repeated so many times... But I would be skeptical without hard evidence.
I accepted the bugginess in the browser game as unavoidable, and probably had too much faith in the LLM implementations, but I did a bit more troubleshooting than mentioned. The progressive improvement over episodes (and the intuition that PPO > the others) gave me some confidence, and I've since used a similar setup on 2048 with more results showing improvement over episodes: https://wandb.ai/noahpunintended/2048-raspberry-pi?nw=nwuser...
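For anyone wanting to run the same kind of sanity check, the pattern is just logging each episode's total return and looking for an upward trend. A minimal sketch, assuming Gymnasium and Weights & Biases; the CartPole environment, the project name, and the random-action agent are placeholders, not the actual setup from the post:

    import gymnasium as gym
    import wandb

    # Hypothetical project name; substitute your own.
    wandb.init(project="rl-sanity-check")

    env = gym.make("CartPole-v1")  # stand-in for the browser game / 2048 env
    for episode in range(200):
        obs, info = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = env.action_space.sample()  # replace with your policy's action
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += reward
            done = terminated or truncated
        # Log the per-episode return so the learning curve is easy to eyeball.
        wandb.log({"episode": episode, "episode_return": total_reward})

If the curve trends upward over hundreds of episodes, that at least rules out the most catastrophic bugs, even if it can't prove the implementation is correct.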
> (Sent through Gemini to blur my monitor).
Excuse me, what?
The image is watermarked by Gemini, so presumably the author was trying to allay concerns that the important content was fake.
Yeah, essentially this. The irony of wanting to obscure information by submitting it to a model API isn't lost on me, but it was the easiest way I could think of. I wanted some way of making the key content in my picture the only thing left unblurred.
That doesn't answer the question of why you would use an LLM to blue your monitor when there are a thousand ways to do it yourself
Honestly, if these image models still use diffusion with random seeds at their core, it might actually be more secure than blurring it yourself.
The same reason today's inexperienced programmers depend totally on NextTailVibeJSFlare. It's all they know.
Because bluing yourself is messy.
https://www.youtube.com/watch?v=9GYtgFdXCGE
Why not? You send the picture and ask it, in plain text, to blur the monitor. It gives you back the picture with the monitor blurred.
That seems like a very easy way to do the job. What's the issue specifically?
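For what it's worth, the "do it yourself" route mentioned upthread really is only a few lines. A minimal sketch with Pillow, where the file names and the monitor's bounding box are made-up placeholders:

    from PIL import Image, ImageFilter

    img = Image.open("screenshot.png")
    # Placeholder coordinates; in practice you'd pick the monitor's bounding box.
    left, top, right, bottom = 100, 80, 900, 600
    region = img.crop((left, top, right, bottom))
    region = region.filter(ImageFilter.GaussianBlur(radius=12))
    img.paste(region, (left, top))
    img.save("screenshot_blurred.png")

As noted elsewhere in the thread, blurring can sometimes be partially reversed, so for anything genuinely sensitive, pasting an opaque rectangle over the region is safer than any blur, local or model-generated.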
Clicked for Tamagotchi, but I saw none. My day is ruined. :'c
It caught my eye also but the article was interesting so I'll forgive OP :-)
On the topic of Tamagotchi: if you happen to have a Flipper Zero, there is an emulator for it :-) My kid enjoyed it for a while, and it saved me a few bucks from having to buy one.
https://github.com/GMMan/flipperzero-tamagotch-p1
You can run the Tamagotchi P1 emulator on other platforms too, but the Flipper was very reminiscent of the original one.
I'm sorry for the clickbait
I had this idea during the pandemic 5 years ago now, and even did some of that work to figure out the variables I'd need to extract to make it work, but I never found the time/motivation to work on it for real. Really happy to see someone put in the effort.
The sample efficiency of RL algorithms, even for simple games, is not very good, which usually means a lot of episodes are needed for the policy to learn to excel. Being able to run the policy in environments that can be parallelized and accelerated could help a lot here - for example, running a batch of browsers or tabs simultaneously :)
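A minimal sketch of that parallel-environment idea, using Gymnasium's vectorized API; CartPole and the random actions are stand-ins, and the browser game would need its own environment class (one tab or browser per worker):

    import gymnasium as gym
    import numpy as np

    num_envs = 8
    # Each callable builds one independent environment in its own process.
    envs = gym.vector.AsyncVectorEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
    )

    obs, infos = envs.reset(seed=0)
    for step in range(1000):
        # Random actions as a placeholder for the policy; every step advances
        # all num_envs environments at once, multiplying sample throughput.
        actions = np.array([envs.single_action_space.sample() for _ in range(num_envs)])
        obs, rewards, terminated, truncated, infos = envs.step(actions)
    envs.close()

The catch for a browser game is that the environment step itself (driving the page) is usually the bottleneck, so most of the speedup comes from running many tabs concurrently rather than from the vector wrapper itself.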
Corporate firewall is blocking this since it's a "newly registered domain", but I wanted to note that Tamagotchi got me to revisit Digimon after I learned that Digimon was created as a way for Bandai to sell to boys. Color me surprised when I learned that Tamagotchi was considered a toy for girls, but I played with mine like there was no tomorrow; with the Pokemon hype of the late 90s, it came to many of us at the right time.
Surprisingly, and I just looked it up, you can buy the original classic ones for about $20 straight off Amazon.
I recently updated my .github.io to route to a domain name I purchased so that could be why it's getting blocked right now.
As a disclaimer, this comment alone has more Tamagotchi lore than my post, in case that saves you a read haha.