Scott's work is amazing.
Another related project that builds on a similar foundation: https://github.com/ledfx/ledfx
Claude and I have been having fun doing something similar.
Are you using multiple accounts to post the same comment?!
I made a decent audio visualizer using the MSGEQ7 [1]. It buckets a count for seven audio frequency ranges—an Arduino would poll on every loop.
(And it looks like the 7 frequencies are not distributed linearly—perhaps closer to the mel scale.)
I tried using one of the FFT libraries on the Arduino directly but had no luck. The MSGEQ7 chip is nice.
[1] https://cdn.sparkfun.com/assets/d/4/6/0/c/MSGEQ7.pdf
Have you ever seen anything like an MSGEQ14 or equivalent? It would be cool to go beyond 7 bands in such a simple-to-use chip, but I haven't seen one.
I've always been very interested in audio-reactive LED strips and bulbs. I've been using a Windows app to control my LIFX lights for years, but lately it hasn't been maintained and it won't connect to my lights anymore.
I tried recreating the app (and I can connect to the lights via BT), but writing the audio-reactive code was the hardest part, and I still haven't figured out a good rule of thumb. I mainly use it when listening to EDM or club music, so it's always a classic 4/4 beat at 110-130 bpm, yet it's hard to get the lights to react on the beat.
The mel spectrum is the first part of a speech recognition pipeline...
But perhaps you'd get better results if more of an ML speech/audio recognition pipeline were included?
E.g. the pipeline could separate drum beats from piano notes and present them differently in the visualization.
An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.
I was playing around with this recently, but the problem I encountered is that most AI analysis techniques like stem separation aren't built to work in real-time.
Had a similar setup based on an Arduino, with 3 hardware filters (highs/mids/lows) for audio and a serial connection. Serial was used to read the MIDI clock from DJ software.
This allowed the device to count the beats, and since most modern EDM is 4/4, you can trigger effects every time something "changes" in the music after syncing once.
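A minimal sketch of the beat counting described above, assuming the serial port delivers raw MIDI bytes. MIDI clock sends 24 timing pulses (`0xF8`) per quarter note, so counting pulses gives beats and bars. The `BeatCounter` class is hypothetical, and it syncs on the MIDI Start message, which may differ from how the original device was synced.

```python
# MIDI System Real-Time status bytes
MIDI_CLOCK = 0xF8     # timing pulse, 24 per quarter note
MIDI_START = 0xFA
MIDI_CONTINUE = 0xFB
MIDI_STOP = 0xFC
PULSES_PER_BEAT = 24


class BeatCounter:
    """Hypothetical helper: turns a MIDI byte stream into (bar, beat) events."""

    def __init__(self, beats_per_bar=4):
        self.beats_per_bar = beats_per_bar  # 4/4 assumed, as in the comment
        self.running = False
        self.pulses = 0

    def feed(self, byte):
        """Consume one byte; return (bar, beat), both 1-based, when a beat starts."""
        if byte == MIDI_START:
            self.running = True
            self.pulses = 0          # Start resets the position to bar 1, beat 1
            return None
        if byte == MIDI_CONTINUE:
            self.running = True      # resume without resetting the position
            return None
        if byte == MIDI_STOP:
            self.running = False
            return None
        if byte != MIDI_CLOCK or not self.running:
            return None              # ignore everything that is not a clock pulse
        pulse = self.pulses
        self.pulses += 1
        if pulse % PULSES_PER_BEAT != 0:
            return None
        beat_index = pulse // PULSES_PER_BEAT
        bar = beat_index // self.beats_per_bar + 1
        beat = beat_index % self.beats_per_bar + 1
        return (bar, beat)
```

An effect could then be triggered whenever `feed` returns a tuple with `beat == 1`, or every 8 or 16 bars, where EDM tracks tend to change.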
"3 hardware filters…"
The classic "Color Organ" from the '70s.
More than 20 years ago I made a small LED display that used a series of LM567s (frequency detection ICs) and LM3914s (bar chart drivers) to make a simple histogram for music.
It was fiddly, and probably too inaccurate for a modern audience but I can't claim it was diabolically hard. Tuning was a faff but we were more willing to sit and tweak resistor and capacitor values then.
IANAE, but I would go for an electric circuit rather than software to drive the LEDs. I think that nowadays, with LLM support, it can be easier and better to optimise it for latency.
If you want minimum latency, you want the input side of a traditional vocoder, not an FFT. This is the part that splits the modulator signal into frequency bands and puts each one through an envelope follower. Instead of using the outputs of the envelope followers to modulate the equivalent frequency bands of a carrier signal, you can use them to drive the visualizer circuit.
That can be done with analog electronics, but even half an analog vocoder needs a lot of parts. It's going to be cheaper and more reliable to simulate it in software. This uses entirely IIR filters, which are computationally cheap and calculated one sample at a time, so they have the minimum possible latency. I'd be curious if any LLM actually recognizes that an audio visualizer is half a vocoder instead of jumping straight to the obvious (and higher latency) FFT approach.
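A rough sketch of that "half a vocoder" in software, under some assumptions: each band is a second-order IIR band-pass (coefficients from the widely used RBJ Audio EQ Cookbook form, which is my choice and not something the comment specifies) followed by a one-pole envelope follower. The class names, Q, and attack/release times are all illustrative.

```python
import math


class BandPass:
    """Biquad band-pass (RBJ cookbook, constant 0 dB peak gain), one sample at a time."""

    def __init__(self, center_hz, q, sample_rate):
        w0 = 2.0 * math.pi * center_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        # Coefficients normalised by a0
        self.b0 = alpha / a0
        self.b2 = -alpha / a0
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x):
        # b1 is zero for this filter type, so the x1 term drops out
        y = (self.b0 * x + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


class EnvelopeFollower:
    """Rectify, then smooth with a fast attack and a slow release."""

    def __init__(self, attack_s, release_s, sample_rate):
        self.attack = math.exp(-1.0 / (attack_s * sample_rate))
        self.release = math.exp(-1.0 / (release_s * sample_rate))
        self.env = 0.0

    def process(self, x):
        level = abs(x)
        coeff = self.attack if level > self.env else self.release
        self.env = coeff * self.env + (1.0 - coeff) * level
        return self.env


class AnalysisBank:
    """Analysis half of a vocoder: one band-pass plus envelope follower per band."""

    def __init__(self, centers_hz, sample_rate, q=4.0,
                 attack_s=0.002, release_s=0.150):
        self.bands = [
            (BandPass(f, q, sample_rate),
             EnvelopeFollower(attack_s, release_s, sample_rate))
            for f in centers_hz
        ]

    def process(self, sample):
        """Feed one audio sample; return the current level of every band."""
        return [env.process(bp.process(sample)) for bp, env in self.bands]
```

Each call to `process` yields fresh band levels after a single input sample, with no block to fill first, which is where the latency advantage over an FFT frame comes from. The band centres (for example `[100.0, 1000.0, 5000.0]`) would need tuning by ear.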
For recorded music, you could always buffer however many milliseconds of audio to account for the processing.
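One way to sketch that idea: run the audio that goes to the speakers through a fixed delay line while the analysis path sees the undelayed signal, so the lights and the sound line up again. `DelayLine` is a hypothetical helper, and the delay value would have to be measured for a given setup.

```python
from collections import deque


class DelayLine:
    """Delay a sample stream by a fixed number of milliseconds."""

    def __init__(self, delay_ms, sample_rate):
        self.delay_samples = round(delay_ms * sample_rate / 1000)
        # Pre-fill with silence so output starts immediately
        self.buffer = deque([0.0] * self.delay_samples)

    def process(self, sample):
        if self.delay_samples == 0:
            return sample
        self.buffer.append(sample)
        return self.buffer.popleft()
```

This only helps for playback you control; for live sound in the room there is nothing to delay.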
Are these available commercially for consumers?
Interesting. I'm currently in the process of building something with an audio-reactive LED strip but hadn't come across this project yet. The WLED [1] ESP32 firmware seems to be able to do something similar or potentially more though.
[1] https://kno.wled.ge/
Edit: Oh wait, that project needs a PC or Raspberry Pi for audio processing. WLED does everything on the ESP32.
WLED is decent but tbh the lag is very noticeable. Did you compare to this Python thing?
Yeah WLED does it fine, I've built a few and it works well.
Claude and I have been having fun doing something similar.
I wanted to do something similar to how SignalRGB does LED visualization:
Iterated until I arrived at a kind of patch-panel of interoperable modules: Having a library of transform modules made it easy to transform the source data into the right shape before visual output. To react to bass, I might filter-in some of the lower frequencies, and then run it through a smoothing function with a high 'rise' and a low 'decay' (pops up quickly, falls slowly). The same metrics can be piped in to color/gradient generation too.
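The rise/decay smoothing could look something like this: an exponential smoother that uses a large coefficient when the input is above the current value and a small one when it is below. This is only a guess at the shape of such a module; the class name and the default coefficients are made up for illustration.

```python
class RiseDecaySmoother:
    """Asymmetric exponential smoother: pops up quickly, falls slowly."""

    def __init__(self, rise=0.9, decay=0.1, initial=0.0):
        if not (0.0 < rise <= 1.0 and 0.0 < decay <= 1.0):
            raise ValueError("rise and decay must be in (0, 1]")
        self.rise = rise      # fraction of the gap closed per step when rising
        self.decay = decay    # fraction of the gap closed per step when falling
        self.value = initial

    def update(self, x):
        alpha = self.rise if x > self.value else self.decay
        self.value += alpha * (x - self.value)
        return self.value


def smooth(samples, rise=0.9, decay=0.1):
    """Run a whole sequence through a fresh smoother."""
    smoother = RiseDecaySmoother(rise, decay)
    return [smoother.update(x) for x in samples]
```

Fed with a per-frame bass level, the output jumps on each kick and then fades, and the same value can drive brightness or a position in a gradient.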