Lovely visualization. I like the very concrete depiction of the middle layers "recognizing features", which makes the whole machine feel more plausible. I'm also a fan of visualizing things, but I think it's important to appreciate that some things (like a 10,000-dimensional vector as the input, or even a 100-dimensional vector as the output) can't be concretely visualized, and you have to develop intuitions in more roundabout ways.
I hope they make more of these; I'd love to see a transformer presented this clearly.
The original Show HN, https://news.ycombinator.com/item?id=44633725
This visualization reminds me of the 3blue1brown videos.
I was thinking the same thing. It's at least the same description.
I like the style of the site; it has a "vintage" look.
I don't think it's a moiré effect, but yeah, looking at the pattern I can see the resemblance.
For the visual learners, here's a classic intro to how LLMs work: https://bbycroft.net/llm
This is just scratching the surface -- where neural networks were thirty years ago: https://en.wikipedia.org/wiki/MNIST_database
If you want to understand neural networks, keep going.
Great explanation, but the last question is quite simple. You determine the weights via brute force: simply run a large amount of data where you have both the input and the correct output (handwriting to text, in this case).
"Brute force" would be trying random weights and keeping the best performing model. Backpropagation is compute-intensive but I wouldn't call it "brute force".
"Brute force" here is about the amount of data you're ingesting. It's no Alpha Zero, that will learn from scratch.
very cool stuff