>Empty landing page with a "Sign in with Google" button
>Can't find anyone else talking about it online, no screenshots of the gameplay, nothing
That's gonna be a no from me, dawg. It sounds like a cool idea, but sites have to get better about asking you to hand over your credentials without even telling you what exactly you're getting for them.
Great minds think alike :D https://hanzirama.com/character/%E5%AD%A6
Mind you, all the data shown is collected from various reputable sources. When you click "explain" you get an LLM explanation, but my generation pipeline pushed every generated explanation through 5 different models, including top Chinese ones, for verification, and on average it took a few runs to iron out anything that could mislead the learner.
You can actually see thousands of words I typed just working on that pipeline here https://hanzirama.com/making-of
Note on why this person is taking an unusual route: from https://blog.kevinzwu.com/cyborg-learning/ , they are a "second generation Chinese immigrant" and "heritage speaker"; that is, they live outside China, can speak the language because they learned it from their parents, but cannot read it.
Edit addendum: https://blog.kevinzwu.com/chinese-cursed-logographic-dags/ is a fun read. I've been using the imaginatively titled "kanji study" app, which uses the same Outlier database mentioned there, the one with the graph-based etymology.
There's an additional level of chaos when learning the "same" characters as kanji rather than hanzi.
I liked the 10% @@@ example, demonstrated their point pretty well.
Also for anyone who speaks or is currently learning Chinese... I've been working on a multiplayer CJK word game that shares a similar efficient brute force style of learning to the author's approach (although presented via gameplay instead of tooling). Every turn you get a random character and must type in a word that contains the char in ANY position. If you like fast paced word games it might be up your alley: https://danobang.com/?game_lang=cmn
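For readers wondering what that rule amounts to, here is a rough Python sketch. It is a guess, not the game's actual implementation: the names `pick_character` and `is_valid_move` and the tiny word list are hypothetical, and it assumes an answer counts only if it is in a known dictionary and contains the drawn character anywhere.

```python
import random

def pick_character(dictionary, rng=random):
    """Draw a random character that occurs in at least one dictionary word."""
    chars = sorted({ch for word in dictionary for ch in word})
    return rng.choice(chars)

def is_valid_move(word, required_char, dictionary):
    """A move is valid if the word is known and contains the character
    in any position (start, middle, or end)."""
    return word in dictionary and required_char in word

# Hypothetical three-word dictionary, just for illustration.
dictionary = {"学生", "学习", "大学"}
char = pick_character(dictionary)
answers = [w for w in sorted(dictionary) if is_valid_move(w, char, dictionary)]
print(char, answers)
```

Because the character is drawn from the dictionary itself, there is always at least one valid answer per turn.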
> I opened Claude Code and started rambling into my mic. It wrote thousands of lines of questionably efficient JavaScript. I didn't read a single one.
Hm. I always knew voice mode was a thing, but I have never tried it. What's people's experience with it?
Being able to correct my words is a good thing. Hell, I did it ~3 times when writing this comment. I can't do that when I'm rambling. I'll trip, or CC will think I'm finished.
> I would end up copy-pasting interesting words into the dictionary window to pull up the word entry. SLOW!
> I would then click on the component characters to open their nested dictionary entries. SLOW!
> If I needed to remember the stroke order, I would scroll down for the static display. SLOW!
So, all of these are included in Anki-xiehanzi (https://github.com/krmanik/Anki-xiehanzi). Free open source software like Anki & xiehanzi can save you from using all those tokens.
> A guy on a forum had hired a calligrapher to write three thousand characters in ballpoint pen
A shame that this amazing resource is not linked.
here you go: https://www.chinese-forums.com/forums/topic/61471-mega-manda...
i’ve used this for a brief while, but dropped practicing handwriting completely shortly after.
I maintain an Anki deck for my Chinese learning. Following the HSK books, I add new words to my deck with the character on the front side and pinyin + definition + audio (from the CD, sliced using Audacity) on the back side.
Interesting process. I wonder if he considered doing this with Anki. That would have given him a good SRS algo for free and Anki cards are also HTML+CSS+JS. I probably wouldn't try to put LLM calls onto my cards though
I'm trying something like Duolingo mixed with Dark souls
https://dondeng.com
WIP (needs more work on multi-hanzi words), but you won't stay on the same 5 words for more than a day. It has been working well for me.
The most interesting thing was that GPT helped with the sentences and the simplified word meanings, and Bing Translate provided the audio.
The goal is to get the ~2000 words you need to be proficient in 1 year: 5 words a day plus refreshing old words. It also keeps track of your progress against the year, no streaks.
> I decided to go against the grain of the near-universal advice to "learn to read by reading".
...Why? That advice is universal for a reason. The side adventure with Claude Code strikes me as a distraction from the fact that there is a hard thing you want to do but are avoiding because it's hard.
This is a hilariously common thing with studiers of Asian languages. There are countless posts with people spending years, even more than a decade, just trying to memorize every single kanji and how to write it before even beginning vocabulary or basic grammar, then lamenting how difficult the language is and how they can't pass kindergarten level tests. So then they spend loads of money on apps, make custom tools, and find countless other ways to burn time.
Meanwhile others read books and get pretty good at their language of choice in a couple years.
I agree completely. I will focus on Japanese because it’s the language I have experience with.
With a good order (RTK), optimal reviews (SRS), and 30 minutes a day, it is possible to learn keyword-to-Kanji writing in anywhere from a couple of months to a year. Make it two years if you are a busy person. After that you need 10 minutes per day to maintain the knowledge. (I’m assuming 2200 Kanji.)
People that did that successfully will recommend it to be done as early as possible as they know the boost in learning it provides.
I think it’s a trap, because it’s possible to get to a very useful level in the language while ignoring Kanji, and most people will be perfectly happy staying there. At that point you will have a much better idea if you really need to go all the way.
https://news.ycombinator.com/item?id=47804903 - they can speak it, but not read it.
I'm at HSK3 level and struggle to find things to read outside of my actual textbooks with precisely-calibrated texts. If I can't read an average billboard, what should I read to improve?
I'm at a similar level, maybe a little behind that. I don't have any advice for you, but I'll relate the path I am planning to take. Would be happy to hear others' thoughts, too.
My feeling is this level is just too early to read "real" texts, so I am continuing to just use graded readers. I use the Du Chinese app for this, it contains a bunch of short stories at different comprehension levels, and has a spoken accompaniment to each story read by a real speaker (not AI/TTS). I also have some physical books from LingLing Mandarin, I like the challenge of not having a dictionary immediately to hand like I do in the app. My hope is by the time I finish with the Advanced stages of each of these sets of readers, I will be able to start reading "real" texts and fill in gaps with a dictionary app, at which point there is an infinite supply of material.
I do worry I'll end up at the "10% missing comprehension" described in the article, though, at which point I guess I'll try to find even higher level graded readers, if they exist. We'll see.
1. Take a large collection of text, e.g. from https://opus.nlpl.eu/corpora-search/zh-CN&en
2. Split into sentences and tokenize sentences into words, e.g. using https://github.com/fxsjy/jieba
3. Count how often each word appears and sort sentences by descending frequency of the least common word.
4. Use binary search to find a location in the sorted collection of sentences where the difficulty feels about right.
Of course this gives you a collection of disjointed sentences, but you can always go to the original file and look at the surrounding context when you find an interesting or confusing one.
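The four steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: the function names are made up, the default tokenizer is a character-level stand-in so the snippet runs without dependencies (with jieba installed you could pass `jieba.lcut` as `tokenize` instead), and "feels about right" is modeled as a callback you answer yourself.

```python
import re
from collections import Counter

CJK = re.compile(r"[\u4e00-\u9fff]")

def split_sentences(text):
    """Split on Chinese/Western sentence-ending punctuation and newlines."""
    parts = re.findall(r"[^。！？!?\n]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]

def char_tokenize(sentence):
    """Stand-in tokenizer (assumption): every CJK character is a 'word'."""
    return CJK.findall(sentence)

def sort_by_rarest_word(text, tokenize=char_tokenize):
    """Return (score, sentence) pairs, easiest first.

    The score is the corpus frequency of the sentence's least common word,
    so sorting by descending score puts sentences made of common words first.
    """
    sentences = split_sentences(text)
    # Drop punctuation-only tokens so they do not count as words.
    tokenized = [[w for w in tokenize(s) if CJK.search(w)] for s in sentences]
    counts = Counter(w for words in tokenized for w in words)
    scored = [(min(counts[w] for w in words), s)
              for s, words in zip(sentences, tokenized) if words]
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep corpus order
    return scored

def find_boundary(scored, too_hard):
    """Binary search for the first sentence judged too hard.

    Assumes difficulty grows along the list, which is only roughly true.
    """
    lo, hi = 0, len(scored)
    while lo < hi:
        mid = (lo + hi) // 2
        if too_hard(scored[mid][1]):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Toy corpus; in practice this would be a large file from a parallel corpus.
corpus = "我来。你来。我来了。他走。"
ranked = sort_by_rarest_word(corpus)
cut = find_boundary(ranked, lambda s: "了" in s or "走" in s)
print(ranked, cut)
```

In real use `too_hard` would be you reading the sentence and answering yes or no, so about 20 judgments are enough to place yourself in a million sentences.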
Did you read part 3? Doesn’t sound like “avoiding hard things” is really a problem for the author :)
https://blog.kevinzwu.com/symbolhead-syndrome/