Have you done any work on trying to make the opposite? Injecting English words into Japanese text to make it easier to read?
I find that students of Japanese often have enough grammar to read widely after finishing a couple of beginner textbooks, but they are completely held back by vocabulary.
I understand this point well: a lack of vocabulary makes reading Japanese materials very difficult.
For this scenario, we translate the Japanese text completely into English first, then inject Japanese words into the English text. The translated text with the injected Japanese words is displayed next to the original material.
This is the main feature I've been using myself. You can try it out and see if it's the feature you want.
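To make the "translate first, then inject" step concrete, here is a minimal sketch of the injection half only. It is an illustration, not the extension's actual code: the function name `inject_words` and the hand-written `vocab` mapping are assumptions, and the real product reportedly uses an LLM rather than a fixed dictionary.

```python
import re
from typing import Dict


def inject_words(english_text: str, vocab: Dict[str, str]) -> str:
    """Replace whole-word matches of known English words with Japanese ones.

    `vocab` is a hypothetical English -> Japanese mapping, e.g. built from
    the words a learner is currently studying.
    """
    if not vocab:
        return english_text
    # Longest keys first, so multi-word entries win over shorter ones.
    keys = sorted(vocab, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in vocab.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], english_text)


vocab = {"cat": "猫", "water": "水"}
print(inject_words("The cat drinks water.", vocab))  # The 猫 drinks 水.
```

The word-boundary match keeps "category" from turning into "猫egory". A dictionary lookup like this cannot choose between senses of a word, which is presumably where an LLM helps.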
I can second this. After finishing my intro Japanese classes I was able to parse the grammar of most sentences. Memorizing vocab was the hard part, so I used OCR on manga pages and then Yomitan to hover over and see word definitions (in English).
> Just hover to get translation
Translating everything into your native language is pretty universally considered a very bad habit in language pedagogy.
I hear this often but haven’t seen too many translation-free alternatives for the non-immersion tasks (e.g., memorizing vocabulary for a standardized exam, daily study in a non-immersive environment). Have you seen any good monolingual techniques beyond “just get tons of exposure”?
I’ve been experimenting with monolingual vocab this month but it is too soon to say if I like it or not: https://rickcarlino.com/notes/korean-language/monolingual-vo...
Nice job! There have been quite a few of these language substitution extensions over the years. (Language Immersion, Polyglot, MindTheWord, etc.)
I have a personal extension that I wrote (close to 12 years ago at this point) which does the same thing - translates random words on websites as you browse according to your linguistic level. It vastly predates LLMs though so it's all built on sentence segmentation, POS analysis for stemming, and other NLP techniques.
I've written a bunch of integrations for it so it works with websites, documents, even Kindle books.
https://mordenstar.com/projects/linguaswap
Now onto some feedback:
The site is visually a bit of a mess. The nav bar anchors but not to the top of the viewport (scroll and watch). Some of the cards are also different sizes. Some of the text isn't properly spaced (look for the colons).
Thanks for your feedback!
LLMs make this kind of word substitution easier and more accurate. We also tried some traditional NLP methods, but the results were mediocre. For specific scenarios, though, NLP may be more efficient.
The website's visual design definitely needs improvement. We are currently working on it.
The concept of "injecting" Japanese into text written in a different language is interesting. But I feel the presentation of word definitions is not great. Something similar to https://yomitan.wiki/ or https://jisho.org/search/kotoba would be preferred. E.g. 言葉ーことばーLanguage, word or phraseーKanji definitionsーSample sentence
It's a cool idea, but the lack of a space between regular words and words wrapped in a <span> is driving my typo-radar nuts
Really appreciate your feedback! We may have overlooked this when handling multilingual support, and we will fix it in the next version.
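For anyone curious how the missing-space problem can be avoided, one common approach is to split the text so that the whitespace runs are kept as separate pieces, wrap only the word pieces, and join everything back unchanged. This is a rough sketch under assumptions: the function name `wrap_injected` and the `injected` class are made up here and are not taken from the extension.

```python
import re
from html import escape
from typing import Dict


def wrap_injected(text: str, vocab: Dict[str, str]) -> str:
    """Wrap replaced words in a <span> without touching surrounding spaces."""
    # The capturing group makes re.split keep each whitespace run as its
    # own list element, so spacing survives the round trip.
    parts = re.split(r"(\s+)", text)
    out = []
    for part in parts:
        if part in vocab:
            # "injected" is a hypothetical CSS class name.
            out.append('<span class="injected">' + escape(vocab[part]) + "</span>")
        else:
            # Whitespace and unknown words pass through (HTML-escaped).
            out.append(escape(part))
    return "".join(out)


print(wrap_injected("I like tea", {"tea": "お茶"}))
# I like <span class="injected">お茶</span>
```

This simple version only matches words that stand alone between spaces, so "tea." with trailing punctuation is left as-is. A real implementation would need proper tokenization.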
re: > since it uses paid AI APIs for the words replacement, I couldn't make it 100% free (server costs are real, unfortunately)
is there a possibility of using local llm endpoints for this?
We've considered using a local LLM, but the problem is that, for a better user experience, we plan to keep each user's new-vocabulary list and inject words based on that list, which is hard to do locally.
We will seriously consider supporting local LLMs. This would also allow more users to use our basic functions.
Nice project!
As a struggling lifelong English learner, I had exactly the same idea, but for English.
Interesting. The voice used for the pronunciation sound seems to be using the wrong language though (FYI using Firefox).
Thanks for the feedback. In some cases we use an NLP library to detect the language of the word, since we support multiple languages. The wrong voice may be due to language detection failing on some words.
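As an illustration of why single-word detection is fragile, here is a small script-based heuristic. It is a swapped-in technique, not the NLP library mentioned above, and the function name `guess_speech_lang` and its parameters are assumptions. It uses the Unicode block ranges for hiragana/katakana (U+3040 to U+30FF) and CJK Unified Ideographs (U+4E00 to U+9FFF).

```python
def guess_speech_lang(word: str, han_hint: str = "ja", default: str = "en") -> str:
    """Guess a language tag for text-to-speech from the scripts in `word`.

    `han_hint` is the language to assume for kanji-only words, which are
    ambiguous between Japanese and Chinese when seen in isolation.
    """
    has_han = False
    for ch in word:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:  # hiragana and katakana blocks
            return "ja"
        if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            has_han = True
    return han_hint if has_han else default


print(guess_speech_lang("ことば"))  # ja
print(guess_speech_lang("言葉"))    # ja (falls back to han_hint)
print(guess_speech_lang("hello"))   # en
```

A kanji-only word such as 言葉 contains no kana, so a generic detector can reasonably label it as Chinese. Passing the learner's target language as a hint sidesteps that case.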
Do you have a roadmap for adding support for more browsers eventually?
Thanks for asking.
Yes, we will prioritize support for Safari, Opera, and Arc. Support for other browsers will be added as needed.
Why is it using romaji to show the pronunciation, instead of furigana? Any serious Japanese learner will learn hiragana and katakana very early on, and these are better for reading pronunciation than romaji.
Thanks for the feedback. We actually use furigana to show the pronunciation. The word explanation is produced by an LLM, so the romaji may be due to LLM instability. Could you tell me which word showed this on your side?
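For reference, furigana on a web page is usually marked up with the standard HTML `<ruby>` element. The helper below is only a sketch of that markup; the function name `ruby` is an assumption and says nothing about how the extension renders readings internally.

```python
from html import escape


def ruby(base: str, reading: str) -> str:
    """Return HTML that shows `reading` as furigana above `base`."""
    # <rp> supplies fallback parentheses for browsers without ruby support.
    return (
        "<ruby>" + escape(base)
        + "<rp>(</rp><rt>" + escape(reading) + "</rt><rp>)</rp>"
        + "</ruby>"
    )


print(ruby("言葉", "ことば"))
# <ruby>言葉<rp>(</rp><rt>ことば</rt><rp>)</rp></ruby>
```

If the reading comes from an LLM, checking that it contains only kana before rendering would catch the occasional romaji output.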