There's one thing that gave me pause:
In the phrase 我想学中文 ("I want to learn Chinese"), it recognized my "wén" as "guó". My pronunciation isn't perfect, but there's no way that what I said is closer to "guó" than to "wén".
This suggests to me that the model learned word-level patterns rather than tones here. "Zhōng guó" probably appears in the training data a lot, so the model is biased toward recognizing it.
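A toy sketch of the suspected effect, assuming the model effectively weighs an acoustic score against a context prior learned from text. That is only a guess about the architecture, and every probability below is made up for illustration:

```python
import math

# Hypothetical acoustic log-probabilities for the last syllable of 中文:
# the audio itself is closer to "wén" than to "guó".
acoustic = {"wén": math.log(0.6), "guó": math.log(0.4)}

# Hypothetical context prior P(next syllable | "zhōng") learned from text,
# where "zhōng guó" is far more frequent than "zhōng wén".
prior_after_zhong = {"wén": math.log(0.1), "guó": math.log(0.9)}

def decode(acoustic, prior, prior_weight=1.0):
    """Pick the syllable maximizing acoustic score + weighted prior (log space)."""
    return max(acoustic, key=lambda s: acoustic[s] + prior_weight * prior[s])

print(decode(acoustic, prior_after_zhong, prior_weight=0.0))  # acoustics only
print(decode(acoustic, prior_after_zhong, prior_weight=1.0))  # acoustics + prior
```

With the prior switched off, the acoustics win ("wén"); with it on, the frequent "zhōng guó" pairing overrides them (0.4 × 0.9 beats 0.6 × 0.1) and the output becomes "guó".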
- Edit -
From the blog post:
> If my tone is wrong, I don’t want the model to guess what I meant. I want it to tell me what I actually said.
Your architecture also doesn't tell you what you actually said. It just maps what you said to the likeliest of the 1254 syllables that you allow. For example, it couldn't tell you that you said "wi" or "wr" instead of "wo", because those syllables don't exist in your setup.
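To illustrate the closed-set point, here is a toy stand-in where string edit distance replaces the real acoustic model (an assumption purely for illustration) and INVENTORY is a tiny hypothetical subset of the 1254 syllables. Whatever goes in, something from the inventory comes out:

```python
# Toy stand-in for a closed-set syllable classifier.
# ASSUMPTION: Levenshtein distance replaces the real acoustic model, and
# INVENTORY is a tiny hypothetical subset of the 1254 allowed syllables.
INVENTORY = ["wo", "wu", "wa", "wei", "wen", "guo"]

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def classify(spoken: str) -> str:
    """Return the nearest inventory syllable; the raw input is never echoed back."""
    return min(INVENTORY, key=lambda s: edit_distance(spoken, s))

for spoken in ["wo", "wi", "wr"]:
    print(spoken, "->", classify(spoken))
```

All three inputs come out as "wo": "wi" and "wr" get snapped to the nearest allowed syllable instead of being reported as what was actually said.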
I tried just repeating "guó" once for each character, and the repetition was not recognized.
That said, I like the active aspect of the approach. Language apps where sound is the main mode of learning should have a big advantage, since written text just confuses things: every country has its own spin on orthography. Even pinyin, logical as it is, has so many symbols that conflict with what a beginner expects.
If you find the base game too easy, I can recommend the IronMON challenge: you can only use one mon, there's permadeath, stats are randomized, all trainer levels are buffed by 1.5x, and you can't level up on wild Pokémon, along with numerous other rules that make it harder. There are variants that are borderline impossible to beat, like Super Kaizo IronMON. Out of hundreds of thousands of attempts, it has only been beaten once. It would make for an interesting optimization problem.
But honestly I really like the short turnaround times. Makes it easy to experiment with different parameters and develop an intuition for what they do.
Yes, but the odds of getting GPT-OSS to respond with that riddle are pretty low, and that isn't needed to test whether the LLM can answer the riddle correctly.
I absolutely agree, but it's really stubborn with the flowery language. I tried adding things like "DO NOT USE EMPTY PHRASES LIKE 'EVER-EVOLVING TECH LANDSCAPE'!!!!!" to the prompt, but it just can't resist.
I want to give the whole system an overhaul; maybe newer models are better at this. Or maybe add a second LLM pass to de-flowerize (lol) the language.