The Monotype pricing change is brutal, but there’s a workaround. Derive new Japanese font families directly from public-domain sources.
I’ve been working on exactly that: reconstructing clean vector glyphs from old metal-type Japanese books. The quality of those prints is surprisingly high, and they include thousands of kanji in a consistent style. With some new technological innovations and a reasonable amount of hard work, you can produce a completely new, fully legal font family without touching any commercial IP.
The method I've devised is proprietary, but I’ll say this: it’s absolutely possible, and the output rivals modern JP fonts.
Given the sudden jump from ~$300/year to ~$20k/year for some devs, I expect more people to go down the “rebuild from PD artifacts” route instead of staying locked to a monopoly.
Yes, I did see that other article. No, the process we are using is not AI-based, and we are not using OCR either. We are using computational geometry and forensic methodology. No flatbed scanners, no sheet-fed scanners.
This isn't like anything done before; it's an entirely different approach, and the results are higher quality than anything you can get through AI or OCR.
I do agree that detailed work is required to do it correctly and produce high-quality results. I'm not offhandedly saying "just do these simple things and bam, perfection."
Wow, that sounds incredible. I'm super into fonts. I understand the proprietary nature, but if OCR isn't used and neither is flatbed scanning, does that mean a 3D model is obtained? I can't think of another method.
It's very cool; I'd love to see whatever fonts you have available once it's out!
The initial input is high-resolution images taken with a DSLR and a macro lens, or at least it will be soon. Initial testing of the method has been done with 200 MP images taken casually on a standard modern cell phone.
The underlying new computational geometry method can be extended to 3D, but that isn't necessary for this application unless we also extract a 3D model of the page itself. For now, at least, we are not doing that, as it would be even more complicated and finicky. Possibly, for soft enough pages, the letterpress imprint deforms the paper enough that the deformation can be detected and used to tell where the original metal pressed from where the ink spread due to page bleed.
Essentially what we are doing is taking high-resolution photos, applying computational geometry methods to those photos to extract the shapes, and then refining those shapes through a mixture of automation and manual labor.
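To make that middle step concrete without describing the actual proprietary method, here is a minimal sketch of what "computational geometry on a photo" can look like with plain OpenCV: flatten the lighting, binarize, trace contours, simplify them into polygons. The real pipeline is different and more involved; every function and threshold below is an illustrative assumption.

```python
# Illustrative sketch only, not the actual pipeline: a conventional way to lift
# glyph outlines out of a high-resolution page photo with OpenCV + NumPy.
# Thresholds are guesses; a production pass would also keep interior holes,
# correct lens/page distortion, and fit Bezier curves instead of polygons.
import cv2
import numpy as np

def extract_glyph_outlines(image_path: str, min_area: float = 50.0):
    """Return simplified polygon outlines for dark (inked) shapes on a page photo."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)

    # Flatten uneven lighting before binarizing; letterpress pages rarely lie flat.
    background = cv2.GaussianBlur(gray, (0, 0), sigmaX=15)
    flattened = cv2.divide(gray, background, scale=255)

    # Ink is dark on paper, so invert: glyphs become white foreground.
    _, binary = cv2.threshold(flattened, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    outlines = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # drop dust and paper speckle
        eps = 0.002 * cv2.arcLength(c, True)  # tolerance proportional to perimeter
        outlines.append(cv2.approxPolyDP(c, eps, True).reshape(-1, 2))
    return outlines
```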
The entire thing is called "Donkey Free" and will have information online in the near future. I just bought the domain (donkeyfree.com) for this two days ago; this is all extremely new. I'd like to release the resulting fonts under a license allowing free use for many purposes, but we still need to think that through and figure out how to make it sustainable.
It's fascinating how different this challenge must be for Latin vs. CJK.
How do you match up the scans with Unicode entities? Human supervision and/or OCR? To what extent is the breadth and quality of OCR the limiting factor?
Great questions, and you're absolutely right that Latin and CJK are effectively two different universes in terms of reconstruction.
1. Latin vs. CJK differences
Latin glyphs are structurally simple: limited stroke vocabulary, mostly predictable modulation, and relatively low topological variation. Once you can recover outlines and stroke junctions accurately, mapping to Unicode is almost trivial.
That can be done with standard OCR methods for Latin.
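As a hypothetical illustration only (we don't use this in the real pipeline), labelling an isolated Latin glyph image really is close to a one-liner with off-the-shelf OCR:

```python
# Hypothetical sketch: off-the-shelf Tesseract labelling a cropped Latin glyph.
# "--psm 10" tells Tesseract to treat the image as a single character.
# "glyph_crop.png" is a placeholder path.
from PIL import Image
import pytesseract

label = pytesseract.image_to_string(Image.open("glyph_crop.png"),
                                    config="--psm 10").strip()
print(label)  # e.g. "R"
```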
CJK is the opposite. Each character is effectively a miniature blueprint with dozens of micro-decisions: stroke order, brush pressure artifacts, serif style, shape proportion, and even regional typographic conventions. Treating it like Latin “but bigger” doesn’t work. So the workflow for CJK has extra normalization steps and more constraints, especially when reconstructing consistent glyph families rather than one-offs.
From a simple perspective, CJK has many characters made up of disconnected pieces that nonetheless belong to a single character.
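A minimal sketch of what that means in practice, assuming the page is already deskewed and the nominal character pitch is known: group connected ink components into character cells instead of treating each component as its own glyph.

```python
# Minimal sketch of the "disconnected pieces" problem (illustrative only):
# assign each ink component to a (row, col) character cell by its centroid,
# so the separate strokes of a character like 川 end up grouped together.
# Assumes a deskewed binary image and a known character pitch in pixels.
import cv2
import numpy as np

def group_components_by_cell(binary: np.ndarray, pitch: int):
    """Map (row, col) character cells to the component labels that fall inside them."""
    n, _, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cells = {}
    for label in range(1, n):  # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] < 4:
            continue  # ignore specks
        cx, cy = centroids[label]
        cells.setdefault((int(cy // pitch), int(cx // pitch)), []).append(label)
    return cells
```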
2. How we match scans to Unicode entities
We don’t rely on conventional OCR at all. OCR engines are optimized for reading text, not recovering the underlying design intent. Our process is closer to forensic glyph analysis — reconstructing stable structural signatures, then mapping those signatures to references.
This ends up being a hybrid:
• deterministic structural matching
• limited supervised correction when ambiguity exists
• and zero reliance on any off-the-shelf OCR models
It’s not “OCR first, match later.” It’s “reconstruct the letterpress structure, then Unicode becomes a lookup.” OCR quality literally doesn’t limit us because OCR isn’t part of the critical path.
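To illustrate the "Unicode becomes a lookup" idea with public tooling (a toy stand-in, not our structural-signature method): render every candidate character from a reference CJK font such as Noto Sans JP, compute a crude shape descriptor for each, and nearest-neighbour match the reconstructed glyph against that table. The font path and candidate set below are assumptions.

```python
# Toy stand-in for the matching step, not the actual method: Hu moments as a
# crude shape descriptor, nearest-neighbour lookup against glyphs rendered
# from a reference font.
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def shape_signature(bitmap: np.ndarray) -> np.ndarray:
    """Log-scaled Hu moments of a binary glyph image (scale/translation tolerant)."""
    hu = cv2.HuMoments(cv2.moments(bitmap, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def build_reference_table(font_path: str, codepoints, size: int = 128) -> dict:
    """Render each candidate character once and store its signature."""
    font = ImageFont.truetype(font_path, size)
    table = {}
    for cp in codepoints:
        canvas = Image.new("L", (size * 2, size * 2), 0)
        ImageDraw.Draw(canvas).text((size // 2, size // 2), chr(cp), fill=255, font=font)
        table[cp] = shape_signature(np.array(canvas))
    return table

def match_codepoint(glyph_bitmap: np.ndarray, table: dict) -> int:
    """Nearest reference signature wins; ambiguous cases go to human review."""
    sig = shape_signature(glyph_bitmap)
    return min(table, key=lambda cp: float(np.linalg.norm(table[cp] - sig)))
```

Hu moments alone would confuse visually similar kanji, which is exactly why the real signatures need to be structural rather than statistical; the point here is only the shape of the pipeline.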
3. What determines coverage
Coverage is defined by what we can physically access and reconstruct cleanly. For Latin, coverage is straightforward. For CJK, coverage is shaped by:
• typeface completeness in the source material
• the consistency of impression depth
• survivability of fine strokes in early printings
• and the practical question of how many thousand characters the original font designer actually cut
There’s no need for the entire Unicode set per book; the historical font only ever covered a finite subset. It is unfortunate that no single book uses every glyph, but it's not catastrophic: we can source many public-domain books from the same era and eventually find enough characters in a matching style.
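The bookkeeping side of that is straightforward; a sketch, assuming a fixed target repertoire (a real project would more likely target something like JIS X 0208 or an Adobe-Japan1 subset rather than a whole Unicode block):

```python
# Coverage bookkeeping sketch: union the code points recovered from each source
# book until the target repertoire is filled. The target block here is only
# illustrative.
target = set(range(0x4E00, 0x9FFF + 1))  # CJK Unified Ideographs (illustrative)
recovered = set()

def add_book(codepoints_found):
    """Record which target characters a newly processed book contributes."""
    recovered.update(set(codepoints_found) & target)
    print(f"coverage: {len(recovered)}/{len(target)} "
          f"({100 * len(recovered) / len(target):.1f}%)")
```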
In short:
Latin is an engineering challenge.
CJK is an archaeological one.
OCR is not a bottleneck because we don’t use it.
Coverage follows the historical material, not Unicode completeness.