The Monotype pricing change is brutal, but there’s a workaround. Derive new Japanese font families directly from public-domain sources.
I’ve been working on exactly that: reconstructing clean vector glyphs from old metal-type Japanese books. The quality of those prints is surprisingly high, and they include thousands of kanji in a consistent style. With some new technological innovations and a reasonable amount of hard work, you can produce a completely new, fully legal font family without touching any commercial IP.
The method I've devised is proprietary, but I’ll say this: it’s absolutely possible, and the output rivals modern JP fonts.
Given the sudden jump from ~$300/year to ~$20k/year for some devs, I expect more people to go down the “rebuild from PD artifacts” route instead of staying locked to a monopoly.
Yes, I did see that other article. No, the process we are using is not AI-based, and we are not using OCR either. We are using computational geometry and forensic methodology. No flatbed scanners, no sheet-fed scanners.
This isn't like anything done before; it's an entirely different approach, and the results are higher quality than anything you can get through AI or OCR.
I do agree that detailed work is required to do it correctly and produce high-quality results. I'm not offhandedly saying "just do these simple things and bam, perfection."
Wow, that sounds incredible. I'm super into fonts. I understand the proprietary nature, but if OCR isn't used and neither is flatbed scanning, does that mean a 3D model is obtained? I can't think of another method.
It's very cool; I'd love to see whatever fonts you have available once it's out!
The initial input is high-resolution images taken with a DSLR and a macro lens, or at least it will be soon. Initial testing of the method has been done with 200 MP images taken casually on a standard modern cell phone.
The underlying new computational geometry method can be extended to 3D, but that isn't necessary for this application unless we also extract a 3D model of the page itself. For now, at least, we are not doing that, as it would be even more complicated and finicky. Possibly, for soft enough pages, the letterpress imprint deforms the paper enough that the deformation can be detected and used to tell where the original metal pressed from where the ink spread due to page bleed.
Essentially what we are doing is taking high-resolution photos, applying computational geometry methods to those photos to extract the shapes, and then refining those shapes through a mixture of automation and manual labor.
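To make that middle step concrete without describing the actual proprietary method, here is a minimal sketch of what "computational geometry on a photo" can look like with plain OpenCV: flatten the lighting, binarize, trace contours, simplify them into polygons. The real pipeline is different and more involved; every function and threshold below is an illustrative assumption.

```python
# Illustrative sketch only, not the actual pipeline: a conventional way to lift
# glyph outlines out of a high-resolution page photo with OpenCV + NumPy.
# Thresholds are guesses; a production pass would also keep interior holes,
# correct lens/page distortion, and fit Bezier curves instead of polygons.
import cv2
import numpy as np

def extract_glyph_outlines(image_path: str, min_area: float = 50.0):
    """Return simplified polygon outlines for dark (inked) shapes on a page photo."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)

    # Flatten uneven lighting before binarizing; letterpress pages rarely lie flat.
    background = cv2.GaussianBlur(gray, (0, 0), sigmaX=15)
    flattened = cv2.divide(gray, background, scale=255)

    # Ink is dark on paper, so invert: glyphs become white foreground.
    _, binary = cv2.threshold(flattened, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    outlines = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # drop dust and paper speckle
        eps = 0.002 * cv2.arcLength(c, True)  # tolerance proportional to perimeter
        outlines.append(cv2.approxPolyDP(c, eps, True).reshape(-1, 2))
    return outlines
```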
The entire thing is called "Donkey Free" and will have information online in the near future. I just bought the domain (donkeyfree.com) for this two days ago; this is all extremely new. I'd like to release the resulting fonts under a license allowing free use for many purposes, but we still need to think that through and figure out how to make it sustainable.
It's fascinating how different this challenge must be for Latin vs. CJK.
How do you match up the scans with Unicode entities? Human supervision and/or OCR? To what extent is the breadth and quality of OCR the limiting factor?
Great questions, and you're absolutely right that Latin and CJK are effectively two different universes in terms of reconstruction.
1. Latin vs. CJK differences
Latin glyphs are structurally simple: limited stroke vocabulary, mostly predictable modulation, and relatively low topological variation. Once you can recover outlines and stroke junctions accurately, mapping to Unicode is almost trivial.
That can be done with standard OCR methods for Latin.
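As a hypothetical illustration only (we don't use this in the real pipeline), labelling an isolated Latin glyph image really is close to a one-liner with off-the-shelf OCR:

```python
# Hypothetical sketch: off-the-shelf Tesseract labelling a cropped Latin glyph.
# "--psm 10" tells Tesseract to treat the image as a single character.
# "glyph_crop.png" is a placeholder path.
from PIL import Image
import pytesseract

label = pytesseract.image_to_string(Image.open("glyph_crop.png"),
                                    config="--psm 10").strip()
print(label)  # e.g. "R"
```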
CJK is the opposite. Each character is effectively a miniature blueprint with dozens of micro-decisions: stroke order, brush pressure artifacts, serif style, shape proportion, and even regional typographic conventions. Treating it like Latin “but bigger” doesn’t work. So the workflow for CJK has extra normalization steps and more constraints, especially when reconstructing consistent glyph families rather than one-offs.
From a simple perspective, CJK has many characters made up of disconnected pieces that nonetheless belong to a single character.
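A minimal sketch of what that means in practice, assuming the page is already deskewed and the nominal character pitch is known: group connected ink components into character cells instead of treating each component as its own glyph.

```python
# Minimal sketch of the "disconnected pieces" problem (illustrative only):
# assign each ink component to a (row, col) character cell by its centroid,
# so the separate strokes of a character like 川 end up grouped together.
# Assumes a deskewed binary image and a known character pitch in pixels.
import cv2
import numpy as np

def group_components_by_cell(binary: np.ndarray, pitch: int):
    """Map (row, col) character cells to the component labels that fall inside them."""
    n, _, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cells = {}
    for label in range(1, n):  # label 0 is the background
        if stats[label, cv2.CC_STAT_AREA] < 4:
            continue  # ignore specks
        cx, cy = centroids[label]
        cells.setdefault((int(cy // pitch), int(cx // pitch)), []).append(label)
    return cells
```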
2. How we match scans to Unicode entities
We don’t rely on conventional OCR at all. OCR engines are optimized for reading text, not recovering the underlying design intent. Our process is closer to forensic glyph analysis — reconstructing stable structural signatures, then mapping those signatures to references.
This ends up being a hybrid:
• deterministic structural matching
• limited supervised correction when ambiguity exists
• and zero reliance on any off-the-shelf OCR models
It’s not “OCR first, match later.” It’s “reconstruct the letterpress structure, then Unicode becomes a lookup.” OCR quality literally doesn’t limit us because OCR isn’t part of the critical path.
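To illustrate the "Unicode becomes a lookup" idea with public tooling (a toy stand-in, not our structural-signature method): render every candidate character from a reference CJK font such as Noto Sans JP, compute a crude shape descriptor for each, and nearest-neighbour match the reconstructed glyph against that table. The font path and candidate set below are assumptions.

```python
# Toy stand-in for the matching step, not the actual method: Hu moments as a
# crude shape descriptor, nearest-neighbour lookup against glyphs rendered
# from a reference font.
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def shape_signature(bitmap: np.ndarray) -> np.ndarray:
    """Log-scaled Hu moments of a binary glyph image (scale/translation tolerant)."""
    hu = cv2.HuMoments(cv2.moments(bitmap, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def build_reference_table(font_path: str, codepoints, size: int = 128) -> dict:
    """Render each candidate character once and store its signature."""
    font = ImageFont.truetype(font_path, size)
    table = {}
    for cp in codepoints:
        canvas = Image.new("L", (size * 2, size * 2), 0)
        ImageDraw.Draw(canvas).text((size // 2, size // 2), chr(cp), fill=255, font=font)
        table[cp] = shape_signature(np.array(canvas))
    return table

def match_codepoint(glyph_bitmap: np.ndarray, table: dict) -> int:
    """Nearest reference signature wins; ambiguous cases go to human review."""
    sig = shape_signature(glyph_bitmap)
    return min(table, key=lambda cp: float(np.linalg.norm(table[cp] - sig)))
```

Hu moments alone would confuse visually similar kanji, which is exactly why the real signatures need to be structural rather than statistical; the point here is only the shape of the pipeline.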
3. What determines coverage
Coverage is defined by what we can physically access and reconstruct cleanly. For Latin, coverage is straightforward. For CJK, coverage is shaped by:
• typeface completeness in the source material
• the consistency of impression depth
• survivability of fine strokes in early printings
• and the practical question of how many thousand characters the original font designer actually cut
There’s no need for the entire Unicode set per book; the historical font only ever covered a finite subset. It is unfortunate that no single book uses every glyph, but it's not catastrophic: we can source many public-domain books from the same era and eventually find enough characters in a matching style.
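The bookkeeping side of that is straightforward; a sketch, assuming a fixed target repertoire (a real project would more likely target something like JIS X 0208 or an Adobe-Japan1 subset rather than a whole Unicode block):

```python
# Coverage bookkeeping sketch: union the code points recovered from each source
# book until the target repertoire is filled. The target block here is only
# illustrative.
target = set(range(0x4E00, 0x9FFF + 1))  # CJK Unified Ideographs (illustrative)
recovered = set()

def add_book(codepoints_found):
    """Record which target characters a newly processed book contributes."""
    recovered.update(set(codepoints_found) & target)
    print(f"coverage: {len(recovered)}/{len(target)} "
          f"({100 * len(recovered) / len(target):.1f}%)")
```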
In short:
Latin is an engineering challenge.
CJK is an archaeological one.
OCR is not a bottleneck because we don’t use it.
Coverage follows the historical material, not Unicode completeness.