I can't say how big ML companies do it, but from personal experience training vision models, you can absolutely reuse the weights of barely related architectures (add more layers, switch between different normalization layers, switch between separable/full convolution, change activation functions, etc.). Even if the shapes of the weights do not match, just do whatever it takes to make them fit (repeat or crop). Of course the models will not work right away, but training goes much faster. I usually get over 10x faster convergence that way.
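Roughly like this in PyTorch (a minimal sketch; fit_tensor and transfer_weights are names I made up, and real code would want to log the mismatches):

    import torch

    def fit_tensor(src, shape):
        # Crop or tile the source tensor along each dimension until it
        # matches the target shape.
        for dim, size in enumerate(shape):
            if src.shape[dim] > size:                   # too big: crop
                src = src.narrow(dim, 0, size)
            elif src.shape[dim] < size:                 # too small: tile, then crop
                reps = [1] * src.dim()
                reps[dim] = -(-size // src.shape[dim])  # ceil division
                src = src.repeat(*reps).narrow(dim, 0, size)
        return src

    def transfer_weights(pretrained, model):
        # Copy weights by parameter name, reshaping where shapes differ.
        state = model.state_dict()
        for name, src in pretrained.state_dict().items():
            if name in state:
                state[name] = fit_tensor(src.clone(), state[name].shape)
        model.load_state_dict(state)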
It’s possible the model architecture influences how effective reusing pretrained weights is. I.e. CNNs might be a good fit for this, since the first portion is the feature extractor; you might scrap the decoder and simply retrain that.
Can’t say whether the same would work with the Transformer architecture, but I would guess some portions could potentially be reused? (there is still an encoder/feature-extraction portion)
If you’re reusing weights from an existing model, then it seems it becomes more of a “fine-tuning” exercise as opposed to training a novel foundation model.
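For the CNN case, that pattern is almost a one-liner in torchvision (a sketch; ResNet-18 and the 10-class head are arbitrary example choices):

    import torch.nn as nn
    from torchvision import models

    # Reuse the pretrained feature extractor, train only a fresh head.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False                        # freeze the encoder
    backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new head, trained from scratch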
Why would the open weights providers need their own tools for agentic workflows when you can just plug their OpenAI-compatible API URL into existing tools?
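For example, with the official Python client it is just this (URL and model name are placeholders for whatever the provider documents):

    from openai import OpenAI

    # Any OpenAI-compatible endpoint works the same way.
    client = OpenAI(base_url="https://api.example.com/v1", api_key="...")
    resp = client.chat.completions.create(
        model="some-open-weights-model",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)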
> when you can just plug their OpenAI-compatible API URL into existing tools?
Only the self-hosting diehards will bother with that. Those who want to compete with Claude Code, Gemini CLI, Codex et caterva will have to provide the whole package and do it at a price point that is competitive even at low volumes - which is hard, because the big LLM providers are all subsidizing their offerings.
You need a certain level of batch parallelism to make inference efficient, but you also need enough capacity to handle request floods. Being a small provider is not easy.
I just tried it with GPT-5.1-Codex. The compression ratio is not amazing, so I’m not sure whether it really worked, but at least it ran without errors.
A few ideas on how to make it work for you:
1. You gave a link to a PDF, but you did not describe how you provided the content of the PDF to the model. It might only have read the text with something like pdftotext, which for this PDF results in a garbled mess. It is safer to convert the pages to PNG (e.g. with pdftoppm; see the snippet after this list) and let the model read the pages as images. A prompt like "Transcribe these pages as markdown." should be sufficient. If you cannot see what the model did, there is a chance it made things up.
2. You used C++, but Python is much easier to write. You can tell the model to translate the code to C++ once it works in Python.
3. Tell the model to write unit tests to verify that the individual components work as intended.
4. Use Agent Mode and tell the model to print intermediate results and judge whether the output is sensible, so it can debug the code on its own.
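For point 1, the conversion could look something like this (paths and resolution are placeholders; pdftoppm ships with poppler-utils):

    import subprocess

    # Rasterize each PDF page to PNG so the model reads rendered pages
    # instead of extracted (and possibly garbled) text.
    subprocess.run(
        ["pdftoppm", "-png", "-r", "150", "paper.pdf", "page"],
        check=True,
    )
    # Produces page-1.png, page-2.png, ... at 150 dpi.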
> I do wonder if there are any DOS vectors that need to be considered if such a large image can be defined in relatively small byte space.
You can already DoS with SVG images. Usually, the browser tab crashes before worse things happen. Most sites therefore do not allow SVG uploads - except GitHub, for some reason.
SVG is also just kind of annoying to deal with, because the image may or may not even have a size, and if it does, it can be specified in a bunch of different units. That makes it a lot harder if you want to store the size of the image or use it anywhere in your code.
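Even a best-effort size reader ends up looking like this, and it still has to give up on percentages and relative units (a rough Python sketch; the unit table is deliberately incomplete):

    import re
    import xml.etree.ElementTree as ET

    # CSS reference-pixel conversions for a few absolute units.
    UNITS_TO_PX = {"": 1.0, "px": 1.0, "pt": 96 / 72,
                   "mm": 96 / 25.4, "cm": 96 / 2.54, "in": 96.0}

    def svg_size_px(path):
        root = ET.parse(path).getroot()
        size = []
        for attr in ("width", "height"):
            value = root.get(attr)
            if value is None:
                return None              # no intrinsic size at all
            m = re.fullmatch(r"([0-9.]+)\s*([a-z%]*)", value.strip())
            if not m or m.group(2) not in UNITS_TO_PX:
                return None              # percentage, em, or unknown unit
            size.append(float(m.group(1)) * UNITS_TO_PX[m.group(2)])
        return tuple(size)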
Could you explain a bit how the code works? For example, how does it detect the correct pixel size and how does it find out how to color the (potentially misaligned) pixels?
I think reddit's moderation guideline [that <10% of a user's posts ought to be related to their own product] would help here, along with time limitations [see Y Combinator's own policy on its incubated projects posting].
With exceptions for truly exceptional users (by community consensus) // none granted here.
----
New accounts ought to have to earn downvote privileges (currently 501+ karma) before they can ever submit new links (somehow there is no current restriction on that), IMHO.
----
OP: you are obviously new here (the posts look AI-translated at a minimum, if not outright bot-written)... if your account isn't banned (which it should be, IMHO, for at least a few months): don't post again until the next monthly "What are you working on" thread, which is auto-generated (not by you).
This will require you to actually visit the homepage regularly to wait for that thread... which might give you an opportunity to learn more about this community's culture / structure / rules.
At a minimum, make the effort to abide by this community's bare-bones rules (which are publicly available).
Thanks for the feedback! I’m pretty new to posting on HN, so the writing style might be a bit rough — still figuring out the “right amount of em-dashes”.
As for the video, it plays fine on my side, but it might be restricted by Vimeo’s region or Cloudflare settings on your network. I’ll double-check the permissions to make sure everyone can view it. Thanks for the heads up!