One thing that's still compelling about all-MiniLM is that it's feasible to use it client-side. IIRC it's a 70MB download, versus 300MB for EmbeddingGemma (or perhaps it was 700MB?)
Are there any solid models that can be downloaded client-side in less than 100MB?
It's a strong, active community. Much more focused on computing. I'm happy to invite anyone who wants to join. You can find a way to contact me on https://technicalwriting.dev. Please also link me to your website, LinkedIn, etc.
> provided confidential mortgage pricing data from Fannie Mae to a principal competitor
It seems like the Fannie Mae data was shared with Freddie Mac. Aren't they both quasi-government organizations? GSEs. So they're both supported by the government but there's a firewall between them to keep some semblance of competition?
Having worked with this data: since investors buy the loans, the loan-level data by definition needs to be public. Even the borrower information is not secret, because real estate ownership is public in the USA. So I don't understand what the shared information could possibly be other than fraud data, and I don't think sharing fraud data is collusion.
> The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results!
I saw this issue in the "same topic, different domain" experiment when using EmbeddingGemma with the default task types. But when using custom task types, the vector arithmetic worked as expected. I didn't have to remove the original vector from the results or otherwise control for it. So while the criticism is valid for word2vec, I'm skeptical that modern embedding models still have this issue.
Very curious to learn whether modern models are still better at some analogies (e.g. male/female) and worse at others, though. Is there any more recent research on that topic? The linked article is from 2019.
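For what it's worth, the check described above can be sketched like this. The `embed()` function here is a stand-in using toy unit vectors; in a real experiment you'd replace it with calls to an actual embedding model (e.g. EmbeddingGemma). The point is the `exclude_inputs` flag: the word2vec criticism is that the canonical analogy only holds when the input words are filtered out of the candidates.

```python
import numpy as np

# Toy stand-in vocabulary; a real test would embed these with a model.
VOCAB = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def embed(word):
    v = VOCAB[word]
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine sim

def analogy(a, b, c, exclude_inputs=True):
    """Nearest vocab word to embed(a) - embed(b) + embed(c)."""
    target = embed(a) - embed(b) + embed(c)
    target /= np.linalg.norm(target)
    candidates = [
        (w, float(embed(w) @ target))
        for w in VOCAB
        if not (exclude_inputs and w in (a, b, c))
    ]
    return max(candidates, key=lambda t: t[1])[0]

# Compare the result with and without excluding the input words.
print(analogy("king", "man", "woman"))                        # → queen
print(analogy("king", "man", "woman", exclude_inputs=False))  # → queen
```

With these hand-picked toy vectors the answer is "queen" either way, which mirrors the observation above; the interesting empirical question is whether that still holds for a real model's vectors across many analogy categories.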
> If you know for every feature you release, you need an API doc, an FAQ, usage samples for different workflows or verticals you're targeting, you can represent each of these as f(doc) + f(topic) and find the existing doc set. But then, you can have much more deterministic workflows from just applying structure.
This one sounds promising to me, thanks for the suggestion. We technical writers often build out "docs completeness" spreadsheets where we track how completely each product feature is covered, exactly as you described. E.g. the rows are features, column B is "Reference", column C is "Tutorial" etc. So cell B1 would contain the deeplink to the reference for some particular feature. When we inherit a huge, messy docs set (which is fairly common) it can take a very long time to build out a docs completeness dashboard. I think the embeddings workflow you're suggesting could speed up the initial population of these dashboards a lot.
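A minimal sketch of what that initial population could look like, assuming you already have embeddings for the existing docs, the features, and the doc types (toy vectors here; in practice they'd come from an embedding model such as EmbeddingGemma). Each dashboard cell is a query vector f(feature) + f(doc type), matched against the doc set by cosine similarity, with a threshold to flag missing coverage. All names and the threshold value are illustrative.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Embeddings of existing docs (URL -> vector), hypothetical paths.
DOCS = {
    "/reference/auth":    unit([0.9, 0.1, 0.8, 0.1]),
    "/tutorial/auth":     unit([0.9, 0.1, 0.1, 0.8]),
    "/reference/billing": unit([0.1, 0.9, 0.8, 0.1]),
}

# The f(topic) and f(doc) factors: dashboard rows and columns.
FEATURES = {"auth": unit([1, 0, 0, 0]), "billing": unit([0, 1, 0, 0])}
DOC_TYPES = {"Reference": unit([0, 0, 1, 0]), "Tutorial": unit([0, 0, 0, 1])}

def fill_cell(feature, doc_type, threshold=0.6):
    """Best-matching existing doc for one cell, or None if coverage is missing."""
    query = unit(FEATURES[feature] + DOC_TYPES[doc_type])
    url, score = max(((u, float(v @ query)) for u, v in DOCS.items()),
                     key=lambda t: t[1])
    return url if score >= threshold else None

for feature in FEATURES:
    for doc_type in DOC_TYPES:
        print(f"{feature} / {doc_type}: {fill_cell(feature, doc_type)}")
```

The `None` cells (here, billing has no tutorial above the threshold) are the coverage gaps the dashboard is meant to surface; a writer would still spot-check the matches before trusting them.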
You can probably do this in a day with a CLI based LLM like Claude Code. It can write the tools that would allow you to sort, test and cross check your doc sets.