One thing that's still compelling about all-MiniLM is that it's feasible to use it client-side. IIRC it's a 70MB download, versus 300MB for EmbeddingGemma (or perhaps it was 700MB?)
Are there any solid models that can be downloaded client-side in less than 100MB?
It's a strong, active community. Much more focused on computing. I'm happy to invite anyone who wants to join. You can find a way to contact me on https://technicalwriting.dev. Please also link me to your website, LinkedIn, etc.
> provided confidential mortgage pricing data from Fannie Mae to a principal competitor
It seems like the Fannie Mae data was shared with Freddie Mac. Aren't they both quasi-government organizations? GSEs. So they're both supported by the government but there's a firewall between them to keep some semblance of competition?
Having worked with this data: since investors buy the loans, the loan-level data by definition needs to be public. Even the borrower information is not secret, because real estate ownership is public in the USA. So I don't understand what the shared information could possibly be other than fraud data, and I don't think sharing fraud data is collusion.
> The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results!
I saw this issue in the "same topic, different domain" experiment when using EmbeddingGemma with the default task types. But when using custom task types, the vector arithmetic worked as expected. I didn't have to remove the original vector from the results or otherwise control for it. So while the criticism is valid for word2vec, I'm skeptical that modern embedding models still have this issue.
Very curious to learn whether modern models are still better at some analogies (e.g. male/female) and worse at others, though. Is there any more recent research on that topic? The linked article is from 2019.
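For what it's worth, the check described above can be sketched like this. The `embed()` function here is a stand-in using toy unit vectors; in a real experiment you'd replace it with calls to an actual embedding model (e.g. EmbeddingGemma). The point is the `exclude_inputs` flag: the word2vec criticism is that the canonical analogy only holds when the input words are filtered out of the candidates.

```python
import numpy as np

# Toy stand-in vocabulary; a real test would embed these with a model.
VOCAB = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def embed(word):
    v = VOCAB[word]
    return v / np.linalg.norm(v)  # unit-normalize so dot product = cosine sim

def analogy(a, b, c, exclude_inputs=True):
    """Nearest vocab word to embed(a) - embed(b) + embed(c)."""
    target = embed(a) - embed(b) + embed(c)
    target /= np.linalg.norm(target)
    candidates = [
        (w, float(embed(w) @ target))
        for w in VOCAB
        if not (exclude_inputs and w in (a, b, c))
    ]
    return max(candidates, key=lambda t: t[1])[0]

# Compare the result with and without excluding the input words.
print(analogy("king", "man", "woman"))                        # → queen
print(analogy("king", "man", "woman", exclude_inputs=False))  # → queen
```

With these hand-picked toy vectors the answer is "queen" either way, which mirrors the observation above; the interesting empirical question is whether that still holds for a real model's vectors across many analogy categories.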
> If you know for every feature you release, you need an API doc, an FAQ, usage samples for different workflows or verticals you're targeting, you can represent each of these as f(doc) + f(topic) and find the existing doc set. But then, you can have much more deterministic workflows from just applying structure.
This one sounds promising to me, thanks for the suggestion. We technical writers often build out "docs completeness" spreadsheets where we track how completely each product feature is covered, exactly as you described. E.g. the rows are features, column B is "Reference", column C is "Tutorial" etc. So cell B1 would contain the deeplink to the reference for some particular feature. When we inherit a huge, messy docs set (which is fairly common) it can take a very long time to build out a docs completeness dashboard. I think the embeddings workflow you're suggesting could speed up the initial population of these dashboards a lot.
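A minimal sketch of what that initial population could look like, assuming you already have embeddings for the existing docs, the features, and the doc types (toy vectors here; in practice they'd come from an embedding model such as EmbeddingGemma). Each dashboard cell is a query vector f(feature) + f(doc type), matched against the doc set by cosine similarity, with a threshold to flag missing coverage. All names and the threshold value are illustrative.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Embeddings of existing docs (URL -> vector), hypothetical paths.
DOCS = {
    "/reference/auth":    unit([0.9, 0.1, 0.8, 0.1]),
    "/tutorial/auth":     unit([0.9, 0.1, 0.1, 0.8]),
    "/reference/billing": unit([0.1, 0.9, 0.8, 0.1]),
}

# The f(topic) and f(doc) factors: dashboard rows and columns.
FEATURES = {"auth": unit([1, 0, 0, 0]), "billing": unit([0, 1, 0, 0])}
DOC_TYPES = {"Reference": unit([0, 0, 1, 0]), "Tutorial": unit([0, 0, 0, 1])}

def fill_cell(feature, doc_type, threshold=0.6):
    """Best-matching existing doc for one cell, or None if coverage is missing."""
    query = unit(FEATURES[feature] + DOC_TYPES[doc_type])
    url, score = max(((u, float(v @ query)) for u, v in DOCS.items()),
                     key=lambda t: t[1])
    return url if score >= threshold else None

for feature in FEATURES:
    for doc_type in DOC_TYPES:
        print(f"{feature} / {doc_type}: {fill_cell(feature, doc_type)}")
```

The `None` cells (here, billing has no tutorial above the threshold) are the coverage gaps the dashboard is meant to surface; a writer would still spot-check the matches before trusting them.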
You can probably do this in a day with a CLI based LLM like Claude Code. It can write the tools that would allow you to sort, test and cross check your doc sets.