farhanhubble's comments | Hacker News

Is this Google's answer to OpenAI's browser launch? The first two paragraphs fail to describe what Disco is.

I have always used SwiftKey and Android. This year I switched to Apple because Android was being bloated by Samsung etc. I'm shocked by how horrible the Apple keyboard is. I also feel like the touch sensitivity of the iPhone is worse than that of Samsung phones.

I installed SwiftKey on the iPhone too, but even that seems sluggish.


Long a SwiftKey fan, but ever since it was sold to MS it has gone downhill. I have it on my iPhone and I think development on it has stopped. My favorite keyboard is just unusable.

I went to the Apple keyboard and had to disable autocorrect, because it would correct a word to the wrong one and only change it again five words later, once it had decided which word made more sense.


I went the other way this year, from an iPhone to a Z Flip 7. It's generally been a pretty good experience - the bloat on Samsung devices seems significantly less bad than it used to be 7 or 8 years ago.

I've stuck with Samsung's keyboard and it has mostly been fine, though it's less aggressive about adding punctuation for contractions etc.


GBoard for me. Can't stand the Apple iOS keyboard for some reason.

+1 on GBoard. Every time an app has that weird bug where it selects the native iOS keyboard instead of GBoard, it doesn't take long for me to notice. It's crazy how bad the Apple iOS keyboard is by comparison.

Yeah, but it also made me wonder whether deep down neural networks are just curated random basis vectors, like in random projections.
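
A toy illustration of the random-projection side of that analogy: a fixed, completely untrained Gaussian matrix already roughly preserves geometry (the dimensions below are arbitrary, not from the paper, and an untrained projection of course only preserves distances, it doesn't compute anything):

    import numpy as np

    rng = np.random.default_rng(0)

    # 100 points in a 10,000-dimensional space.
    X = rng.normal(size=(100, 10_000))

    # A fixed random "basis": no training, just scaled Gaussian entries.
    k = 512
    R = rng.normal(size=(10_000, k)) / np.sqrt(k)

    Z = X @ R  # project down to k dimensions

    # Pairwise distances are approximately preserved (Johnson-Lindenstrauss):
    d_orig = np.linalg.norm(X[0] - X[1])
    d_proj = np.linalg.norm(Z[0] - Z[1])
    print(d_orig, d_proj)  # typically within a few percent of each other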

I've only skimmed the paper and have no idea how sound or reproducible it is, but it is well written, especially the clarity of the notation. After reading yesterday's weight-subspace paper (https://news.ycombinator.com/item?id=46199623), this does sound plausible to me.

> all the way to the borderlands of active anxiety—not quite understanding what Claude just wrote.

This is a big issue, personally. I write Python and bash these days, and I love that we're no longer bottlenecked by IDE-based autocomplete, especially for dynamic languages, and that a huge amount of fixing and incremental feature work can be done in minutes instead of days thanks to AI's ability to spot patterns. At the same time, I'm frustrated when these agents fail to deliver small changes and I have to jump in and change something I don't have a good mental model of or, worse still, something that's all Greek to me, like JavaScript.


People are sheep. Someone somewhere used mathematical puzzles as interview questions. That someone became big. Others assumed it was because their interview process was amazing and followed it blindly. Soon enough the process started to be gamed.

I'm seeing this trend again in the field of AI, where math olympiad participants are being given god-like status by a few companies and the media.

The truth is that even the most prolific computational scientists would flunk these idiotic interviews.


I am guilty of this. I started asking simple programming questions back in the early 90s. It was just a way to see if the interviewee knew how to use for loops and conditionals, to see if they could solve simple problems. It worked great when candidates came unprepared, but once people started drilling and memorizing the questions, the problems had to become a lot harder. It got to the point where you really have to study; it is not enough to have 20 years of professional programming experience.

Fun story. For years, I used a set of problems that I took from a very old programming book. I have probably seen dozens of solutions to these problems. About 6 years ago, in an interview, somebody happened to ask me about one of these problems. So I wrote the solution, and the interviewer told me it was wrong, but he couldn't tell me why. Then he proceeded to clear the screen (it was a remote interview). So I flunked the interview on a problem that I knew back and forth.


Hundred percent. Classic example of academic smarts vs. real-world smarts.

It's why developers as a group will lose negotiating power over time. You would expect a smart person to question why that 'problem' exists in the first place rather than forging ahead and building a solution for a problem that doesn't exist. It's like your manager telling you to write software that does something, whatever that is. Your first question should be why, and you should not type a single letter until you understand the domain and whether a software solution is needed in the first place.

For all the intellectual credit modern devs give themselves, they are still asking "how high?" when told to jump, and in some cases even bragging about jump heights. The only difference is that many devs look down upon (or are simply unable to understand) those who refuse to jump.

We all know devs have better things to focus on, given the state of modern software development.


Yes, and it's mostly the fault of a handful of companies like Google and Facebook that were started by founders who were still in college, so they chose interview problems that look like CS algo puzzles instead of anything related to real work.

It might be worth using that subset to initialize the weights of future models, but more importantly, you could save a huge number of computational cycles by using the lower-dimensional weights at inference time.

Ah interesting, I missed that possibility. Digging a little more, though, my understanding is that what's universal is a shared basis in weight space, and particular models of the same architecture can express their specific weights via coefficients in a lower-dimensional subspace using that universal basis (so we get weight compression and a simplified parameter search). But it also sounds like the extent of any gains during inference is still up in the air?

Key point being: the parameters might be picked off a lower-dimensional manifold (in weight space), but this doesn't imply that lower-rank activation-space operators will be found. So the translation to inference time isn't clear.


My understanding differs and I might be wrong. Here's what I inferred:

Let's say you fine-tune a Mistral-7B. Now, there are hundreds of other fine-tuned Mistral-7Bs, which means it's easy to find the universal subspace U of the weights of all these models combined. You can then decompose the weights of your specific model using U and a coefficient matrix C specific to your model. Then you can convert any operation of the type `out = Wh` into `out = U(Ch)`. Both U and C are of much smaller dimension than W, so the number of matrix operations as well as the memory required is drastically lower.
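
A minimal numpy sketch of that conversion, under the assumption that W really does factor through the shared basis (the dimensions are made up; real gains depend on how small k can actually be):

    import numpy as np

    rng = np.random.default_rng(0)
    d_out, d_in, k = 4096, 4096, 256   # k = dimension of the shared subspace

    U = rng.normal(size=(d_out, k))    # universal basis, shared across models
    C = rng.normal(size=(k, d_in))     # per-model coefficients
    W = U @ C                          # full weight matrix, never needed at inference

    h = rng.normal(size=(d_in,))

    out_full = W @ h             # d_out * d_in       ~ 16.8M multiply-adds
    out_factored = U @ (C @ h)   # k * (d_in + d_out) ~  2.1M multiply-adds

    assert np.allclose(out_full, out_factored)

Whether the weights of a real fine-tuned model factor exactly like this, rather than only approximately, is the open question upthread.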


Would you see a lower rank subspace if the learned weights were just random vectors?

This is a good point, but I think this only works for D*A, where D = Sigma is a diagonal matrix with learnable parameters. It probably doesn't work for a full singular value decomposition (SVD) UDV^T.

Basically, what if we're not actually "training" the model, but rather the model was randomly initialized and the learning algorithm is just selecting the vectors that happen to point in the right direction? A left multiplication of the form D*A with a diagonal matrix is equivalent to multiplying each row of A by the corresponding diagonal element. A low value means the vector in question was a lottery blank and unnecessary. A high value means it turned out to be a correct vector, yay!

But this trivial explanation doesn't work for the full SVD, because you now have a right multiplication U*D. This means each column gets multiplied by the corresponding diagonal element. Both the column in U and the row vector in V^T would have to perfectly coincide to make the "selection" theory work, which is unlikely to be true for small models, which happen to work just fine.
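
The row-versus-column scaling fact this argument hinges on is easy to check numerically (a toy verification, nothing model-specific):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 4))
    U = rng.normal(size=(4, 3))
    d = np.array([2.0, 0.5, -1.0])
    D = np.diag(d)

    # Left multiplication D @ A scales each *row* of A by d[i] ...
    assert np.allclose(D @ A, d[:, None] * A)

    # ... while right multiplication U @ D scales each *column* of U by d[j].
    assert np.allclose(U @ D, U * d[None, :])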



Amazing. Thank you for sharing.

Reminds me of how thinking in terms of frequencies rather than computing probabilities is easier and can avoid errors (e.g., a positive result on a 99% accurate test does not mean a 99% likelihood of having the disease when the disease has a 1/10,000 prevalence in the population).
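
A quick sketch of the frequency framing, taking "99% accurate" to mean both sensitivity and specificity are 99%:

    population = 1_000_000
    sick = population // 10_000          # 100 people actually have the disease
    healthy = population - sick          # 999,900 do not

    true_positives = sick * 0.99         # 99 sick people test positive
    false_positives = healthy * 0.01     # 9,999 healthy people test positive

    p_sick_given_positive = true_positives / (true_positives + false_positives)
    print(f"{p_sick_given_positive:.2%}")  # ~0.98%, nowhere near 99%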


Because they always have an intellectual itch. They want to go down every rabbit hole. They think things can be done better. And, in general, because they think, and because they have little interest in mundane and repetitive things.

