More

1024core · 2026-02-20T16:30:27 1771605027

What if you use a film camera?

1024core · 2026-02-19T17:53:33 1771523613

It got the car wash question perfectly:

You are definitely going to have to drive it there—unless you want to put it in neutral and push!

While 200 feet is a very short and easy walk, if you walk over there without your car, you won't have anything to wash once you arrive. The car needs to make the trip with you so it can get the soap and water.

Since it's basically right next door, it'll be the shortest drive of your life. Start it up, roll on over, and get it sparkling clean.

Would you like me to check the local weather forecast to make sure it's not going to rain right after you wash it?

1024core · 2026-02-19T18:21:28 1771525288

Some people are suggesting that this might actually be in the training set. Since I can't rule that out, I tried a different version of the question, with an elephant instead of a car:

> It's a hot and dusty day in Arizona and I need to wash my elephant. There's a creek 300 feet away. Should I ride my elephant there or should I just walk there by myself?

Gemini said:

That sounds like quite the dusty predicament! Given the options, you should absolutely ride (or lead) your elephant to the creek.

Here is the straightforward reality of the situation:

    The Logistical Nightmare of Walking Alone: If you walk to the creek by yourself, your elephant stays at your starting point. Elephants require a massive amount of water for a bath. Unless you are planning an extreme workout carrying hundreds of buckets of water 300 feet back through the Arizona heat, you're going to have a very bad time—and the elephant will stay dusty.

    The Perks of Going Together: 300 feet is a very short, easy stroll for an elephant. Plus, elephants generally love water and mud! Letting your elephant get right into the creek will make the washing process much easier for you and much more enjoyable for them. They can even use their trunk to help hose themselves down.

Grab some elephant-safe soap and head down to the water together!

Would you like some tips on the best methods and tools for scrubbing down an elephant?

kovek · 2026-02-19T18:31:18 1771525878

I think that semantically this question is too similar to the car wash one. Changing subjects from car to elephant and car wash to creek does not change the fact that they are subjects. The embeddings will be similar in that dimension.

1024core · 2026-02-19T18:35:01 1771526101

I understand. But isn't it a sign of "smarts" that one can generalize from analoguous tasks?

kovek · 2026-02-20T05:28:02 1771565282

Every word and every hierarchy of words in natural language is understand by LLMs as embeddings (vectors).

Each vector has many many dimensions, and when we train the LLMs, their internal understanding of those vectors sees all sorts of dimensions. A simple way to visualize this is a word's vector being <1, 180, 1, 3, ... > which would all mean a certain value at that dimension. In this example say the dimensions are <gender, height in cm, kindness, social title/job, ...> . In this case, our example LLM could have learned that the example I gave is <Woman, 180, 100% kind, politician, ... >. The vector's undergo some transformation so every dimension is not that discretely clear cut.

In this case, elephant and car both semantically look very similar to vehicles. They basically would have most vectors very similar.

See this article. It shows that once you train an LLM, and you assign an embedding vector for each token, then you can see how the LLM can distinguish the difference between king and queen: man and woman.

https://informatics.ed.ac.uk/news-events/news/news-archive/k...

willis936 · 2026-02-19T18:42:41 1771526561

Sure it is, but it's a different set of smarts than the kind of gotcha logic puzzle trying to be tested with the car wash question.

nebezb · 2026-02-19T22:34:14 1771540454

My gut says you’re right, but I don’t know if this is indeed true. It might be the same thing.

news_hacker · 2026-02-20T13:30:22 1771594222

LLMs are great at knowledge transfer, the real question is how well can they demonstrate intelligence with "unknown unknown" types of questions. This model has the benefit of being released after that issue became public knowledge, so it's hard to know how it would've performed pre-hoc.

proto-n · 2026-02-19T19:50:05 1771530605

From Gemini pro:

You should definitely ride the elephant (or at least lead it there)!

Here is the logic:

If you walk there by yourself, you will arrive at the creek, but the dirty elephant will still be 300 feet back where you started. You can't wash the elephant if it isn't with you!

Plus, it is much easier to take the elephant to the water than it is to carry enough buckets of water 300 feet back to the elephant.

Would you like another riddle, or perhaps some actual tips on how to keep cool in the Arizona heat?

irthomasthomas · 2026-02-19T21:49:38 1771537778

i would say this is a lower difficulty. the car question primes it to think about stuff like energy and pollution.

cyanydeez · 2026-02-19T22:51:27 1771541487

Ok, but the point of the logical question is about the connection. If you really think it's answering logically with reasoning, there should be zero priming.

irthomasthomas · 2026-02-20T10:10:17 1771582217

its not primed to help, its primed to confuse. models want to be good responsible people who care about the environment and don't waste fuel. that primes it to want to walk and it has to use "reasoning" to break out of that. thats what makes it harder, it has to fight between the logical answer and the 'responsible' answer. with the elephant question there is no such conflict.

jmkni · 2026-02-19T21:25:25 1771536325

I like telling it the car wash is 24,901.1 miles away (with the implication being that I'm right beside the car wash)

embedding-shape · 2026-02-19T18:16:32 1771524992

GPT-OSS-120b, a (downloadable) model released more than half a year ago also gets that right, I'm not sure this is such a great success.

> Would you like me to check the local weather forecast to make sure it's not going to rain right after you wash it?

Regardless of what I do, the days I decide to wash my car, it ALWAYS rains the day after, even if I look at the weather report and it says sunny. Cute that Gemini thinks it can predict it better than universe's will.

sushisource · 2026-02-19T21:10:54 1771535454

> Regardless of what I do, the days I decide to wash my car, it ALWAYS rains the day after

Undeniable universal truth. I sometimes find myself making plans based on the fact that the most annoying possible outcome is also the most likely one.

Aachen · 2026-02-19T21:12:38 1771535558

Non car person here. Why does that matter? It's not like rain means you didn't have to go to the wash, it rains often enough here that there wouldn't be car wash places left near me but there are plenty

embedding-shape · 2026-02-19T23:04:59 1771542299

> Why does that matter? It's not like rain means you didn't have to go to the wash

The car gets dirty again when it rains and when it gets dry again. I guess dust, salt, pollution and more is what gets mixed in and put on the chassi as it rains, falls from roofs and splashes, but can't say I've investigated deeply enough. Not the end of the world, just annoying it keeps happening.

opinion-is-bad · 2026-02-19T21:47:15 1771537635

Many people avoid washing cars just before rain to avoid spots, etc. Phoenix as an extreme example rarely rains and leaves everything filthy afterwards.

steve1977 · 2026-02-19T18:02:32 1771524152

The question of course is, did it get the car wash question right because it is "the car wash question" or because it could actually infer why the car needed to be there?

embedding-shape · 2026-02-19T18:15:49 1771524949

Wasn't that "twoot" (or whatever Mastodon calls them) made just a week ago? Unlikely to have been in the training dataset of a model becoming available for public use today, unless Google made some serious advancements on the training front.

jama211 · 2026-02-19T18:17:21 1771525041

Shouldn’t be too hard to come up with a new unique reasoning question

upmind · 2026-02-19T18:12:39 1771524759

The answer here is why I dislike Gemini, though it gets the correct answer, it's far too verbose.

KoolKat23 · 2026-02-19T21:41:36 1771537296

I can't stand a model over-explaining, needless fluff and wasting tokens. I asked the question so I know the context.

mikelward · 2026-02-19T19:59:37 1771531177

I don't love the verbosity of any of the chatbots when I'm using my phone, but at least it put the answer/tl;dr in the first paragraph.

leumon · 2026-02-19T20:23:10 1771532590

Gemini 3 pro and flash already answered this correctly.

claiir · 2026-02-20T06:34:14 1771569254

And Gemini 3 can’t..? Isn’t this just a thinking vs nonthinking model thing?

patrickmcnamara · 2026-02-19T18:54:34 1771527274

Truly we entering the era of AGI.

suddenlybananas · 2026-02-19T18:01:07 1771524067

They probably had time to toss that example in the training soup.

AlphaAndOmega0 · 2026-02-19T18:07:25 1771524445

Previous models from competitors usually got that correct, and the reasoning versions almost always did.

This kind of reflexive criticism isn't helpful, it's closer to a fully generalized counter-argument against LLM progress, whereas it's obvious to anyone that models today can do things they couldn't do six months ago, let alone 2 years back.

suddenlybananas · 2026-02-19T18:11:05 1771524665

I'm not denying any progress, I'm saying that reasoning failures that are simple which have gone viral are exactly the kind of thing that they will toss in the training data. Why wouldn't they? There's real reputational risks in not fixing it and no costs in fixing it.

AlphaAndOmega0 · 2026-02-19T21:37:18 1771537038

Given that Gemini 3 Pro already did solid on that test, what exactly did they improve? Why would they bother?

I double checked and tested on AI Studio, since you can still access the previous model there:

>You should drive. >If you walk there, your car will stay behind, and you won't be able to wash it.

Thinking models consistently get it correct and did when the test was brand new (like a week or two ago). It is the opposite of surprising that a new thinking model continues getting it correct, unless the competitors had a time machine.

suddenlybananas · 2026-02-20T08:06:44 1771574804

Why would they bother? Because it costs essentially nothing to add it to the training data. My point is that once a reasoning example becomes sufficiently viral, it ceases to be a good test because companies have a massive incentive to correct it. The fact some models got it right before (unreliably) doesn't mean they wouldn't want to ensure that the model gets it right.

buttered_toast · 2026-02-19T18:45:40 1771526740

I think we need to reevaluate what purpose these sorts of questions serve and why they're important in regards to judging intelligence.

The model getting it correct or not at any given instance isn't the point, the point is if the model ever gets it wrong we can still assume that it still has some semblance of stochasticity in its output, given that a model is essentially static once it is released.

Additionally, hey don't learn post training (except for in context which I think counts as learning to some degree albeit transient), if hypothetically it answers incorrectly 1 in 50 attempts, and I explain in that 1 failed attempt why it is wrong, it will still be a 1-50 chance it gets it wrong in a new instance.

This differs from humans, say for example I give an average person the "what do you put in a toaster" trick and they fall for it, I can be pretty confident that if I try that trick again 10 years later they will probably not fall for it, you can't really say that for a given model.

energy123 · 2026-02-19T18:56:36 1771527396

They're important but not as N=1. It's like cherry picking a single question from SimpleQA and going aha! It got it right! Meanwhile it's 8% lower score than some other model when evaluated on all questions.

buttered_toast · 2026-02-19T19:05:24 1771527924

Makes me wonder what people would consider better, a model that gets 92% of questions right 100% of the time, or a model that gets 95% of the questions right 90% of the time and 88% right the other 10%?

I think that's why benchmarking is so hard for me to fully get behind, even if we do it over say, 20 attempts and average it. For a given model, those 20 attempts could have had 5 incredible outcomes and 15 mediocre ones, whereas another model could have 20 consistently decent attempts and the average score would be generally the same.

We at least see variance in public benchmarks, but in the internal examples that's almost never the case.

1024core · 2026-02-19T17:29:59 1771522199

It's been hugged to death. I keep getting "Something went wrong".

1024core · 2026-02-18T18:15:25 1771438525

English version, but paywalled: https://www.nationalgeographic.com/history/history-magazine/...

(can read in FF's "reader mode").

Archive link: https://archive.ph/gsv8r

MrDresden · 2026-02-18T20:13:30 1771445610

It seems disabling JavaScript on that page also loads the full content.

1024core · 2026-02-17T01:48:11 1771292891

I'm wondering why they didn't cross reference the addresses they had from the furniture stores with those of registered sex offenders, as this abuser turned out to be? And further intersect that with "Flaming Alamo" brick houses??

alephnerd · 2026-02-17T02:06:26 1771293986

From TFA: "Initially Squire was ecstatic, expecting they could access a digitised customer list. But Harp broke the news that the sales records were just a "pile of notes" that went back decades."

1024core · 2026-02-17T01:45:47 1771292747

FTA:

> Squire works for US Department of Homeland Security Investigations in an elite unit which attempts to identify children appearing in sexual abuse material.

1024core · 2026-02-17T01:44:20 1771292660

Facial recognition is very powerful these days. My friend took a photo of his kid at the top of Twin Peaks in SF, with the city in the background. Unfortunately, due to the angle, you could barely see the eyes and a portion of the nose of the kid. Android was still able to tag the kid.

I feel like Facebook really dropped the ball here. It is obvious that Squire and colleagues are working for the Law Enforcement. If FB was concerned about privacy, they could have asked them to get a judicial warrant to perform a broad search.

But they didn't. And Lucy continued to be abused for months after that.

I hope when Zuck is lying on his death bed, he gets to think about these choices that he has made.

Gigachad · 2026-02-17T02:28:53 1771295333

Google photos has the advantage of a limited search space. Any photo you take is overwhelmingly likely to be one of the few faces already in the library. Not to say facebook couldn't solve the problem. But the ability of Google to do facial recognition with such poor inputs is that it's searching on 40~ faces rather than x billion faces.

fwipsy · 2026-02-17T02:34:18 1771295658

Can confirm, have seen Google photos misidentify strangers. I'm sure better technology exists, but Google's system has weaknesses.

Aurornis · 2026-02-17T02:15:54 1771294554

> I feel like Facebook really dropped the ball here

This story was from more than a decade ago.

Facebook had facial recognition after that, but they deleted it all in response to public outcry. It’s sad to see HN now getting angry at Facebook for not doing facial recognition.

> I hope when Zuck is lying on his death bed, he gets to think about these choices that he has made.

Are we supposed to be angry at Zuckerberg now for making the privacy conscious decision to drop facial recognition? Or is everyone just determined to be angry regardless of what they do?

tqi · 2026-02-17T05:34:22 1771306462

> Or is everyone just determined to be angry regardless of what they do?

People decide who they think are the good guys and who they think are the bad guys first, then view subsequent events through that lens.

alephnerd · 2026-02-17T01:51:55 1771293115

> I feel like Facebook really dropped the ball here

This case began being investigated on January 2014 [0], which means abuse began (shudder) in 2012-13 if not earlier.

Facebook/Meta only began rolling out DeepFace [1] in June 2015 [2]

Heck, VGG-Face wasn't released until 2015 [3] and Image-Based Crowd Counting only began becoming solvable in 2015-16.

> Facial recognition is very powerful these days.

Yes. But it is 2026, not 2014.

> I hope when Zuck is lying on his death bed, he gets to think about these choices that he has made

I'm sure there are plenty of amoral choices he can think about, but not solving facial detection until 2015 is probably not one of them.

---

While it feels like mass digital surveillance, social media, and mass penetration of smartphones has been around forever it only really began in earnest just 12 years ago. The past approximately 20 years (iPhone was first released on June 2007 and Facebook only took off in early 2009 after smartphones and mobile internet became normalized) have been one of the biggest leaps in technology in the past century. The only other comparable decades were probably 1917-1937 and 1945-1965.

---

[0] - https://www.bbc.co.uk/mediacentre/2026/bbc-eye-documentary-t...

[1] - https://research.facebook.com/publications/deepface-closing-...

[2] - https://www.cbsnews.com/news/facebook-can-recognize-you-just...

[3] - https://www.robots.ox.ac.uk/~vgg/data/vgg_face/

__loam · 2026-02-17T01:57:42 1771293462

Facebook rightly retired their facial recognition system in 2021 over concerns about user privacy. Facebook is a social media site, they are not the government or police.

Onavo · 2026-02-17T01:49:39 1771292979

When people on hacker News talk about requiring cops to do traditional police work instead of doing wide ranging trawls using technology, this is exactly what they meant. I hope you don't complain when the future you want becomes reality and the three letter agencies come knocking down your door just because you happened to be in the same building as a crime in progress and the machine learning algorithms determined your location via cellular logs and labelled you as a criminal.

hsbauauvhabzb · 2026-02-17T02:06:24 1771293984

There’s a pretty big difference between surveillance logging your every move your and scanning photos voluntarily uploaded to Facebook.

No, I don’t like Facebook using facial recognition technology, and no I don’t like that someone else can upload photos of me without my consent (which ironically could leverage facial recognition technology to blanket prevent), but these are other technical and social issues that are unrelated to the root issue. I also wish there were clear political and legal boundaries around surveillance usage for truly abhorrent behaviour versus your non-Caucasian neighbour maybe j -walking triggering a visit from ICE.

Yes, it’s an abuse of power for these organisations to collect data these ways, but I’m not against their use to prevent literal ongoing child abuse, it’s one of the least worst uses of it.

wat10000 · 2026-02-17T02:13:47 1771294427

The grim meathook future of ubiquitous surveillance is coming regardless. At the very least we could get some proper crime solving out of it along the way.

vasco · 2026-02-17T05:23:11 1771305791

That's probably the worst attitude one could have about this topic in the whole space of possible opinions there is.

EagnaIonat · 2026-02-17T04:22:48 1771302168

The EU AI act activates this year. Facial recognition is in the restrictive list. You don't want to give auditors ammunition before it goes live as top fine would cost FB around $4B, and wouldn't be a one time fine.

Even if only law enforcement can use it, having that feature is highly regulated.

[edit] I see this is from years ago. I should read the articles first. :)

belorn · 2026-02-17T02:54:32 1771296872

I would hazard a guess that the facial recognition will limit the search scope to people associated (to some degree) with your friends account and some threshold of metrics gathered from the image. I doubt it is using a broad search.

With billions of accounts, the false positive rate of facial recognition when matching against every account would likely make the result difficult to use. Even limiting to a single country like UK the number could be extremely large.

Let say there is a 0.5% false positive rate and some amount of false negatives. With 40 million users, that would be 200 000 false positives.

vasco · 2026-02-17T05:21:10 1771305670

The only explanation for this comment is you never used reverse image search by Google or yandex before it was nerfed or you'd know this is super plausible to find direct hits without many false positives.

LightBug1 · 2026-02-17T10:02:12 1771322532

Facebook carried a ball?

I'm willing to bet said ball was kicked into the jungle five seconds after registering the domain.

1024core · 2026-02-04T18:42:07 1770230527

I don't understand one thing: why would the Japanese government maintain a ZIRP or a NIRP ? What do they have to gain by doing so?

johnvanommen · 2026-02-04T18:49:23 1770230963

It’s to control inflation.

Basically, when currency is scarce, its value goes UP.

When currency is plentiful, its value goes DOWN.

The first scenario lowers inflation, the second raises it.

After Japans bubble economy popped in the early 90s, they had asset values FALL.

So the BoJ began stimulating the economy - trying to push UP inflation - by adding currency to the markets.

The Carry Trade illustrates one of the dangers:

Japan was trying to stimulate their own economy, to counteract the deflation caused by their bubble popping.

But money doesn’t know borders, and though the money was intended to stimulate JAPANS economy, there was nothing stopping ANYONE from purchasing that currency. It’s not like you have to live in Japan to buy Yen.

So the money (yen) was created in Japan, but ended up all over the world.

This has consequences:

* Japan ended up with mountains of US dollars. This is one of the reasons that Japan has more US Treasuries than China. This mountain of dollars lowers YOUR cost of living. Because USD is being acquired for The Carry Trade. This creates artificial demand for USD.

* Because the yen is created in Japan but is then used for international commerce, it dramatically reduces the inflation that “printing money” would normally create. This is why Japan has more debt per capita than any country by far, by a factor of over 2X

I am just an I.T. dude who invests in real estate. So what I just posted may be completely wrong.

The carry trade has existed for about four decades; that’s my summary of how it affects us, from the perspective of a small time real estate guy.

pjc50 · 2026-02-05T13:37:29 1770298649

We're a bit far down in the thread, but I'd be interested in knowing why this alleged outflow of cheap yen didn't keep pushing the currency down. It's been flattish in the 100-150 band for decades.

I had to fact check the Japan having most treasuries, it's true, see this chart: https://economicsinsider.com/top-15-largest-us-treasury-hold...

As an aside, the presence of "Luxembourg" and "Grand Cayman" on that chart is a bit of a tipoff as to how the global economy works.

lbrito · 2026-02-04T18:44:43 1770230683

Probably to stimulate the economy which has been stagnant in terms of GDP since the 90s

jsutter909 · 2026-02-04T18:51:03 1770231063

Because it means they can borrow for free

johnvanommen · 2026-02-04T19:08:48 1770232128

Everything has consequences.

Free money is never free.

georgeecollins · 2026-02-04T19:34:06 1770233646

One cost is to the savings of Japanese people who don't get a competitive rate of return on their savings. They save a lot and generally don't invest abroad.

1024core · 2026-02-04T21:17:02 1770239822

If their own government is giving them a 0 interest rate, why are they not investing abroad?

worik · 2026-02-04T19:26:43 1770233203

Deflation.

1024core · 2026-02-04T16:20:46 1770222046

Exactly. Given that most birds weigh less than 4 pounds and bird strikes can take an engine out completely.

Plus, if @jacquesm says it, it must be true. ;-)

1024core · 2025-09-26T19:41:26 1758915686

This reminds me of the "trash can fiasco" that went down in San Fracnsico.

https://sfpublicworks.org/trashcanredesign

TL;DR: San Francisco government decided to go with custom-designed, bespoke, artisanal public trash cans. Each can ended up coming in at around $20K.

When, in fact, if you buy a typical run-of-the-mill public trash can that most other cities do, it would cost them less than $1000.

avree · 2025-09-26T19:43:08 1758915788

You are conflating two things with that story. The prototypes cost $20,000. The designed can cost $3,000. Higher than your "$1,000" can, but it also had a bunch of "features". If you've ever worked at a hardware company, you probably know that the price of DVT units, or any prototype, ends up being significantly higher than the production unit.

1024core · 2025-09-26T20:36:30 1758918990

So you're saying the designed can cost 3X the COTS one. Similar idea to the story, no?