This is the part of the story that interests me the most:
> The rapid shift by so many businesses and people to install their own panels and batteries is causing headaches for Eskom, the already troubled utility.
> Every kilowatt generated by privately owned solar installations is a hit to its bottom line. Eskom’s coal-burning plants, which provide most of South Africa’s power, are old and in poor shape.
> even Ms. Graham-Maré, the deputy electricity minister, installed a solar system in her home. Her energy bill, she said, fell by two-thirds.
> Multiply her hack by the thousands and you have what South Africans call Eskom’s “death spiral.” Well-off customers lower their bills with solar, which causes Eskom to lose money, which in turn forces Eskom to raise prices and encourages more people to install solar.
> Now, unable to beat solar, Eskom is joining solar.
> The utility has removed onerous licensing requirements on private installations. It has allowed people to sell power to the grid. And it has tweaked its rates so that customers pay a fixed charge in addition to the cost of any power they consume. Essentially, people pay simply to be connected to the grid, a standard feature in other nations that’s new in South Africa.
It seems like they are trying to pivot to another stable equilibrium, which, if it happens, is really hopeful, because imo such a pivot is long overdue.
A very similar thing is happening in Pakistan. Net electricity demand was actually going down in 2025, which makes sense: electricity rates were brutal because of extreme mismanagement.
At least part of me feels this sort of primal joy you get when an ineffective drain on society is forced to get its shit together instead of sitting around digging in its heels. Pakistan's utilities were definitely an example of not doing enough.
I'm certain there are problems, but I'm sure to many it feels like unclogging a long-stuffed nose.
It's something extremely pervasive in modern design language.
It actually infuriates me to no end. There are many many many instances where you should use numbers but we get vague bullshit descriptions instead.
My classic example is that Samsung phones show charging as Slow, Fast, Very fast, Super fast charging. They could just use watts like a sane person. Internally of course everything is actually watts and various apps exist to report it.
Another example is my car, which shows motor power/regen as a vertical blue segmented bar. I'm not sure what the segments are supposed to represent, but I believe it's something like 4 kW per segment. If you poke around you can actually see the real kW number, but the dash just has the bar.
Another is WiFi signal strength, where the bars really mean nothing. My router reports a much more useful dBm measurement.
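To make the dBm point concrete, here is a minimal sketch of what the number gives you that bars don't. The only fact relied on is the definition of dBm (0 dBm is 1 mW, and every 10 dB is a factor of 10 in power); the two readings are made-up values for illustration, not measurements from any real router.

```python
def dbm_to_milliwatts(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm is defined as 1 mW)."""
    return 10 ** (dbm / 10)


# Hypothetical readings, chosen only for illustration
near_router = -50.0
far_room = -70.0

# Ratio of received power between the two spots
ratio = dbm_to_milliwatts(near_router) / dbm_to_milliwatts(far_room)
print(f"{near_router} dBm is about {ratio:.0f}x the received power of {far_room} dBm")
```

A 20 dB gap works out to a factor of 100 in received power, which is the kind of difference a bar display can easily hide.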
Thank god that there are lots of legacy cases that existed before the iPhone-ized design language started taking over and are sticky and hard to undo.
I can totally imagine my car reporting tire pressure as low or high or some nonsense. Similarly, I'm sure the designers at YouTube are foaming at the mouth to remove the actual pixel measurements from video resolutions.
It's all rather dumb, but your examples are really counterexamples, because a watt is sadly not something most people understand. One would at minimum need to have passed a physics class, and even that doesn't necessarily leave a person with an intuitive, visceral understanding of what a watt is, feels like, can do. I appreciate my older Samsung phone that just converts it into expected time until full charge. That's the number that matters to me anyway, and I can make my own value judgment about how "super" the fastness is. But I do agree with your point and would be pissed if they dumbed it down to Later, Soon, Very Soon and Super Soon.
Speaking of time and timestamps, which I would've thought were straightforward, I get irked to see them dumbed down to "ago" values, e.g. an IM sent "10 minutes ago" or, worse, "a day ago." Like what time of day, a day ago?
And just through exposure over time they'd learn "my phone usually charges around X" and be able to see if their new cable is actually charging faster or not.
In the US, washing machines have "cold", "warm", "hot" settings. In Europe, you have a temperature knob: "30C", "40C", "60C".
Like you, I don't buy the argument that people are actually too dumb to deal with the latter or are allergic to numbers. People get used to and make use of numbers in context naturally if you expose them.
I have a machine which has cold/warm/hot because it doesn't heat water by itself; it just takes whatever hot water is available in the house (and "warm" means 50% hot water and 50% cold).
I still think anyone who grew up with such a machine would be able to graduate to a numerical temp knob without having a visceral reaction over the numbers every time they do laundry.
A good hybrid can do very well, presumably by keeping the engine in exactly its sweet spot and designing aggressively for that. BYD for example claims 46% thermal efficiency. [0]
That’s for future unreleased capabilities and models, not the model released today.
They did the same thing for gpt-5.1-codex-max (code name “arcticfox”), delaying its availability in the API and only allowing it to be used by monthly plan users, and as an API user I found it very annoying.
You are effectively describing SimpleQA but with a single question instead of a comprehensive benchmark and you can note the dramatic increase in performance there.
It has a SimpleQA score of 69% on a benchmark that tests knowledge of extremely niche facts. That's actually ridiculously high (Gemini 2.5 *Pro* had 55%) and reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash model.
I'm speculating, but Google might have figured out some training magic trick to balance out the information storage in model capacity. That, or this Flash model has a huge number of parameters or something.
I’m amazed by how much Gemini 3 Flash hallucinates; it performs poorly in that metric (along with lots of other models). In the Hallucination Rate vs. AA-Omniscience Index chart, it’s not in the most desirable quadrant; GPT-5.1 (high), Opus 4.5 and Haiku 4.5 are.
Can someone explain how Gemini 3 Pro/Flash then do so well in the overall Omniscience: Knowledge and Hallucination Benchmark?
Hallucination rate is hallucination/(hallucination+partial+ignored), while omniscience is correct-hallucination.
One hypothesis is that Gemini 3 Flash refuses to answer when unsure less often than other models, but when sure is also more likely to be correct. This is consistent with it having the best accuracy score.
I'm a total noob here, but just pointing out that Omniscience Index is roughly "Accuracy - Hallucination Rate". So it simply means that their Accuracy was very high.
> In the Hallucination Rate vs. AA-Omniscience Index chart, it’s not in the most desirable quadrant
This doesn't mean much. As long as Gemini 3 has a high hallucination rate (higher than at least 50% of the others), it's not going to be in the most desirable quadrant by definition.
For example, let's say a model answers 99 out of 100 questions correctly. The 1 wrong answer it produces is a hallucination (i.e. confidently wrong). This amazing model would have a 100% hallucination rate as defined here, and thus not be in the most desirable quadrant. But it should still have a very high Omniscience Index.
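The 99-out-of-100 case can be checked with a few lines, using the definitions quoted upthread: hallucination rate = hallucination / (hallucination + partial + ignored), and omniscience = correct - hallucination. Scaling the index to a percentage of all questions is my assumption, and the function names are just made up for this sketch.

```python
def hallucination_rate(hallucinated: int, partial: int, ignored: int) -> float:
    """Share of non-correct responses that were confidently wrong."""
    not_correct = hallucinated + partial + ignored
    return hallucinated / not_correct if not_correct else 0.0


def omniscience_index(correct: int, hallucinated: int, total: int) -> float:
    """Correct minus hallucinated, as a percentage of all questions (assumed scaling)."""
    return 100 * (correct - hallucinated) / total


# The example from the comment: 99 right, 1 confidently wrong, nothing skipped
correct, hallucinated, partial, ignored = 99, 1, 0, 0
total = correct + hallucinated + partial + ignored

rate = hallucination_rate(hallucinated, partial, ignored)
index = omniscience_index(correct, hallucinated, total)

print(f"hallucination rate: {rate:.0%}")  # 100%
print(f"omniscience index: {index:.0f}")  # 98
```

So under these definitions a model can sit at a 100% hallucination rate and still land near the top on the index, because the rate only looks at the questions it didn't get right.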
I'm confused about the "Accuracy vs Cost" section. Why is Gemini 3 Pro so cheap? It's basically the cheapest model in the graph (sans Llama 4 and Mistral Large 3) by a wide margin, even compared to Gemini 3 Flash. Is that an error?
It's not an error, Gemini 3 Pro is just somehow able to complete the benchmark while using way fewer tokens than any other model. Gemini 3 Flash is way cheaper per token, but it also tends to generate a ton of reasoning tokens to get to its answer.
They have a similar chart that compares results across all their benchmarks vs. cost and 3 Flash is about half as expensive as 3 Pro there despite being four times cheaper per token.
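Back-of-the-envelope on those two ratios, treating total cost as tokens times price per token (a simplification that ignores any input/output pricing split): if Flash is four times cheaper per token but only about half as expensive overall, it must be spending roughly twice the tokens. The variable names are just for this sketch.

```python
# Both ratios are Flash relative to Pro, taken from the comment above
price_per_token_ratio = 1 / 4  # "four times cheaper per token"
total_cost_ratio = 1 / 2       # "about half as expensive"

# total_cost = tokens * price_per_token, so the token ratio is what is left over
token_ratio = total_cost_ratio / price_per_token_ratio
print(f"Flash uses about {token_ratio:.1f}x the tokens of Pro")  # 2.0x
```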
> reflects either training on the test set or some sort of cracked way to pack a ton of parametric knowledge into a Flash Model
That's what MoE is for. It might be that with their TPUs, they can afford lots of params, just so long as the activated subset for each token is small enough to maintain throughput.
It's strange to me too, but they must have done the market research for what people do with image gen.
My own main use cases are entirely textual: Programming, Wiki, and Mathematics.
I almost never use image generation for anything. However, it's objectively extremely popular.
This has strong parallels for me to when Snapchat filters became super popular. I know lots of people loved editing and filtering pictures, but I always left everything in auto mode; in fact I'd turn off a lot of the default beauty filters. It just never appealed to me.
It makes a big difference. Well-sorted garbage is easier to deal with.
I watched this video from Andrew Fraser on Indonesia's plastic recycling industry. There were a few points during the documentary where this is pointed out. I had Gemini point them out and verified them.
---
The documentary indicates that separating rubbish bins at the source is important because it eliminates an entire process and makes almost everything recyclable (14:18 - 14:24).
The speaker contrasts the Indonesian system, where scavengers sort mixed waste, with Western systems where waste is separated at the source (2:00 - 2:08, 6:57 - 7:00). At a modern processing facility, the speaker notes that if waste is not separated at the source, some material becomes too dirty to recycle (14:26 - 14:29, 20:26 - 20:29).
Furthermore, the video highlights that imported plastics from Western nations are highly valuable because they are clean, dry, sorted, and high-grade, having gone directly into the recycling side of consumer bins (28:57 - 29:11). This high-quality imported plastic is essential for Indonesian recycling plants like PMS to mix with lower-quality local waste, allowing them to process more raw domestic waste and create more jobs (28:01 - 28:27).
Sorting, and charging different prices for different types of waste (typical in Europe, don’t know elsewhere), also provides economic and psychological incentives to potentially reduce purchases of certain types.
A shocking number of people are so well below mediocre that it's kind of amazing how okayish we get by, even pre-AI. Makes me think there is more robustness than you might expect given terrible numbers.
For example, what seemed crazy to me is that Greece as a country somehow had, and still has, pensions as the *primary* source of income for ~half of its households.