
I opened it in the morning while my partner was sleeping next to me.

I saw the big red “press me” button sitting in a corner, and when I pressed it, a rich farting sound blasted out of my phone’s speakers, which coincidentally were on full volume. It was LOUD enough to be heard throughout the house.

My partner’s only reaction was to say “great” in a low voice :)))

Thanks for the laugh!


I’m all for building weird stuff. Unfortunately, I have absolutely no idea what this is, so I can’t use it, but it was fun trying to figure it out.


No IT issues, thank god! Maybe some work on my side proj to unwind.

Merry xmas everyone!


LOL


Context7 might be good for you


Just curious, wouldn't it be easier to download the docs in a format that is searchable for the LLM? An MCP for this seems overkill to me.


It's just convenience really. Context7 takes care of keeping _all_ the documentation available and provides a search function.

You can definitely keep the docs locally, or even build a RAG/MCP thing just for the specific docs you want.
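
If you do want the local route, here's a minimal sketch (assuming you've already downloaded the docs as Markdown into a local docs/ folder, which is a hypothetical layout, not anything Context7 ships): a naive keyword search that prints matching snippets you can paste into the prompt, or wrap behind an MCP tool later.

    # Naive local doc search over downloaded Markdown files.
    # Assumes the docs live under ./docs as .md files (hypothetical layout).
    import sys
    from pathlib import Path

    def search_docs(query: str, docs_dir: str = "docs", context_lines: int = 3) -> list[str]:
        """Return short snippets around every line that mentions the query."""
        hits = []
        for path in Path(docs_dir).rglob("*.md"):
            lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
            for i, line in enumerate(lines):
                if query.lower() in line.lower():
                    start, end = max(0, i - context_lines), i + context_lines + 1
                    hits.append(f"# {path} (line {i + 1})\n" + "\n".join(lines[start:end]))
        return hits

    if __name__ == "__main__":
        # e.g. python search_docs.py "useEffect cleanup"
        for hit in search_docs(" ".join(sys.argv[1:])):
            print(hit + "\n---")

Context7 mostly saves you from maintaining that folder and keeping it in sync with new releases.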


Definitely! I put a note in the instructions.md file to check that the code conforms to the latest docs using Context7, and it works quite well!


Here we go again....


I've been working with it, and so far it's been very impressive. Better than Opus in my feels, but I have to test more; it's super early days.


What I usually test with is trying to get them to build a full, scalable SaaS application from scratch... It seemed very impressive in how it handled the early code organization using Antigravity, but then at some point, all of a sudden, it started really getting stuck and constantly stopped producing, and I had to trigger continue or babysit it. I don't know if I could've been doing something better, but that was just my experience. Seemed impressive at first, but at least compared to it in Antigravity, Codex and Claude Code scale more reliably.

Just an early anecdote from trying to build that one SaaS application, though.


It sounds like an API issue more than anything. I was working with it through Cursor on a side project, and it did better than all previous models at following instructions and refactoring, and UI-wise it has some crazy skills.

What really impressed me was when I told it that I wanted a particular component’s UI cleaned up but didn’t know exactly how, and that I just wanted it to use its deep design expertise to figure it out. It came up with a UX that I would’ve never thought of, and that was amazing.

Another important point is that the error rate for my session yesterday was significantly lower than with any other model I’ve used.

Today I will see how it does when I use it at work, where we have a massive codebase with particular coding conventions. Curious how it handles that.


thx


This is definitely AI generated LOL


> All participating organizations then generated responses to each question from each of the four AI assistants. This time, we used the free/consumer versions of ChatGPT, Copilot, Perplexity and Gemini. Free versions were chosen to replicate the default (and likely most common) experience for users. Responses were generated in late May and early June 2025.

First of all, none of the SOTA models we're currently using had been released in May and early June. Gemini 2.5 came out on June 17, GPT-5 & Claude Opus 4.1 at the beginning of August.

On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

You have to use the right tool for the job, and at this point in time any report that is more than a month old is useless in the AI world, beyond being a snapshot of how things 'used to be'.


Ah, the "you're using the wrong model" fallacy (is there a name for this?)

In the eyes of the evangelists, every major model seems to go from "This model is close to flawless at this task, you MUST try this TODAY" to "It's absolutely wild that anyone would ever consider using such a no-good, worthless model for this task" over the course of a year or so. The old model has to be re-framed for the new model to look more impressive.

When GPT-4 was released I was told it was basically a senior-level developer, now it's an obviously worthless model that you'd be a fool to use to write so much as a throwaway script.


Not an evangelist for AI at all, I just love it as a tool for my creativity, research and coding.

What I’m saying is that there should be a disclaimer: hey, we’re testing these models for the average person, who has no idea about AI. People who actually know AI would never use them in this way.

A better idea: educate people. Add “Here’s the best way to use them btw…” to the report.

All I’m saying is, it’s a tool, and yes you can use it wrong. That’s not a crazy realization. It applies to every other tool.

We knew that the hallucination rate for GPT-4o was nuts. From the start. We also know that GPT-5 has a much lower hallucination rate. So there are no surprises here, I’m not saying anything groundbreaking, and neither are they.


> On top of that, to use free models for anything like this is absolutely wild. I use the absolute best models, and the research versions of this whenever I do research. Anything less is inviting disaster.

"I contend we are both atheists, I just believe in one fewer god than you do. When you understand why you dismiss all the other possible gods, you will understand why I dismiss yours." - Stephen F Roberts


It ain’t a God, it’s a tool.

If one knife doesn’t cut potatoes, that doesn’t mean no knife cuts potatoes. Use the right tool for the job.

Though I do love a well-placed quote.


If they had used a paid version, their study would not represent how most people use AI (i.e., with the free version).


But they’re using a free version that’s not even out there anymore. This is my problem: it came out already dated.


> to use free models for anything like this is absolutely wild

It would be wild if they’d used anything else, because the free models are what most people use, and the concern is how AI influences the general population.


I think you are missing the point: it's mainly to highlight that the models most people use, i.e. free versions with default settings, output a large number of factual errors, even when they are asked to base their answers on specific sources of information (as explained in their methodology document).


Is that true of the latest free models? I'm just saying the report was already dated when it came out.

