
What are your favourite skills?

The skills that matter most to me are the ones I create myself (with the skill creator skill) that are very specific and proprietary. For instance, a skill on how to write a service in my back-testing framework.

I also like to make skills for more niche tools, like marimo (a very nice Jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in context7, but it would waste a lot of time and context figuring it out every time. So instead I have a deep-thinking agent do all that research up front and build a skill from it. I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent, so I don't need to redo that every time.


A very particular set of skills.

nunchuck skills

The only skill that matters


Even if it were near perfect, that would still not be enough, given the negative impact of false-positive detections on students.


That's where the humanizers come in. These are solutions that take LLM-generated text and make it sound human-written in order to avoid detection.

The principle behind training them is quite simple: take an LLM and reward it for revising text so that it doesn't get detected. Reinforcement learning takes care of the rest.
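A minimal sketch of that reward signal in Python (the detector below is a made-up stub, not a real classifier; in practice it would be the AI-text detector you are trying to evade):

    # Sketch of the humanizer reward signal. The detector is a stub for
    # illustration only; a real one is a trained classifier.

    def detector_score(text: str) -> float:
        """Stub detector: returns P(text is AI-generated) in [0, 1]."""
        # Here we just penalize one telltale phrase.
        return 0.9 if "delve" in text.lower() else 0.2

    def reward(revised: str) -> float:
        """Higher reward when the revision fools the detector."""
        return 1.0 - detector_score(revised)

    # During training, a policy LLM proposes revisions and is updated
    # (e.g. with PPO) to maximize this reward, usually with an extra
    # penalty for drifting too far from the original meaning.
    print(reward("We delve into the implications."))       # 0.1 under this stub
    print(reward("We look closely at what this implies."))  # 0.8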


Why does the news need to make everything sound sensational? Let's move on.


The section on nested ifs reminded me of being a Nevernester: https://www.youtube.com/watch?v=CFRhGnuXG-4 (it's a short [~8 min], fun watch)


It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.

Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade precisely to stay the leader on this important benchmark.


> like the presentation is rushed or made last minute

Or written by GPT-5?


They never compare with other vendors.


You would think that Springer did its due diligence here, but what is the value of a brand such as Springer if they let this AI slop slip through the cracks?

This is an opportunity for brands to sell verifiability, i.e., that the content they are selling has been properly vetted, which was obviously not the case here.


Back when I was doing academic publishing I'd use a regex to find all the hyperlinks, then a script (written by a co-worker, thanks again Dan!) to determine if they were working or not.

A similar approach should work w/ a DOI.
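Something like this rough sketch (the regex is the commonly cited heuristic for modern DOIs, not an official grammar, and some publishers reject HEAD requests, so treat the result as approximate):

    import re
    import requests

    # Heuristic pattern for modern DOIs.
    DOI_RE = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+\b')

    def check_dois(text: str) -> dict[str, bool]:
        """Extract DOI-like strings and test whether doi.org resolves them."""
        results = {}
        for doi in set(DOI_RE.findall(text)):
            resp = requests.head(f"https://doi.org/{doi}",
                                 allow_redirects=True, timeout=10)
            # Some publishers reject HEAD; a GET fallback would be more
            # robust than treating non-200 as a dead reference.
            results[doi] = resp.status_code == 200
        return results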


In the past I've had GPT-4 output references with valid DOIs. The problem was that the DOIs pointed to completely different (and unrelated) works. So you'd need to retrieve the canonical title and authors for the DOI and cross-check them.
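A rough sketch of that cross-check against the Crossref works API (the 0.8 similarity threshold here is arbitrary):

    import requests
    from difflib import SequenceMatcher

    def doi_matches_title(doi: str, claimed_title: str) -> bool:
        """Compare a claimed title against Crossref's canonical record."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}",
                            timeout=10)
        if resp.status_code != 200:
            return False  # DOI unknown to Crossref
        titles = resp.json()["message"].get("title", [])
        canonical = titles[0] if titles else ""
        ratio = SequenceMatcher(None, canonical.lower(),
                                claimed_title.lower()).ratio()
        return ratio > 0.8  # tolerate minor formatting differences

Authors could be checked the same way against the "author" field of the Crossref record.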


A classic case.

I work on Veracity (https://groundedai.company/veracity/), which does citation checking for academic publishers. I see stuff like this all the time in paper submissions. Publishers are inundated.


Don’t publishers ban authors who attempt such shenanigans?


And then make sure the arguments and evidence it presents are as the LLM represented them to be.


At which point it’s more of a hassle to use an LLM than not.


And then check that the cited article was not itself an AI piece that managed to get published.


Not all journals require a DOI link for each reference. Most good ones do seem to have a system to verify the reference exists and is complete; I assume there’s some automation to that process but I’d love to hear from journal editorial staff if that’s really the case.


Why would one think that? All of the big journal publishers have had paper millers and fraudsters and endless amounts of "tortured phrases" under their names for a long, long time.


Taurine deficiency has been claimed to be a driver of aging [1]. The claim from the news article about it possibly being related to cancer seems like it needs a much stronger justification.

[1] https://www.science.org/doi/10.1126/science.abn9257


How are those two things related? Both can be true, neither can be true, either can be true - there is no relationship.


(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).

