
What are your favourite skills?

The skills that matter most to me are the ones I create myself (with the skill creator skill) that are very specific and proprietary. For instance, a skill on how to write a service in my back-testing framework.

I also like to make skills for more niche tools, like marimo (a very nice Jupyter replacement). The model probably does know some stuff about it, but not enough, and the agent could find enough online or in context7, but it would waste a lot of time and context figuring it out every time. So instead I have a deep-thinking agent do all that research up front and build a skill from it. I might customize it to be more specific to my environment, but it's mostly the condensed research of the agent, so I don't need to redo that every time.


A very particular set of skills.

nunchuck skills

The only skill that matters


Even if it were near perfect, that would still not be enough, given the negative impact of false-positive detections on students.


That's where the humanizers come in. These are solutions that take LLM-generated text and make it sound human-written in order to avoid detection.

The principle behind training them is quite simple: take an LLM and reward it for revising text so that it doesn't get detected. Reinforcement learning takes care of the rest.
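A minimal sketch of that reward signal in Python (the detector below is a made-up stub, not a real classifier; in practice it would be the AI-text detector you are trying to evade):

    # Sketch of the humanizer reward signal. The detector is a stub for
    # illustration only; a real one is a trained classifier.

    def detector_score(text: str) -> float:
        """Stub detector: returns P(text is AI-generated) in [0, 1]."""
        # Here we just penalize one telltale phrase.
        return 0.9 if "delve" in text.lower() else 0.2

    def reward(revised: str) -> float:
        """Higher reward when the revision fools the detector."""
        return 1.0 - detector_score(revised)

    # During training, a policy LLM proposes revisions and is updated
    # (e.g. with PPO) to maximize this reward, usually with an extra
    # penalty for drifting too far from the original meaning.
    print(reward("We delve into the implications."))       # 0.1 under this stub
    print(reward("We look closely at what this implies."))  # 0.8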


Why does the news need to make everything sound sensational? Let's move on.


The section on nested ifs reminded me of being a Nevernester: https://www.youtube.com/watch?v=CFRhGnuXG-4 (it's a short [~8 min], fun watch)


It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.

Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade precisely to stay the leader on this important benchmark.


> like the presentation is rushed or made last minute

Or written by GPT-5?


They never compare with other vendors.


You would think that Springer did its due diligence here, but what is the value of a brand such as Springer if they let this AI slop slip through the cracks?

This is an opportunity for brands to sell verifiability, i.e., that the content they are selling has been properly vetted, which was obviously not the case here.


Back when I was doing academic publishing I'd use a regex to find all the hyperlinks, then a script (written by a co-worker, thanks again Dan!) to determine if they were working or not.

A similar approach should work w/ a DOI.
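Something like this rough sketch (the regex is the commonly cited heuristic for modern DOIs, not an official grammar, and some publishers reject HEAD requests, so treat the result as approximate):

    import re
    import requests

    # Heuristic pattern for modern DOIs.
    DOI_RE = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+\b')

    def check_dois(text: str) -> dict[str, bool]:
        """Extract DOI-like strings and test whether doi.org resolves them."""
        results = {}
        for doi in set(DOI_RE.findall(text)):
            resp = requests.head(f"https://doi.org/{doi}",
                                 allow_redirects=True, timeout=10)
            # Some publishers reject HEAD; a GET fallback would be more
            # robust than treating non-200 as a dead reference.
            results[doi] = resp.status_code == 200
        return results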


In the past I've had GPT-4 output references with valid DOIs. The problem was that the DOIs pointed to completely different (and unrelated) works. So you'd need to retrieve the canonical title and authors for the DOI and cross-check them.
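A rough sketch of that cross-check against the Crossref works API (the 0.8 similarity threshold here is arbitrary):

    import requests
    from difflib import SequenceMatcher

    def doi_matches_title(doi: str, claimed_title: str) -> bool:
        """Compare a claimed title against Crossref's canonical record."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}",
                            timeout=10)
        if resp.status_code != 200:
            return False  # DOI unknown to Crossref
        titles = resp.json()["message"].get("title", [])
        canonical = titles[0] if titles else ""
        ratio = SequenceMatcher(None, canonical.lower(),
                                claimed_title.lower()).ratio()
        return ratio > 0.8  # tolerate minor formatting differences

Authors could be checked the same way against the "author" field of the Crossref record.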


A classic case.

I work on Veracity (https://groundedai.company/veracity/), which does citation checking for academic publishers. I see stuff like this all the time in paper submissions. Publishers are inundated.


Don’t publishers ban authors who attempt such shenanigans?


And then make sure the arguments and evidence it presents are as the LLM represented them to be.


At which point it’s more of a hassle to use an LLM than not.


And then check that the cited article was not itself an AI piece that managed to get published.


Not all journals require a DOI link for each reference. Most good ones do seem to have a system to verify the reference exists and is complete; I assume there’s some automation to that process but I’d love to hear from journal editorial staff if that’s really the case.


Why would one think that? All of the big journal publishers have had paper millers and fraudsters and endless amounts of "tortured phrases" under their names for a long, long time.


Taurine deficiency has been claimed to be a driver of aging [1]. The claim from the news article about it possibly being related to cancer seems like it needs a much stronger justification.

[1] https://www.science.org/doi/10.1126/science.abn9257


How are those two things related? Both can be true, neither can be true, either can be true - there is no relationship.


(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).

