
Do we know if this is better than Nvidia Parakeet V3? That has been my go-to model locally and it's hard to imagine there's something even better.


I'm so amazed to find out just how close we are to the Star Trek voice computer.

I used to use Dragon Dictation to draft my first novel; I had to learn a 'language' to tell the rudimentary engine how to recognize my speech.

And then I discovered [1] and have been using it for some basic speech recognition, amazed at what a local model can do.

But it can't transcribe any text until I finish recording a file, and only then does it start working, so the feedback loop runs in slow batches rather than in real time.

And now you've posted this cool solution that streams audio to the model in a continuous series of small chunks. Amazing, just amazing.

Now if I can just figure out how to contribute that streaming mode to Handy or something similar, local STT will be a solved problem for me.
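
Roughly, I imagine the streaming loop looks something like this in Python (a sketch only: `sounddevice` is a real library, but `transcribe_chunk` is a hypothetical stand-in for whatever streaming call the model exposes, and a real implementation would also need chunk overlap and decoder state):

    import queue

    import numpy as np
    import sounddevice as sd

    SAMPLE_RATE = 16_000     # most ASR models expect 16 kHz mono
    CHUNK_SECONDS = 0.5      # feedback latency is bounded by the chunk size

    audio_q: "queue.Queue[np.ndarray]" = queue.Queue()

    def on_audio(indata, frames, time_info, status):
        audio_q.put(indata[:, 0].copy())  # mono float32 samples

    def transcribe_chunk(samples: np.ndarray) -> str:
        # Hypothetical stand-in for the model's streaming API; a real
        # implementation would keep decoder state across chunks.
        return ""

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        blocksize=int(SAMPLE_RATE * CHUNK_SECONDS),
                        callback=on_audio):
        while True:
            chunk = audio_q.get()  # blocks until the next half second of audio
            print(transcribe_chunk(chunk), end=" ", flush=True)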

[1] https://github.com/cjpais/Handy



Happy to answer questions about this (or work with people on further optimizing the open source inference code here). NVIDIA has more inference tooling coming, but it's also fun to hack on the PyTorch/etc stuff they've released so far.

Thank you for sharing! Does your implementation allow running the Nemotron model on Vulkan, like whisper.cpp does? I'm curious to try other models, but I don't have Nvidia hardware, so my choices are limited.

I'm curious about this too. On my M1 Max MacBook I use the Handy app on macOS with Parakeet V3 and get near-instant transcription, with accuracy slightly below the slower Whisper models; that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.

https://github.com/cjpais/Handy


I've been using Parakeet V3 locally and, totally anecdotally, this feels more accurate but slightly slower.

I liked Parakeet v3 a lot until it started to drop whole sentences, willy-nilly.

Yeah, I think the multilingual improvements in V3 caused some kind of regression for English. I've noticed large blocks occasionally dropped as well, so I reverted to v2 for my usage; specifically, nvidia/parakeet-tdt-0.6b-v2 vs nvidia/parakeet-tdt-0.6b-v3.
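
If you want to A/B the two checkpoints yourself, the NeMo loading path is identical for both (a sketch based on the models' documented NeMo usage; assumes `nemo_toolkit` is installed and `sample.wav` is a 16 kHz mono file):

    import nemo.collections.asr as nemo_asr

    # Swap in nvidia/parakeet-tdt-0.6b-v3 to compare the multilingual version
    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v2"
    )
    out = model.transcribe(["sample.wav"])  # strings or Hypothesis objects,
    print(out[0])                           # depending on the NeMo version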

I didn't see that, but I do get a lot of stutters (words or syllables repeated 5+ times); not sure if it's a model problem or a post-processing issue in the Handy app.

Oh god am I glad to read this. Thought it was my microphone or something.

Parakeet is really good imo too, and at just 0.6B it can actually run on edge devices. 4B is massive; I don't see Voxtral running in real time on an Orin or fitting on a Hailo. An Orin Nano probably can't even load it at BF16: 4B parameters at 2 bytes each is roughly 8 GB of weights alone, which is about the Orin Nano's entire RAM.

Came here to ask the same question!

Why not just use it in the terminal? That's literally what it was built for.

Isn't it obvious that an agent will do better if it internalizes the knowledge on something instead of having the option to request it?

Skills are new. Models haven't been trained on them yet. Give it 2 months.


Not so obvious, because the model still needs to look up the required doc. Unfortunately, the article glosses over this detail a bit. The model needs to decide when to use a skill, but doesn't it also need to decide when to look up documentation instead of relying on pretraining data?


Removing the skill does remove a level of indirection.

It's a difference of "choose whether or not to make use of a skill that would THEN attempt to find what you need in the docs" vs. "here's a list of everything in the docs that you might need."


I believe the skills would contain the documentation. It would have been nice for them to give more information on the granularity of the skills they created, though.


This idea/execution isn't new, right? Can someone explain what makes this different/better? Is this the uBlock Origin of cookie banner hiders?


It goes through the "reject all tracking" flow. Other solutions automate clicking "accept all tracking" (since that's usually simpler), or just hide the pop-ups.
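
Conceptually it's the difference between these two strategies (a toy Playwright sketch, not the extension's actual code; real consent pop-ups need per-CMP selectors rather than a hard-coded button label or element id):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        page.goto("https://example.com")

        # This approach: actually walk the opt-out flow
        reject = page.get_by_role("button", name="Reject all")
        if reject.count():
            reject.first.click()
        else:
            # What simpler blockers do: accept, or just hide the banner
            page.add_style_tag(content="#cookie-banner { display: none; }")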


Honest question: Why would I use Claude with OpenCode if I have a Claude Max subscription? Why not Claude Code?


Uhm, yes, that's why you rely on LMArena (core) results only to judge answering style and structure. I thought this was common knowledge.


2006: "This meeting could have been an e-mail"

2026: "This app could have been a prompt"


c'mon


Hold up, so d0 is just Claude with access to the terminal?

Sounds like they just discovered that they don't have a product.


Weirdly, I find a higher signal-to-noise ratio in this analogy than in benchmarks these days.

If you let your inner fanboy rest for a moment, you realize Gemini 3, Claude Opus 4.5, and GPT 5.2 are all amazing. If two of them disappeared tomorrow, my AI-assisted productivity wouldn't change.

The 3% difference on benchmark X doesn't mean anything anymore. It's probably more helpful to compare them on character traits instead of numbers.

My one word to describe Claude would be "pleasant". It's just so nice to communicate with. GPT/Codex would be "thorough": it finds and thinks of stuff the others don't. For Gemini 3, the jury is still out. It might be the smart kid on the block that's still a bit rough around the edges, but given that it's a preview, things might change soon.


Mine definitely would. This sounds so clichéd, but Claude (Opus 4.5, but also the others) just "gets how I think" better. I've tried Gemini 3 and GPT 5.2 and didn't like them at all -- not when I know I can have Claude. I mostly code Python + Django, so it could also be from that.

Gemini 3 has this extremely annoying habit of bleeding its reasoning process into code comments that are hard to read and not very human-like (they're not "reasoning", they're questioning for the sake of questioning, which I get as part of the process, but not as a comment in the code!). I've seen it do things like this many times:

    # Because so and so and so and so we must do x(param1=True, param2=False)
    # Actually! No, wait! It is better if we do x(param1=True, param2=True)
    x(param1=True, param2=True, param3=False) # This one is even better!
Beyond that, it just does not produce what I consider good Python code. I daily-drove Gemini 2.5 before I realized how good Anthropic's models were (or perhaps before they punched back after 2.5?) and haven't been able to go back.

As for GPT 5.2, I just feel like it doesn't really follow my instructions or way of thinking. Like it's dead set on following whatever best practices it has learned, and if I disagree with them, well tough luck. Plus, and I have no better way of saying this, it's just rude and cold, and I hate it for it.


I recently discovered Claude, and it does much better than Codex or Gemini for python code.

Gemini seems to lean toward making everything a script, disconnected from the larger vision. Sure, it uses our existing libraries, but the files it writes and the functions it makes can't be integrated back in.

Codex is fast. Very fast. That makes it great for a conversational UI and for answering questions about the codebase or proposing alternatives, but when it writes code it's too clever. The code is valid but not Pythonic, like inventing one-line functions just to optimize a situation that could have been parameterized in three places.
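
A made-up Python example of the pattern I mean (not actual Codex output):

    PATHS = ["config.json", "users.json", "jobs.json"]

    # The "clever" version: one-line closures invented to avoid repetition
    make_loader = lambda p: lambda: open(p, encoding="utf-8").read()
    load_config, load_users, load_jobs = (make_loader(p) for p in PATHS)

    # The plain version: one function, parameterized at its three call sites
    def load(path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()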

Claude, on the other hand, makes code that is simple to understand and has enough architecture that you can lift it out and use it as is without too much rewriting.


A bit of a missed opportunity not to use the JSON Resume schema for this.

https://jsonresume.org/schema


We deliberately chose not to use JSON Resume because we wanted greater flexibility. For example, in RenderCV, you can use any section title you want and place any of the 9 available entry types under any section. In contrast, JSON Resume has predefined section titles, and each section is restricted to a predefined entry type. For instance, you must use the experience entry schema under the experience section.


I hear you. This boils down to personal opinion. I would have preferred to use an existing standard rather than introduce yet another one. The custom sections aren't something I've ever seen or needed anyway.


