Hacker News | mdrzn's comments

Seems crazy to me that Y Combinator is funding these OpenClaw wrappers. Hype based on hype.

Is it $0.003 per minute of audio uploaded, or per "compute minute"?

For example, fal.ai has a Whisper API endpoint priced at "$0.00125 per compute second" which (at 10-25x realtime) is much cheaper than all the competitors.
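To compare a per-compute-second price with a per-audio-minute price, the two have to be put on the same basis. A minimal sketch, assuming "Nx realtime" means N seconds of audio are processed per compute second and that billing is strictly proportional to compute time (no minimums or rounding). The function name is made up for illustration and is not part of any provider's API.

```python
def cost_per_audio_minute(price_per_compute_second: float, realtime_factor: float) -> float:
    """Convert a per-compute-second price into a per-audio-minute price.

    realtime_factor is assumed to be seconds of audio processed per
    second of compute (e.g. 10 for "10x realtime").
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    # Compute seconds needed to process 60 seconds of audio.
    compute_seconds = 60.0 / realtime_factor
    return price_per_compute_second * compute_seconds


if __name__ == "__main__":
    # Price quoted in the comment above; the realtime factors are the
    # two ends of the quoted 10-25x range.
    for factor in (10, 25):
        price = cost_per_audio_minute(0.00125, factor)
        print(f"{factor}x realtime: ${price:.4f} per audio minute")
```

The result depends heavily on the realtime factor actually achieved, so it is worth measuring on your own audio before comparing against per-minute quotes.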


I think the point is having it for real-time; this is for conversations rather than transcribing audio files.

That quote was for the non-realtime model.

It can actually go much lower. Gemini costs around $0.01/hour of transcription last time I checked.

Both AWS and Mistral prices above are per minute of input audio.

If Voxtral can process rapid speech as well as it claims to, an obvious cost optimization would be to speed up normal laconic speech to the maximum speed the model can handle accurately.
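A rough sketch of that idea, assuming billing is strictly per minute of input audio and that accuracy holds at the chosen speed-up. The 1.5x factor below is a hypothetical value, not a measured limit for any model, and the function name is made up for illustration.

```python
def billed_cost(audio_minutes: float, price_per_minute: float, speedup: float = 1.0) -> float:
    """Cost of transcribing audio that was sped up before upload.

    Speeding audio up by `speedup` shortens it to audio_minutes / speedup,
    and that shortened duration is what gets billed.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (audio_minutes / speedup) * price_per_minute


if __name__ == "__main__":
    # One hour of audio at the $0.003 per minute price discussed above.
    original = billed_cost(60, 0.003)
    faster = billed_cost(60, 0.003, speedup=1.5)  # hypothetical speed-up
    print(f"original: ${original:.3f}, sped up: ${faster:.3f}")
```

Whether the saving is real depends on whether the word error rate stays flat at the faster speed, which would need to be tested per model.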

There's no comparison to Whisper Large v3 or other Whisper models.

Is it better? Worse? Why do they only compare to gpt4o mini transcribe?


WER is slightly misleading, but Whisper Large v3 WER is classically around 10%, I think, and 12% with Turbo.

The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.

But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.


On the topic of things being misleading, GPT-4o transcriber is a very _different_ transcriber to Whisper. I would say not better or worse, despite characterizations as such. So it is a little difficult to compare on just the numbers.

There's a reason that quite a lot of good transcribers still use V2, not V3.


Different how?

Gpt4o mini transcribe is better and actually realtime. Whisper is trained to encode the entire audio (or at least 30s chunks) and then decode it.

So "gpt4o mini transcribe" is not just whisper v3 under the hood? Btw it's $0.006 / minute

For Whisper API online (with v3 large) I've found "$0.00125 per compute second", which is the absolute cheapest I've ever found.


>So it's not just whisper v3 under the hood?

Why should it be Whisper v3? They even released an open model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26...


Deepinfra offers Whisper V3 at $0.00045 / minute of transcribed audio.

The linked article claims the average word error rate for Voxtral mini v2 is lower than GPT-4o mini transcribe.

Gpt4o mini transcribe is better than whisper, the context is the parent comment.

Maybe it's because I'm not used to the flow, but I prefer to work directly on the machine where I'm logged in via ssh, instead of working "somewhere in a git tree" and then having to deploy/test/etc.

Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" but on a remote machine, I'll test it.


Not going to solve your exact problem but I started this project with this approach in mind

https://github.com/jgbrwn/vibebin


"we're releasing our new model" is it downloadable and runnable in local? Could I create a "vTuber" persona with this model?

We have not released the weights, but it is fully available to use in your websites or applications. I can see how our wording there could be misconstrued -- sorry about that. You can absolutely create a vTuber persona. The link in the post is still live if you want to create one (as simple as uploading an image, selecting a voice, and defining the personality). We even have a prebuilt UI you can embed in a website, just like a youtube video.

Posted 5 times in the last 7 days, today it finally got 29 points with 0 comments? Weird.

Most announcements slip through without notice; they only pick up votes when they hit the main page.

v1 also took a while to make it to HN, v3 is a complete rewrite focused on extensibility with a lot more new features.


The few people looking at /new on HN are ridiculously overpowered. A few upvotes from them in the first few hours will get you to the front page, and just 1-2 downvotes will make your post never see the light of day.

You can't downvote a post, so that's not a factor.

Also it's not as powerful as you think. In the past I have spent a lot of time looking at /new, and upvoting stories that I think should be surfaced. The vast majority of them still never hit near the front page.

It's a real shame, because some of the best and most relevant submissions don't seem to make it.


If you are in a company like e.g. ClickHouse and share a new HN Submission of ClickHouse via the internal Slack to #general, then you easily get enough upvotes for the front page.

You can absolutely downvote posts. You have to have a certain amount of karma before the option becomes available.

No I was wrong. You can't downvote posts. Flags are used instead, apparently.

Yes, and I will fully agree with you that flags are overpowered. That system does need to be re-worked IMHO.

freedomben has 28k karma. I don’t think the downvote button is coming.

What is stopping you from joining those "ridiculously overpowered people"?

Why wouldn't I be able to fix these things? If I managed to build a thing from scratch (with Opus 4.5), I don't see why I wouldn't be able to fix it and maintain it in the future (maybe with Opus 4.7 or even better future models?).


Which is exactly why whenever I have an idea I just tinker with ClaudeCode for an hour or so until I have exactly what I need. It takes less time than trying to compare 10 similar products, none of which have the exact specifications or features that I need.

List of projects mentioned before: https://news.ycombinator.com/item?id=46716805


Tens of small/one-time apps or scripts that I needed done, and Claude provided them in seconds.

A few medium to big projects:

- a scraper of product pricing in shops near me, to track inflation over time

- a clone of typeform, but more customized to my needs

- end-to-end automation of managing facebook ad campaigns (create/track/scale)

- dashboard to automate managing comments on multiple facebook pages

- a classic polymarket bot

- an in-browser pdf editor so all my data stays local

- a landing page generator for ecommerce, just give the product description

- a slideshow generator using nanobananapro

- an infinite canvas to work in to generate images, with nodes

- agent automations to test AI voice agents in calls

Anything that comes to mind I can set up and deploy in a few minutes.

Nothing groundbreaking but it's all stuff that I didn't know how to do before, and now I know how to build/maintain/backup/upgrade with ClaudeCode. I know most senior devs would say "well this was all doable before" but they forget that not everyone had all the necessary skills to do all this stuff. Now it's a one man job.


I am using an Opus -> Native app flow to build the apps I need quickly. It's crazy how much I don't know while still getting stuff done.


Are you offering services in this area?


Sure, my email is in my profile! Or you can find my linkedin from my username.


"my experience from 5 years of coding with AI" immediately disregarded the rest of TFA.

