tracyhenry's comments | Hacker News

I'm honestly surprised by so much hate. IMHO it's more important to look at 1) the progress we've made + what this can potentially do in 5 years and 2) how much it's already helping people write code than to dismiss it based on its current state.


This looks great. I would love to know more about what makes Confident AI/DeepEval special compared to the tons of other LLM eval tools out there.


Thanks, and great question! There are a ton of eval tools out there, but only a few actually focus on evals. The quality of LLM evaluation depends on the quality of the dataset and the quality of the metrics, so tools that are more focused on the platform side of things (observability/tracing) tend to fall short on accurate and reliable benchmarking. What tends to happen with those tools is that people use them for one-off debugging, but when errors only happen 1% of the time, there's no capability for regression testing.

Since we own the metrics and the algorithms that we've spent the last year iterating on with our users, we can balance giving engineers the ability to customize our metric algorithms and evaluation techniques with letting them bring everything to the cloud for their organization when they're ready.

This brings me to the tools that do have their own metrics and evals. Including us, there are only three companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefits of Confident AI as well.

That's not the only difference: if you compare DeepEval's metrics on the more nuanced details (which I think is very important), we provide the most customizable metrics out there. This includes the research-backed SOTA LLM-as-a-judge metric G-Eval for any criteria, and the recently released DAG metric, which is decision-based and virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and benefit from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.).
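
To make that concrete, here's a rough sketch of what a G-Eval test looks like through the Pytest integration (a minimal sketch, not copied from our docs; exact parameter names may have drifted from the latest release, and query_llm_app is just a placeholder for whatever app you're testing):

    # Minimal sketch of a DeepEval G-Eval test run through Pytest.
    # Exact parameter names may differ slightly from the latest release.
    from deepeval import assert_test
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCase, LLMTestCaseParams

    def test_answer_correctness():
        # LLM-as-a-judge metric scored against a plain-English criteria string
        correctness = GEval(
            name="Correctness",
            criteria="Does the actual output answer the input accurately?",
            evaluation_params=[
                LLMTestCaseParams.INPUT,
                LLMTestCaseParams.ACTUAL_OUTPUT,
            ],
            threshold=0.7,
        )
        test_case = LLMTestCase(
            input="What is your refund policy?",
            # query_llm_app is a placeholder for the application under test
            actual_output=query_llm_app("What is your refund policy?"),
        )
        assert_test(test_case, [correctness])

Run it with "deepeval test run" and you pick up the caching, parallelization, and CI/CD pieces mentioned above.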

There's so much more, such as generating synthetic data so you can get started with testing even if you don't have a prepared test set, and red-teaming for safety testing (so not just testing for functionality), but I'm going to stop here for now.


+1

I think one rule of Show HN is that people must be able to see the content without signing up, let alone paying for it. So this is a violation of that rule.

Edit: actually search is not behind the paywall (although that's not very obvious)


You can currently search the job listings without having to sign up; only the company names and the links to the job listings are hidden.


That’s definitely violating the spirit of the rule


Right, the data is worth nothing without additional info.


Also, there's no way to delete your account or remove your email, which is not just frustrating -- it's flatly against GDPR, CCPA, etc.

I was looking for a new job relatively recently, so I was curious to see how my own search process compared to this. Then I was mildly annoyed to discover that they require email signup to view any details about the company or the job. So I gave them my email. And then I was even more disappointed to discover that they require you to pay money to even see the link to a single actual job posting. Sorry, not gonna do that if I'm just trying to scratch the curiosity itch. So then I went to delete my account, and... nothing. No can do.

Honestly one of the quickest turnarounds from "oh, neat" to "jeez, what a disappointment" that I've had in recent times.


I made a point of collecting almost no information (only an email) from users (as opposed to LinkedIn, which asks for all sorts of data to sell), but I'm happy to delete your account if you email support. I also dislike having my data used or sold; that is not the purpose here. The feature to delete your account just isn't there yet, apologies.


UPDATE: You can now delete your account (under the 'My Account' tab within the /settings page)


I couldn't take this article seriously considering it doesn't mention Meta's Ray-Ban smart glasses, which largely do what they want already: a pair of glasses without visuals but with AI in your ears.


Yeah, at this point it's almost jumping on a new hype bandwagon to "come up" with the idea of ambient, audio-based AI.

The kicker here, though, is that since it's all driven by a phone in your pocket, (a) it will either kill your battery or not be allowed at all by the platform, and (b) it has no camera, so it has no idea what you're actually seeing or looking at, which makes it a second-class citizen next to all the camera-enabled versions of this (such as, as you mention, the Ray-Bans).


This is a pretty common view among Vision Pro users. Hand tracking is great for these; I can't imagine having to use a controller.

Of course, Vision Pro users are an extreme minority, so you're not wrong. But I highly encourage you to try out the Vision Pro if you haven't.


I did buy a Vision Pro, but it's a nearly unusable device, and outside of fora I've never met anyone who's had a positive experience, so I suspect it's a minority opinion even among Vision Pro users.

Hand tracking is not a feasible input method for routine computing.


This has been a great idea for decades, and I want Haystack to be successful, just like the many other attempts before it. The early execution seems promising. And I suspect there will be many challenges (e.g., cases where it's hard to figure out caller/callee, inconsistent UX preferences across developers, etc.). Kudos for taking this on!

Btw, I've always thought this would be even more powerful when screen real estate isn't limited to a 2D screen (like in a VR headset).


I love the idea of a Haystack VR world! It's a shame that VR software is in a tenuous state due to the biological factors, but I believe it's the future "one day".


"In a tenuous state due to the biological factors" is easily the funniest SV euphemism for "can't be used by humans."


It's OK, after they deliver the MVP there will be a wetware update.


Who knows, one day it may be possible (hopefully without any dystopian updates to human biology)!


At this point, that seems dubious - your inner ear is going to go all inner ear on you, no matter what. Unless you get to turn that off, VR is not it.


I hope our code-editor-in-VR wouldn't involve flying around like hilarious depictions of "The Gibson" in bad sci-fi movies.


Doesn't matter. As long as you use VR to display a virtual 3D environment and you move within it, your inner ear will fight with your vision system over whether you're moving or not. If the visual system and the accelerometer don't agree, the positioning system throws an exception.

And, for whatever reason, the human exception handler for that problem is firmly linked to the barf() subroutine ;)


Like I said, as long as you're not flying around (moving within it), your inner ear doesn't care. Turning your head doesn't count. I don't see a need in a code-in-VR system to move like that. And most VR games solve this by having you teleport instead of translate.

I think the barf routine is because when your brain senses your vestibular system not working it thinks "oops I must be poisoned" and tries to make you throw up.


"Fixing our eyeballs is just an engineering problem."


Check out SoftSpace https://soft.space … not an IDE but similar idea for knowledge mapping


This is actually really cool!


please post a photo!


I was out last night, but here you go: https://ibb.co/68hPjRT


Thank you, this looks amazing


That’s deep red! Quite a different sight!


Llama 3 is the open source model while this is a product offering competing w/ ChatGPT, Claude and Gemini. Not a dupe imo.


Not sure what you're talking about; this is just the playground/demo site for the new Llama 3. The discussion is over there.


+1, really well made. I wonder if there's a framework for making this kind of website?


Found in the sources that babylon.js (https://www.babylonjs.com/) is in use.


BabylonJS indeed. We had a tutorial up on our previous YouTube channel; if there's interest, we'll reupload it.


I’d love to see it


We'll upload it here today or tomorrow: https://www.youtube.com/channel/UCXF--ktsN0t97W9R0GN6_3Q


looking forward to it!



Awesome, thank you very much for the reupload!


thank you so much


My hot take:

ALL AI wearable companies should return their money to investors, and wait for the AR glasses by Apple.

AR glasses are the ultimate form factor. And as much as I hate monopolies, Apple has the right app/dev ecosystem and will make Siri work.


Vision Pro isn't really supposed to be used outside.


I'm talking about AR glasses, not Vision Pro.


Do we know an ETA on those?


The fact that the Vision Pro is passthrough VR instead of an AR screen on glass (as in, when the battery is dead you see black rather than the room with no AR) says that it's far away.


Or while moving.


Absolutely correct.

The AR glasses race is between Apple and Meta Platforms.

This 'LLM-first phone' looks like a solution in search of a problem. But we'll see.

