tracyhenry's comments | Hacker News

I'm honestly surprised by so much hate. IMHO it's more important to look at 1) the progress we've made + what this can potentially do in 5 years and 2) how much it's already helping people write code than to dismiss it based on its current state.


This looks great. I would love to know more about what makes Confident AI/DeepEval special compared to the tons of other LLM eval tools out there.


Thanks, and great question! There are a ton of eval tools out there, but only a few actually focus on evals. The quality of LLM evaluation depends on the quality of the dataset and the quality of the metrics, so tools that are more focused on the platform side of things (observability/tracing) tend to fall short on accurate and reliable benchmarking. What tends to happen with those tools is that people use them for one-off debugging, but when errors only happen 1% of the time, there's no capability for regression testing.

Since we own the metrics and the algorithms that we've spent the last year iterating on with our users, we can balance giving engineers the ability to customize our metric algorithms and evaluation techniques with letting them bring everything to the cloud for their organization when they're ready.

This brings me to the tools that do have their own metrics and evals. Including us, there are only three companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefits of Confident AI as well.

That's not the only difference: if you compare DeepEval's metrics on the more nuanced details (which I think is very important), we provide the most customizable metrics out there. This includes the research-backed SOTA LLM-as-a-judge metric G-Eval for any criteria, and the recently released DAG metric, which is decision-based and virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and benefit from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.).
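
To make that concrete, here's a rough sketch of what a G-Eval test looks like through the Pytest integration (a minimal sketch, not copied from our docs; exact parameter names may have drifted from the latest release, and query_llm_app is just a placeholder for whatever app you're testing):

    # Minimal sketch of a DeepEval G-Eval test run through Pytest.
    # Exact parameter names may differ slightly from the latest release.
    from deepeval import assert_test
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCase, LLMTestCaseParams

    def test_answer_correctness():
        # LLM-as-a-judge metric scored against a plain-English criteria string
        correctness = GEval(
            name="Correctness",
            criteria="Does the actual output answer the input accurately?",
            evaluation_params=[
                LLMTestCaseParams.INPUT,
                LLMTestCaseParams.ACTUAL_OUTPUT,
            ],
            threshold=0.7,
        )
        test_case = LLMTestCase(
            input="What is your refund policy?",
            # query_llm_app is a placeholder for the application under test
            actual_output=query_llm_app("What is your refund policy?"),
        )
        assert_test(test_case, [correctness])

Run it with "deepeval test run" and you pick up the caching, parallelization, and CI/CD pieces mentioned above.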

There's so much more, such as generating synthetic data so you can get started with testing even if you don't have a prepared test set, and red-teaming for safety testing (so not just testing for functionality), but I'm going to stop here for now.


+1

I think one rule of Show HN is that people must be able to see the content without signing up, let alone paying for it. So this is a violation of that rule.

Edit: actually search is not behind the paywall (although that's not very obvious)


You can currently search the job listings without having to sign up; only the company names and the links to the job listings are hidden.


That’s definitely violating the spirit of the rule


Right, the data is worth nothing without additional info.


Also, there's no way to delete your account or remove your email, which is not just frustrating -- it's flatly against GDPR, CCPA, etc.

I was looking for a new job relatively recently, so I was curious to see how my own search process compared to this. Then I was mildly annoyed to discover that they require email signup to view any details about the company or the job. So I gave them my email. And then I was even more disappointed to discover that they require you to pay money to even see the link to a single actual job posting. Sorry, not gonna do that if I'm just trying to scratch the curiosity itch. So then I went to delete my account, and... nothing. No can do.

Honestly one of the quickest turnarounds from "oh, neat" to "jeez, what a disappointment" that I've had in recent times.


I made a point of collecting almost no information (only an email) from users (as opposed to LinkedIn, which asks for all sorts of data to sell), but I'm happy to delete your account if you email support. I also dislike having my data used or sold; that is not the purpose here. The feature to delete your account just isn't there yet, apologies.


UPDATE: You can now delete your account (under the 'My Account' tab within the /settings page)


I couldn't take this article seriously considering it doesn't mention Meta's Ray-Ban smart glasses, which largely do what they want already: a pair of glasses without visuals but with AI in your ears.


Yeah, at this point it's almost jumping on a new hype bandwagon to "come up" with the idea of ambient, audio-based AI.

The kicker here, though, is that since it's all driven by a phone in your pocket, (a) it will either kill your battery or not be allowed at all by the platform, and (b) it has no camera, so it has no idea what you're actually seeing or looking at, which makes it a second-class citizen next to all the camera-enabled versions of this (such as, as you mention, the Ray-Bans).


This is a pretty common view among Vision Pro users. Hand tracking is great for these; I can't imagine having to use a controller.

Of course, Vision Pro users are an extreme minority, so you're not wrong. But I highly encourage you to try out the Vision Pro if you haven't.


I did buy a Vision Pro, but it's a nearly unusable device, and outside of fora I've never met anyone who's had a positive experience, so I suspect it's a minority opinion even among Vision Pro users.

Hand tracking is not a feasible input method for routine computing.


This has been a great idea for decades, and I want Haystack to be successful, just like the many other attempts before it. The early execution seems promising. And I suspect there will be many challenges (e.g., cases where it's hard to figure out caller/callee, inconsistent UX preferences across developers, etc.). Kudos for taking this on!

Btw, I've always thought this would be even more powerful when screen real estate isn't limited to a 2D screen (like in a VR headset).


I love the idea of a Haystack VR world! It's a shame that VR software is in a tenuous state due to the biological factors, but I believe it's the future "one day".


"In a tenuous state due to the biological factors" is easily the funniest SV euphemism for "can't be used by humans."


It's OK, after they deliver the MVP there will be a wetware update.


Who knows, one day it may be possible (hopefully without any dystopian updates to human biology)!


At this point, that seems dubious - your inner ear is going to go all inner ear on you, no matter what. Unless you get to turn that off, VR is not it.


I hope our code-editor-in-VR wouldn't involve flying around like hilarious depictions of "The Gibson" in bad sci-fi movies.


Doesn't matter. As long as you use VR to display a virtual 3D environment and you move within it, your inner ear will fight with your vision system over whether you're moving or not. If the visual system and the accelerometer don't agree, the positioning system throws an exception.

And, for whatever reason, the human exception handler for that problem is firmly linked to the barf() subroutine ;)


Like I said, as long as you're not flying around (moving within it), your inner ear doesn't care. Turning your head doesn't count. I don't see a need in a code-in-VR system to move like that. And most VR games solve this by having you teleport instead of translate.

I think the barf routine is because when your brain senses your vestibular system not working it thinks "oops I must be poisoned" and tries to make you throw up.


"Fixing our eyeballs is just an engineering problem."


Check out SoftSpace https://soft.space … not an IDE but similar idea for knowledge mapping


This is actually really cool!


please post a photo!


I was out last night, but here you go: https://ibb.co/68hPjRT


Thank you, this looks amazing


That’s deep red! Quite a different sight!


Llama 3 is the open source model while this is a product offering competing w/ ChatGPT, Claude and Gemini. Not a dupe imo.


Not sure what you're talking about; this is just the playground/demo site for the new Llama 3. The discussion is over there.


+1, really well made. I wonder if there's a framework for making this kind of website?


Found in the sources that babylon.js (https://www.babylonjs.com/) is in use.


BabylonJS indeed. We had a tutorial up on our previous YouTube channel; if there's interest, we'll reupload it.


I’d love to see it


We'll upload it here today or tomorrow: https://www.youtube.com/channel/UCXF--ktsN0t97W9R0GN6_3Q


looking forward to it!



Awesome, thank you very much for the reupload!


thank you so much


My hot take:

ALL AI wearable companies should return their money to investors, and wait for the AR glasses by Apple.

AR glasses are the ultimate form factor. And as much as I hate monopolies, Apple has the right app/dev ecosystem and will make Siri work.


Vision Pro isn't really supposed to be used outside.


I'm talking about AR glasses, not Vision Pro.


Do we know an ETA on those?


The fact that the Vision Pro is passthrough VR instead of an AR screen on glass (as in, when the battery is dead you see black rather than the room with no AR) says that it's far away.


Or while moving.


Absolutely correct.

The AR glasses race is between Apple and Meta Platforms.

This 'LLM-first phone' looks like a solution in search of a problem. But we'll see.

