
> Can anyone direct me towards how to ... make one?

https://hamel.dev/blog/posts/evals/

> What are "obvious" things that are important to get right - temperature set to 0? At least ~10 or 20 attempts at the same problem for each LLM?

LLMs are actually pretty deterministic, so there is no need to do more than one attempt with the exact same data.

> Finally, any known/commonly used frameworks to do this, or any tooling that can call different LLMs would be enough?

https://github.com/vercel/ai

https://github.com/mattpocock/evalite



"LLMs are actually pretty deterministic, so there is no need to do more than one attempt with the exact same data."

Is this true? I remember there being a randomization factor in weighting tokens that makes the output more something, though I don't recall what.

Obviously I'm not an AI dev
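
For reference, the randomization factor is usually the sampling temperature. Here is a minimal sketch of the idea, assuming made-up scores for three candidate tokens. `sample_token` and the `logits` values are hypothetical illustrations, not any provider's API. At temperature 0 the highest-scoring token is always picked, while at higher temperatures tokens are drawn at random in proportion to their softmax weights.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores (hypothetical helper for illustration)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then apply softmax weights.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # made-up scores, not from a real model
rng = random.Random(0)    # fixed seed so the sketch is repeatable

greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}

print("temperature 0 picks:", greedy)
print("temperature 1 picks:", sampled)
```

This only models the sampling step. Hosted models can still show small run-to-run differences at temperature 0 for other reasons, so it is a sketch of the mechanism rather than a guarantee about any specific service.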


In my experience, the response may not be exactly the same, but the difference is negligible.
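
One way to check whether the difference really is negligible for a given eval is to repeat the same prompt a few times and measure how often the answers agree. A rough sketch, assuming a `call_model(prompt)` function that you would supply yourself. `agreement_rate`, `repeat_runs`, and `make_fake_model` are hypothetical names, and the fake model below only stands in for a real API call.

```python
from collections import Counter
from itertools import cycle

def agreement_rate(outputs):
    """Fraction of runs that match the most common output (after trimming whitespace)."""
    if not outputs:
        raise ValueError("need at least one output")
    normalized = [o.strip() for o in outputs]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)

def repeat_runs(call_model, prompt, n=10):
    """Call the model n times with the exact same prompt."""
    return [call_model(prompt) for _ in range(n)]

def make_fake_model(canned_answers):
    """Stand-in for a real LLM call: cycles through canned answers."""
    answers = cycle(canned_answers)
    return lambda prompt: next(answers)

# Replace fake_model with your own function that calls a real model.
fake_model = make_fake_model(["4", "4", "4 ", "four"])
outputs = repeat_runs(fake_model, "What is 2 + 2?", n=8)
rate = agreement_rate(outputs)
print(f"agreement: {rate:.2f}")
```

If the rate stays close to 1.0 across your test cases, a single attempt per case is probably fine. If it does not, exact-match comparison is likely too strict or you need more attempts per case.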


I'm very grateful! Thanks a lot



