Even if it's 10x cheaper and 2x worse, it's going to eat up even more tokens spinning its wheels trying to implement things or squash bugs, and you may end up spending more because of that. Or at least spending way more of your time.
"Chat UI" can "feel" a bit thin from an eng/product when you initially think about, and that's something we've had to grapple with over time. As we've dug deeper, my worry about that has gone down over time.
For most people, the chat is the entrypoint to LLMs, and people are growing to expect more and more. So now it might be basic chat, web search, internal RAG, deep research, etc. Very soon, it will be more complex flows kicked off via this interface (e.g. cleaning up a Linear project). The same "chat UI" that is used for basic chat must (imo) support these flows to stay competitive.
On the engineering side, things like Deep Research are quite complex/open-ended, and there can be huge differences in quality between implementations (e.g. ChatGPT's vs Claude's). Code interpreter is also quite tricky to do securely.
My understanding of YC is that they place more emphasis on the founders than the initial idea, and teams often pivot.
That being said, I think there is an opportunity for them to discover and serve an important enterprise use case as AI in enterprise hits exponential growth.
There are many markets (e.g. Europe) and highly regulated industries with air-gapped deployments where the typical players in the field (ChatGPT, MS Copilot) are having a hard time.
On another axis, if you are able to offer BYOK deployments and the customers have huge staffs with low usage, it's pretty easy to compete with the big players due to their high per-seat pricing.
There are also many teams we work with that want to (1) retain model flexibility and (2) give everyone at the company the best model for the job. Practically every week, a model from a different provider comes out that is better at some tasks than anything else. It's not great to be locked out from using that model because you're a "ChatGPT" company.
Agreed there are a lot of other projects out there, but why do you say the Vercel option is more advanced/mature?
The common trend we've seen is that most of these other projects are okay for a true "just send messages to an AI and get responses" use case, but for most things beyond that they fall short / there are a lot of paper cuts.
For an individual, this might show up when they try more complex tasks that require multiple tool calls in sequence or when they have a research task to accomplish. For an org, this might show up when trying to manage access to assistants / tools / connected sources.
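To make "multiple tool calls in sequence" concrete, here's a simplified sketch of the kind of loop involved. The helper names and the stubbed model are made up for illustration; this isn't our (or anyone's) actual API:

```typescript
// A research task often needs several tool calls in sequence
// (search -> fetch -> synthesize), each feeding the next turn's context.
// Everything here is a stand-in for illustration only.

type ToolCall = { name: string; args: Record<string, string> };
type ModelTurn = { text?: string; toolCall?: ToolCall };

// Stubbed "model": asks for a search, then a fetch, then answers.
async function callModel(history: string[]): Promise<ModelTurn> {
  if (history.length === 1) return { toolCall: { name: "search", args: { q: history[0] } } };
  if (history.length === 2) return { toolCall: { name: "fetch", args: { url: "https://example.com" } } };
  return { text: `answer synthesized from ${history.length - 1} context entries` };
}

// Stubbed tool runner: a real one would hit a search index, the web, etc.
async function runTool(call: ToolCall): Promise<string> {
  return `${call.name}(${JSON.stringify(call.args)}) -> ...result...`;
}

async function research(task: string): Promise<string> {
  const history = [task];
  for (let step = 0; step < 10; step++) {        // cap steps so a confused model can't loop forever
    const turn = await callModel(history);
    if (!turn.toolCall) return turn.text ?? "";  // no tool requested: we have a final answer
    history.push(await runTool(turn.toolCall));  // tool output becomes context for the next turn
  }
  return "gave up: too many steps";
}

research("What changed in the latest release?").then(console.log);
```

The paper cuts tend to show up in everything around that loop: retries, partial failures, streaming intermediate progress, and per-user permissions on which tools the loop is allowed to call.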
Our goal is to make sure Onyx is the most advanced and mature option out there. I think we've accomplished that, so if there's anything missing I'd love to hear about it.
Alright, let's say I'm tasked with building a fancy AI-powered research assistant and need to choose between Onyx and Vercel's ai-chatbot SDK. Why would I reach for Onyx?
I have used Vercel for several projects and I'm not tied to it, but I would like to understand how Onyx compares.
For my use cases, the benefits of using Vercel have been ease of installation, streaming support, model agnosticism, chat persistence, and blob support. I definitely don't like the vendor lock-in, though.
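For reference, the model-agnosticism part of the AI SDK looks roughly like this (exact API details vary between SDK versions, so treat this as a sketch):

```typescript
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
// import { anthropic } from "@ai-sdk/anthropic"; // swapping providers is a one-line change

const result = streamText({
  model: openai("gpt-4o"), // or e.g. anthropic("claude-3-5-sonnet-latest")
  prompt: "Summarize the tradeoffs of vendor lock-in.",
});

// Stream tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```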
Not wanting to use Vercel is honestly a good enough reason. If you’re a heavy Vercel user you probably aren’t their target market since they’re aiming at enterprise types from what it looks like.
I wasn't trying to be a hater; I think it's great they got funded for this. It just felt like there are so many free options and alternatives out there addressing basically the same things (and looking almost exactly the same) that it genuinely surprised me.
Alexa skills are 3rd-party add-ons/plugins. Want to control your Hue lights? Add the Philips Hue skill. I think Claude skills in an Alexa world would be like having to seed Alexa with a bunch of context for it to remember how to turn my lights on and off, or it will randomly attempt a bunch of incorrect ways of doing it until it gets lucky.
IMHO, don't. Don't keep up. Just like "best practices in prompt engineering", these are just temporary workarounds for current limitations, and they're bound to disappear quickly. Unless you really need the extra performance right now, just wait until models give you this performance out of the box instead of investing in learning something that'll be obsolete in months.
I agree with your conclusion not to sweat all these features too much, but only because they're not hard at all to understand on demand once you realize that they all boil down to a small handful of ways to manipulate model context.
But context engineering is very much not going anywhere as a discipline. Bigger and better models will by no means make it obsolete. In fact, raw model capability is pretty clearly leveling off into the top of an S-curve, and most real-world performance gains over the last year have come precisely from innovations in how to better leverage context.
My point is that there'll be some layer doing that for you. We already have LLMs writing plans for another LLM to execute, and many other such orchestrations, to reduce the constraints on the actual human input. Those implementing this layer need to develop this context engineering; those simply using LLM-based products do not, as it'll be done for them somewhat transparently, eventually. Similar to how not every software engineer needs to be a compiler expert to run a program.
I agree with this take. Models and the tooling around them are both in flux. I'd rather not spend time learning something in detail only for these companies to pull the plug chasing the next big thing.
Well, have some understanding: the good folks need to produce something, since their main product is not delivering the much-yearned-for era of joblessness yet. It's not for you, it's signalling to their investors: see, we're not burning your cash paying a bunch of PhDs to tweak the model weights without visible results. We are actually building products. With a huge and willing A/B testing base.
Agree — it's a big downside as a user to have more and more of these provider-specific features. More to learn, more to configure, more to get locked into.
Of course this is why the model providers keep shipping new ones; without them their product is a commodity.
If I were to say "Claude Skills can be seen as a particular productization of a system prompt" would I be wrong?
From a technical perspective, it seems like unnecessary complexity in a way. Of course I recognize there are a lot of product decisions that seem to layer on 'unnecessary' abstractions but still have utility.
In terms of connecting with customers, it seems sensible, under the assumption that Anthropic is triaging customer feedback well and leading them to where they want to go (even if they don't know it yet).
Update: a sibling comment just wrote something quite similar: "All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs." I think I agree.
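To make that framing concrete, here's a toy sketch of how I picture it; the shapes and the naive relevance check are mine, not Anthropic's actual implementation:

```typescript
// Toy model of "skills as productized system prompts".
// The Skill shape and the matching logic are illustrative only.
type Skill = { name: string; description: string; instructions: string };

const skills: Skill[] = [
  {
    name: "lights",
    description: "Control Philips Hue lights",
    instructions: "To change a light, call the Hue bridge API with the room name...",
  },
];

// "Loading a skill" amounts to splicing its instructions into the system
// prompt when the task looks relevant; nothing about the model itself changes.
function buildSystemPrompt(base: string, task: string): string {
  const relevant = skills.filter((s) => task.toLowerCase().includes(s.name));
  const sections = relevant.map((s) => `## Skill: ${s.name}\n${s.description}\n${s.instructions}`);
  return [base, ...sections].join("\n\n");
}

console.log(buildSystemPrompt("You are a helpful assistant.", "turn the lights off"));
```

Seen that way, the feature is mostly packaging and distribution of prompt text, which is why it doesn't feel like a fundamental capability change.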
All these things are designed to create lock in for companies. They don’t really fundamentally add to the functionality of LLMs. Devs should focus on working directly with the models' generate APIs and not on all the decoration.
Me? I love some lock in. Give me the coolest stuff and I'll be your customer forever. I do not care about trying to be my own AI company. I'd feel the same about OpenAI if they got me first... but they didn't. I am team Anthropic.
Joking aside, I ask Claude how to use Claude... all the time! Sometimes I ask ChatGPT about Claude. It actually doesn't work well because they don't imbue these AI tools with any special knowledge about how they work; they seem to rely on public documentation, which usually lags behind the breakneck pace of these feature releases.
"Recursion" is a word that shows up a lot in the rants of people in AI psychosis (believe they turned the chatbot into god, or believe the chatbot revealed themselves to be god.)
That's the start of the singularity. The changes will keep accelerating, and fewer and fewer people will be able to keep up, until only the AIs themselves know how to use the tools.
I don’t think these are things to keep up with. Those would be actual fundamental advances in the transformer architecture and core elements around it.
This stuff is like front end devs building fad add-ons which call into those core elements and falsely market themselves as fundamental advancements.
Yes, and as we rely on AI to help us choose our tools... the phenomenon feels very different, don't you think? Human thinking, writing, talking, etc. is becoming less important in this feedback loop, it seems to me.