This feels completely speculative: there's no measure of whether this approach is actually effective.
Personally, I'm skeptical:
- Having the agent look up the JSON schemas and skills to use the CLI still dumps a lot of tokens into its context.
- Designing for AI agents over humans doesn't seem very future-proof. Much of the world is still designed for humans, so the developers of agents are incentivized to make agents increasingly tolerant of human-oriented design.
- This design is novel and may be fairly unfamiliar in the LLM's training data, so I'd imagine the agent would spend more tokens figuring this CLI out compared to a more traditional, human-centered CLI.
Yeah, people seem to forget that one of the L's in LLM stands for Language, and human language is likely the largest chunk of the training data.
A CLI that is well designed for humans is well designed for agents too. The only difference is that it shouldn't dump pages of content that needlessly pollute the context. But then again, it probably shouldn't be dumping pages of content for humans either.
It's not obvious that human language is, or should be, the largest share of training data. It's much easier to generate training data from computers than from humans, and more training data is very valuable. In particular, one could imagine creating a vast number of debugging problems, with logs and associated command outputs, and training on them.
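To make that concrete, here's a minimal sketch of how such synthetic debugging data could be generated: inject a known fault, record the command and its "log" output, and keep the injected fault as a free ground-truth label. The fault catalog and example commands here are purely illustrative.

```python
import json
import random

# Hypothetical fault catalog: (label, command run, resulting log output).
# Because we inject the fault ourselves, the diagnosis label is known
# with certainty, with no human annotation needed.
FAULTS = [
    ("missing_file", "cat /etc/app.conf",
     "cat: /etc/app.conf: No such file or directory"),
    ("port_in_use", "python -m http.server 8080",
     "OSError: [Errno 98] Address already in use"),
    ("bad_permission", "./deploy.sh",
     "bash: ./deploy.sh: Permission denied"),
]

def make_example(rng: random.Random) -> dict:
    """Sample one fault and package it as a training example."""
    label, command, log = rng.choice(FAULTS)
    return {"command": command, "output": log, "label": label}

# Generating examples is cheap: scale the count up as far as you like.
rng = random.Random(0)
dataset = [make_example(rng) for _ in range(1000)]
print(json.dumps(dataset[0], indent=2))
```

A real pipeline would run actual commands in sandboxed environments with real injected faults rather than canned strings, but the shape of the data, and the fact that labels come for free, is the same.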
Claude will load the name and description of each enabled skill into context at startup[0]; the LLM needs to know what it can invoke, after all. It's negligible for a few skills, but a hundred skills will likely have some impact, e.g. deemphasizing other skills by adding noise.
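For reference, the metadata preloaded per skill is small: just the name and description from the skill's `SKILL.md` frontmatter, something roughly like the following (the skill name and wording here are hypothetical):

```yaml
---
name: pdf-form-filler
description: Fill out PDF forms given a set of field values. Use when the user asks to complete or populate a PDF.
---
```

Only this frontmatter sits in context at startup; the skill's full instructions load when it's invoked. A few dozen tokens each, which is why a handful of skills is negligible but a hundred starts to add noise.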