Good question; we’ll add some info to the page for this.
LLMs are generally quite good at writing code, so attaching a Python REPL gives them extra abilities. For example, I was able to use a REPL with boto3 available to answer questions about an AWS cluster that took multiple API calls to resolve.
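To give a flavor: a question like "how much EBS is attached to this cluster?" takes two API calls, and the model can just write and run the script itself. Here's a rough sketch of what that generated code looks like (the "Cluster" tag and its value are invented for illustration):

```python
# Sketch of the kind of read-only, multi-call script the model writes
# in the REPL. The "Cluster" tag and "my-cluster" value are made up.
import boto3

ec2 = boto3.client("ec2")

# Call 1: find the instances in the cluster by tag
reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:Cluster", "Values": ["my-cluster"]}]
)["Reservations"]
instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

# Call 2: look up the EBS volumes attached to those instances
volumes = ec2.describe_volumes(
    Filters=[{"Name": "attachment.instance-id", "Values": instance_ids}]
)["Volumes"]

print(f"{len(instance_ids)} instances, {sum(v['Size'] for v in volumes)} GiB EBS")
```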
LLMs are also good at using a code execution environment for data analysis.
It seems like a very similar issue arises with the "natural language query" problem for database systems. My best guess at a solution in that domain is to restrict the interface. Allow the LLM to generate whatever SQL it wants, but parse that SQL with a restricted grammar that only allows a "safe" (e.g. non-mutating) subset of SQL before actually issuing queries to the database. Then figure out (somehow) how to close the loop on error handling when the LLM violates the contract (e.g. generates a query which doesn't parse).
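As a rough sketch of what that restriction could look like, using sqlglot as the parser (the function name and the exact set of disallowed nodes here are illustrative, not a complete safety list):

```python
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError

# Constructs we refuse to execute; extend as needed for your dialect.
MUTATING = (exp.Insert, exp.Update, exp.Delete, exp.Drop, exp.Create)

def check_query(sql: str) -> str | None:
    """Return None if the query looks safe, else an error message
    that can be fed back to the LLM so it can retry."""
    try:
        tree = sqlglot.parse_one(sql)
    except ParseError as e:
        # Close the loop: the parse error goes back to the LLM
        return f"query failed to parse: {e}"
    if not isinstance(tree, exp.Select):
        return "only SELECT statements are allowed"
    if tree.find(*MUTATING):
        return "query contains a mutating construct"
    return None
```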
Then of course there's the whole UX problem that, even when you restrict the interface to safe queries, the LLM may still generate queries which are completely incorrect. The best idea I can come up with there is to dump the query text into an editor where the user can review it for correctness.
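Even a crude approval gate captures the shape of it; here's a sketch against sqlite3 (a real editor round-trip would be nicer than a y/N prompt):

```python
import sqlite3

def review_and_run(conn: sqlite3.Connection, sql: str):
    """Show the generated SQL; execute only on explicit user approval."""
    print("Generated query:\n" + sql)
    if input("Run this query? [y/N] ").strip().lower() == "y":
        return conn.execute(sql).fetchall()
    return None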
So it's not really "natural language queries", it's more like "natural language SQL generation", which is a completely different thing and absolutely should not be marketed as the former.
People bring up this concept as a way to make systems "more friendly to novice users", which tbh makes me a little uncomfortable because it seems like a huge footgun. I'd rather have novice users struggle a bit and become less novice than encourage them to run, and implicitly trust, queries which are likely incorrect.
So it's a bit difficult to tell how much value is added here over basic intellisense-style autocomplete.
Looking to the world of "real tools" like hammers and saws, we don't see "novice hammers" or "novice saws". The tool is the tool, and your skill using it grows as you use it more. It seems like a bit of a boondoggle to try to guess what might be good for a novice and orient your entire product experience around that, rather than simply making a tool that's good for experts doing real work and trusting that the novices will put in the effort to build expertise.
Only if you give it unfettered access. AWS's STS service has an AssumeRole API that can mint short-lived credentials scoped to a specific set of permissions; I use those instead.
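Roughly like this with boto3 (the role ARN and the inline session policy below are placeholders), so the REPL only ever sees the temporary, scoped-down credentials:

```python
# Mint short-lived, scoped-down credentials via STS AssumeRole and
# hand only those to the REPL. The role ARN and inline session policy
# are placeholders.
import json
import boto3

sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/llm-readonly",  # placeholder
    RoleSessionName="llm-repl",
    DurationSeconds=900,  # 15 minutes
    # Session policy further restricts whatever the role itself allows
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ec2:Describe*", "cloudwatch:Get*"],
            "Resource": "*",
        }],
    }),
)

creds = resp["Credentials"]
# The REPL's boto3 session is built from the temporary credentials only
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```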