Well, I can understand your frustration, but this is basically a pipeline that takes some code written by an LLM and attempts to prove its correctness using the Lean 4 theorem prover.
I'm sorry if it was unclear. My comment was just a validation of your assertion. It does use Claude - or any OpenAI-compatible LLM - to perform the extraction.
I wanted to ask almost exactly this question, then saw that it's at #1 right now.
My use case is ssh. I would like to stick my private key into a local Docker container, have an ssh-identical CLI that reverse-proxies into the container, and have some rules about which ssh commands the container may or may not proxy.
I am actually a research engineer paid by the French government. They take digital sovereignty pretty seriously over here, which is sometimes good, sometimes less so.
Definitely the right call on Windows, though. Even my parents (in their mid-seventies) moved to Linux this year.
I am a counterexample to that take. As a French citizen, I have spent a decade building an open alternative to Dropbox [1] that is, I believe, miles ahead of even Dropbox itself. In practice, France represents about 1% of the customer base. I've tried reaching out to the people who talk loudly about sovereignty. Turns out it's just something they say at conferences to entertain each other, as they have no power to actually make it happen.
This is the first time I've heard of property-based testing, and I am intrigued. What is the difference between this and a sufficiently expressive structural type system?
Given some system under test (SUT) with inputs (T, G, ...) and expected outputs derived from the inputs (T', G', etc.), a property-based testing framework attempts to exercise the entire domain of values assignable to T, G, etc. against the SUT.
The generator for T, typically called an Arbitrary&lt;T&gt;, can be constrained, e.g. Arbitrary&lt;number&gt; could generate any float or just non-negative integers. Ideally we would define an Arbitrary&lt;float64&gt; or Arbitrary&lt;u64&gt;.
A sufficiently expressive type system makes it easier to narrow the gap between what is generated and what is assignable to type T, making it clearer what the valid input domain to the SUT is.
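To make the idea concrete, here's a minimal hand-rolled sketch in Python. It's not a real framework (Hypothesis or fast-check do this properly, with shrinking of counterexamples); the generator and property names are made up for illustration:

```python
import random

# A tiny stand-in for Arbitrary<list[int]>: generates random int lists.
def arbitrary_int_list(rng, max_len=20):
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]

# Property: reversing a list twice yields the original list.
def prop_reverse_involutive(xs):
    return list(reversed(list(reversed(xs)))) == xs

# The "framework": exercise the property against many generated inputs,
# returning a counterexample if one is found (a real tool would then
# shrink it to a minimal failing input).
def check(prop, gen, runs=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        xs = gen(rng)
        if not prprop_holds(prop, xs):
            return xs
    return None

def prprop_holds(prop, xs):
    return prop(xs)

assert check(prop_reverse_involutive, arbitrary_int_list) is None
```

A broken property (say, "every generated list is sorted") would make `check` return one of the many generated lists that violate it.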
I am sorry, I am not a real computer scientist and I struggle to find the right term. By "sufficiently expressive", I mean things like dependent types and refinement types, which can express a constraint such as "this is a unit vector".
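In Lean 4, for example, that constraint can live directly in the type via a subtype (a sketch; `UnitVec2` is a made-up name):

```lean
-- Refinement-style type: a pair of floats packaged with a proof
-- that it lies on the unit circle.
def UnitVec2 := { v : Float × Float // v.1 * v.1 + v.2 * v.2 = 1.0 }
```

Any value of this type carries the proof with it, so functions taking a `UnitVec2` never need to re-check the invariant.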
It seems to me that this is more or less the same thing, but Monte Carlo. Like MCMC vs symbolic Bayesian inference.
After the first run, you have a script and an API: the agent discovery mechanism is a detail. If the script is small enough, and the task custom enough, you could simply add the script to the context and say "use this, adapt if needed".
Natural language is ambiguous. If both input and output are in a formal language, then determinism is great. Otherwise, I would prefer confidence intervals.