If you don't keep old code versions around, you are forced to either drop pending requests that started on them, or try to replay them on the updated code, which can't be known to be safe. So it's better to keep the old versions around, but this brings in a new set of infra and security challenges, e.g. where will they run, what will they cost, will they have a vulnerable dependency, etc.
If you replay the request on the new version, you might encounter new steps that don't match what you have in the journal. Temporal users know this pain well...
Rolling a task over to a new version should be "safe" in that you can detect conflicts and roll back if the sequence of calls does not match the old version.
For a post about "solving" durable execution I would expect both a scale-to-zero way to keep older versions around indefinitely - I guess the Lambda-based approach does qualify - and a safe and controlled way to upgrade task versions iff the execution history is compatible.
Each execution, by design, has a record of all side-effecting calls, including their inputs and outputs.
If you replay history up to the newest call and all calls are identical, that specific execution instance is compatible with the new code and can be upgraded. If not, it should be rolled back, and you can either deploy a fixed version of the code with backwards compatibility, or delete the executions that cannot be upgraded.
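A minimal sketch of that compatibility check in Python, assuming a journal of recorded side-effect calls and a dry run of the new code that lists the calls it would make (all names here are hypothetical, not any particular framework's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JournalEntry:
    step: str    # name of the side-effecting call
    input: str   # serialized input to that call

def is_compatible(journal: list[JournalEntry], dry_run: list[JournalEntry]) -> bool:
    """Compare the recorded journal against the calls a dry run of the new
    code would make, position by position; every recorded call must be
    matched by an identical call in the new code."""
    if len(dry_run) < len(journal):
        return False  # new code drops recorded steps
    return all(old == new for old, new in zip(journal, dry_run))

# Example: the new code inserts a fraud check before charging the card.
recorded = [JournalEntry("reserve_stock", "sku-1"), JournalEntry("charge_card", "$10")]
new_code = [JournalEntry("reserve_stock", "sku-1"), JournalEntry("check_fraud", "$10"),
            JournalEntry("charge_card", "$10")]
print(is_compatible(recorded, new_code))  # False -> roll back instead of upgrading
```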
Backwards-compatible code can be written as
if (workflowVersion() >= FIX_VERSION) new_way() else old_way()
There should be two ways to get the version for backwards compatibility: workflowVersion() is recorded and replayed and can change between side-effect calls, e.g. an execution will use the old retry logic while replaying steps up to the current point in time, then switch over to the new one.
originalWorkflowVersion() is constant for the whole execution, e.g. all executions that started before NEW_TAX_RULE will keep using the old tax rules for all calculations.
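As a rough illustration of how the two accessors would behave (names and mechanics are hypothetical, following the proposal above rather than any existing framework):

```python
FIX_VERSION = 7     # version that introduced the new retry logic
NEW_TAX_RULE = 12   # version that introduced the new tax rules

class WorkflowContext:
    """Hypothetical context exposing both version accessors."""
    def __init__(self, current_version: int, original_version: int):
        self._current = current_version
        self._original = original_version

    def workflow_version(self) -> int:
        # Recorded and replayed per side-effect call: already-recorded steps
        # replay the old value, steps executed after the deploy see the new one.
        return self._current

    def original_workflow_version(self) -> int:
        # Pinned at the version the execution started on; never changes.
        return self._original

def retry_logic(ctx: WorkflowContext) -> str:
    return "new_retry" if ctx.workflow_version() >= FIX_VERSION else "old_retry"

def tax_rules(ctx: WorkflowContext) -> str:
    return "new_tax_rules" if ctx.original_workflow_version() >= NEW_TAX_RULE else "old_tax_rules"

# An execution that started on version 5 and has caught up to the current deploy (version 12):
ctx = WorkflowContext(current_version=12, original_version=5)
print(retry_logic(ctx))  # "new_retry"     -> switches over at the current point in time
print(tax_rules(ctx))    # "old_tax_rules" -> keeps the rules it started with
```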
I'd love a deep technical comparison, but it would also be great to understand whether Restate is better than Temporal for specific use cases and vice versa, i.e. when someone should choose one over the other.
This is really cool, nice work! How does this differ from Apollo Federation? I'm a bit confused here, because you have integrations with AF too - is this competitive, or is this a more vertically-integrated solution? Cheers!
Grafbase Federated Graphs are spec-compliant with Apollo Federation. We've invested a lot in the developer experience of building and deploying GraphQL APIs. Local development, the Grafbase SDK, and the Grafbase dashboard were built from the ground up to be easy and efficient to use.
GraphQL APIs deployed to Grafbase run on the edge by default, but can now also be deployed in your own infrastructure.
Thanks - what do you think is your sweet-spot use case? I've built a few GraphQL projects in the past, so I'm curious where you fit into the ecosystem?
This is really cool! Am I right in thinking that the cost for running this program is equivalent to all of the dependency execution durations? i.e. no busy waiting?
To a first approximation, yes. There is some small cost for the workflow function itself, but since it doesn't wait on responses and only really executes the side effects, it is not that much. Especially given that this has comparable semantics to the Standard mode (not the Express mode) of Step Functions, which is charged by the number of state transitions and is not super cheap.
Yep, no busy waiting for calls and sleeps within the system. However, 'sideEffects', where you basically just commit the result of an external operation, still block the Lambda. But we're thinking about exposing a `fetch` API directly from the runtime (i.e., we do the call on your behalf) that could in theory sort that out.
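To illustrate why a journaled side effect still blocks the invocation, here is a toy sketch (plain Python with hypothetical names, purely for illustration - not the actual SDK, which is TypeScript):

```python
import time

def slow_external_call() -> str:
    time.sleep(2)              # stands in for a real HTTP request
    return "charge-123"

def side_effect(journal: dict, key: str, fn):
    """Journaled side effect: the handler itself runs fn and commits the result,
    so the invocation stays alive (and billed) while fn is in flight. A
    runtime-brokered fetch would instead let the handler suspend and be
    re-invoked once the result has been committed."""
    if key in journal:         # on replay the result is already committed
        return journal[key]
    result = fn()              # blocks this invocation until the call returns
    journal[key] = result
    return result

journal: dict = {}
print(side_effect(journal, "charge", slow_external_call))  # blocks ~2s, then commits
print(side_effect(journal, "charge", slow_external_call))  # replayed: returns instantly
```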
I'm one of the main contributors to Tanuki (formerly MonkeyPatch).
The purpose of Tanuki is to reduce the time to ship your LLM projects, so you can focus on building what your users want instead of MLOps.
You define patched functions in Python using a decorator, and the execution of the function is delegated to an LLM, with type-coercion on the response.
Automatic distillation is performed in the background, which can reduce the cost and latency of your functions by up to 10x without compromising accuracy.
The real magic, however, is alignment-as-code: you can use Python's `assert` syntax to declare the desired behaviour of your LLM functions. Because this is managed in code, and is subject to code review and the standard software lifecycle, it becomes much clearer how an LLM feature is meant to behave.
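Roughly, a patched function and its alignment look something like this (a simplified sketch; see the docs for the exact API):

```python
from typing import Literal, Optional
import tanuki  # pip install tanuki.py; needs an OpenAI API key at runtime

@tanuki.patch
def classify_sentiment(msg: str) -> Optional[Literal["Good", "Bad"]]:
    """Classify the message as having Good or Bad sentiment; None if neutral."""
    # No body needed: the docstring is the instruction the LLM executes,
    # and the return annotation drives type coercion of the response.

@tanuki.align
def align_classify_sentiment():
    # Alignment-as-code: plain asserts declare the desired behaviour and
    # live in the repo, reviewed like any other code.
    assert classify_sentiment("I love this library") == "Good"
    assert classify_sentiment("This is broken and useless") == "Bad"
    assert classify_sentiment("The sky is blue") is None
```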
Good to know, we'll make it more clear in the docs!
To answer regarding these two areas:
1) The data for fine-tuning is currently saved on disk for low-latency reading and writing. Both test statements and datapoints from function executions are saved to the dataset. We're aware that saving to disk is not the best option and limits many use cases, so we're currently working on persistence layers that allow S3 / Redis / Cloudflare to be used as external data storage.
2) Currently the fine-tuning job starts once the dataset has at least 200 datapoints from GPT-4 executions and align statements. Once fine-tuning is complete, the execution model for the function is automatically switched to the fine-tuned GPT-3.5 Turbo model. Whenever the fine-tuned model breaks the constraints, the teacher (GPT-4) is called upon to fix the datapoint, and that datapoint is saved back to the dataset for future iterative fine-tuning and improvements. We're also working on letting the user include a "test set" that can be used to evaluate whether the fine-tuned model achieves the required performance before switching it in as the primary executor of the function (a rough sketch of this flow is below).
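Here's a rough sketch of that control flow with stub functions standing in for the real model calls (illustrative only, not the actual internals):

```python
import random

FINETUNE_THRESHOLD = 200    # datapoints collected before the first fine-tune

# Stubs standing in for real model calls and checks (illustration only).
def call_model(model: str, prompt: str) -> str:
    return f"{model}:{prompt}"

def satisfies_constraints(output: str) -> bool:
    return random.random() > 0.1          # pretend ~10% of student outputs fail an align

def start_finetune(base_model: str, dataset: list) -> str:
    return f"ft:{base_model}"             # returns the id of the fine-tuned model

def execute(prompt: str, dataset: list, state: dict) -> str:
    """Use the fine-tuned student when available; fall back to the teacher
    whenever the student breaks a constraint, and keep the corrected
    datapoint for the next round of fine-tuning."""
    student = state.get("student")
    if student:
        output = call_model(student, prompt)
        if satisfies_constraints(output):
            return output
    output = call_model("gpt-4", prompt)  # teacher execution (or fix-up)
    dataset.append((prompt, output))
    if not student and len(dataset) >= FINETUNE_THRESHOLD:
        state["student"] = start_finetune("gpt-3.5-turbo", dataset)
    return output
```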
Hope this makes it clearer - if you have any additional questions, let me know!
IDEs shouldn't complain if the function has a docstring (which all the patched functions should have, as that's the instruction that gets executed) and the @patch decorator - at least the ones we've tried so far have been happy with that syntax. But adding a "pass" is also permissible if the IDE does complain.
The big one is a TypeScript implementation. Other than that, the plan is to support other models (e.g. Llama) that can be fine-tuned.
Finally, other persistence layers like S3 and Redis, to support running on execution targets (like AWS Lambda and Cloudflare Workers) that don't have persistent storage.
I think it could be really interesting to support Vercel more tightly too. We currently support Vercel with Python, but I think TypeScript + Redis would really enable serverless AI functions - which is where I think this project should go!
I understand the point. Ideally I'd like the name to keep an association with monkey-patching, as that is relevant to the behaviour of the package, but not be so similar that it shadows the technique of monkey-patching itself!