It's not meant as just a demo of what Rama can do. It's a fully featured tool that supports the end-to-end workflow of building and maintaining robust LLM agents. It has an easy-to-learn API and you don't need to learn how to program Rama itself.
Rama isn't open source, but it's far from a black box. All data structures and computation are fully visible in the UI. You can inspect depots, topologies, and PStates, and see exactly what's stored and how it changes over time. Everything is also accessible through the Rama client API for direct querying. The PState schemas used by Agent-o-rama are defined here: https://github.com/redplanetlabs/agent-o-rama/blob/master/sr...
Backups are easy: you configure a “backup provider” (we provide one for S3) and a schedule for incremental backups. The free version can also be backed up with a short maintenance window. Full details are here: https://redplanetlabs.com/docs/~/backups.html
I'm only somewhat familiar with Koog, but these are the major differences according to my understanding:
- Execution model: Koog is a library for defining agents that run within a single process. AOR agents execute across a distributed cluster, whether one node or thousands.
- Deployment and scaling: Koog provides no deployment or scaling mechanisms. That's something you need to figure out on your own. AOR includes built-in deployment, updating, and scaling.
- Integration complexity: Koog must be combined with other tools (monitoring tools, databases, deployment tools, etc.) to approximate a complete agent platform. AOR is fully integrated, including built-in high-performance, durable storage for any data model.
- Experimentation and evaluation: Koog has no features for experimentation or online evaluation. AOR includes extensive support for both.
- Scalability: AOR scales horizontally for both computation and storage. With Koog, you'd need to design and operate that infrastructure yourself.
- Observability: Koog's observability is limited to traces and basic telemetry exposed via OpenTelemetry. AOR provides a much broader set of telemetry, including "time to first token" and online evaluation charts. You can also split all time-series charts automatically by any metadata you attach to your runs (e.g. see how agent latency differs by the choice of model used). Plus, it's all built-in and automatic.
Please correct me if I'm wrong on any aspect of Koog.
Yes, Rama emerged from following this approach exactly. The "make it possible" phase was grinding for years on innumerable backend infrastructure problems. These included problems I worked on directly at BackType and Twitter, and also the thousands of use cases I helped with through my open-source work (especially Storm).
The "make it beautiful" part involved unifying all these use cases into a single set of abstractions that could express them all concisely, with high performance, and without needing any other infrastructure. Since I was building such a general platform, I was also able to consider use cases I hadn't directly worked on – basically just looking at popular web applications and their features.
Leaving Twitter in 2013 was the start of the "make it beautiful" phase. By that point I had already figured out the broad outlines of what such a next generation platform would look like (event sourcing + materialized views, indexing with data structures instead of data models). It was a long road from there to figure out the details and turn it into a production platform.
Thanks for explaining!
To me, Rama looks so high-level that somebody feeling the pain of not having it should be launching new projects all the time, like a consultancy.
That's not in CPS form as cont returns into explode. Even if we treat the for loop in explode as an implementation detail, cont should take the next continuation as a parameter.
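To make that concrete, here's a rough sketch in plain Clojure (not Rama; this `explode` is just a stand-in for the Python example): in real CPS the continuation receives the rest of the computation as a parameter and never returns back into the producer's loop.

```clojure
(require '[clojure.string :as str])

;; A CPS-style generator: `cont` receives each word plus a `next` thunk
;; representing the rest of the iteration, so control never silently
;; returns back into a for loop inside explode.
(defn explode [sentence cont]
  (letfn [(step [words]
            (if (seq words)
              (cont (first words) #(step (rest words)))
              nil))]                       ; nothing left: unwind
    (step (str/split sentence #" "))))

(explode "time flies like an arrow"
         (fn [word next]
           (println word)
           (next)))                        ; explicitly ask for the next word
```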
> For every invoke, Rama determines if it’s executing a deframaop or a deframafn. If it’s a deframafn, then it invokes it just like how functions are invoked in Clojure by unrolling the stack frame with the return value. [...]
So I think GP's Python example matches quite well with what Rama does in the Rama equivalent from TFA.
Functions in Rama do return like normal functions in Clojure. It's Rama "ops" that call the continuation, but "ops" clearly _can_ return back to the caller, as that's how the whole generator aspect of Rama works -- it couldn't be any other way. So `println` is a function and it returns to its caller.
This is very different from a Scheme that compiles to CPS such that every return is a continuation call. The point is that in such a Scheme you end up with all activation frames being on the heap and thus needing garbage collection. This is why delimited continuations were invented: Scheme-style call/cc continuations are just too expensive!
But there is another optimization one can do: some returns can just unwind, while others can call a continuation. This is the optimization that Rama is going for near as I can see. It's a lot like Icon's `return`, which unwinds (in the Icon-to-C compiler anyways) and Icon's `suspend`, which calls the current continuation (again, in the Icon-to-C compiler case). This way you can return when you're not generating results -- when you're returning the last value, or when pruning (in Icon that's failure, which also unwinds, like `return`).
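A rough sketch of that split in plain Clojure (not Icon or Rama; `evens-up-to` and `emit` are made-up names): calling the continuation plays the role of `suspend`, while falling off the end of the loop is a plain unwinding `return`.

```clojure
(defn evens-up-to [n emit]
  (loop [i 0]
    (if (<= i n)
      (do (when (even? i)
            (emit i))     ; "suspend": hand a result to the current continuation
          (recur (inc i)))
      nil)))              ; "return": nothing left to generate, just unwind

(evens-up-to 6 println)   ; prints 0 2 4 6, then control returns normally to the caller
```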
I named it like this so there would be consistency between deframaop and deframafn. Shortening deframafn like you suggest would be "deffn" or "deffunction", which would be very confusing. And I'd rather have deframaop + deframafn than defop + deframafn.
Yes. The second Rama language will have a co-author and will lose the purity of the first language by adding many keywords each with a lot of emotional backstory.
Probably that future co-author will write all of it without any attempt to preserve the aesthetic of the original. Probably that future co-author should just not write anything.
That would make it so you can't do "use" on com.rpl.rama. Since Rama is a full language, doing a "use" on the namespace is generally preferred as otherwise you would have to write "rama/" everywhere, which is irritating.
I also don't like overloading "defn" with something that's completely different. Also, a deframafn is more than a Clojure defn since it can emit to other output streams.
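Concretely, the ergonomics concern looks something like this (a hypothetical sketch; `deframafn`/`deframaop` and `com.rpl.rama` are the real names, the rest is illustration):

```clojure
;; With a plain require, every Rama form needs an alias prefix, which gets
;; noisy when Rama is effectively the whole language you're writing in:
(require '[com.rpl.rama :as rama])
;; ... rama/deframafn, rama/deframaop, rama/... on every form

;; With `use`, the forms come in unqualified, which only works as long as
;; their names don't collide with clojure.core (a Rama `defn` would collide):
(use 'com.rpl.rama)
;; ... deframafn, deframaop, ... with no prefix
```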
The Cont monad in Haskell is only for single continuation targets and can't do branching/unification like Rama. That kind of behavior doesn't seem like it would express naturally or efficiently with just "do".
Yes, Rama probably isn’t semantically comparable to one single monad.
I was talking about do-notation as a way to sugar the syntax of CPS monadic operations into a flat, imperative syntax. This is exactly what Rama is doing.
If you look at a tutorial on what Haskell do-notation desugars into, you'll find the same CPS machinery described in this article.
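For anyone who hasn't seen that desugaring, here's the shape, sketched in Clojure rather than Haskell (the `bind`, `step1`, and `step2` here are made up just to show the structure): each "flat" line of a do block becomes a bind whose second argument is a continuation closing over everything after it, which is the same nested-callback shape the article describes.

```clojure
;; Haskell:                      desugars to roughly:
;;   do x <- step1                 step1 >>= \x ->
;;      y <- step2 x               step2 x >>= \y ->
;;      return (x + y)             return (x + y)
;;
;; The same nested-continuation shape in Clojure, with a trivial
;; identity-monad `bind` and made-up steps:
(defn bind [m f] (f m))
(defn step1 []  1)
(defn step2 [x] (* 10 x))

(bind (step1)
      (fn [x]
        (bind (step2 x)
              (fn [y]
                (+ x y)))))   ; => 11
```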
What we've released publicly on our public Maven repository is a different build of Rama which can only be used for testing/experimentation in a single process. So it can't be used to run full clusters. It's API-equivalent to the full Rama build.
This will change when we move out of private beta, when Rama will be free to use for production for small-scale applications.
Do you have an estimate of when you'll be out of private beta and available? Can you share any more about pricing or what you consider to be a "small-scale application"? Thanks!
We're aiming to be out of private beta early next year. "Small-scale" basically means the kind of scale a single Postgres node + application server can handle.
Well, this article is meant to help people understand just Rama's dataflow API, as opposed to being an introduction to Rama for backend development.
Rama does have a learning curve. If you think its API is "clunky", then you just haven't invested any time in learning and tinkering with it. Here are two examples of how elegant it is:
This one does atomic bank transfers with cross-partition transactions, as well as keeping track of everyone's activity:
This one does scalable time-series analytics, aggregating across multiple granularities and minimizing reads at query time by intelligently choosing buckets across those granularities:
This question would probably be obvious if I knew what a microbatch, topology, or depot was, but as a Rama outsider, is there a good high-level mental model for what makes the cross-partition transactions work? From the comments that mention queuing and transaction order, is serializable isolation a good way to imagine what's going on behind the scenes, or is that way off base?
A depot is a distributed log of events that you append to as a user. In this case, there's one depot for appending "deposits" (an increase to one user's account) and another depot for appending "transfers" (an attempt to move funds from one account to another).
A microbatch topology is a coordinated computation across the entire cluster. It reads a fixed amount of data from each partition of each depot and processes it all in batch. Changes don't become visible until all computation is finished across all partitions.
Additionally, a microbatch topology always starts computation with the PStates (the indexed views that are like databases) at the state of the last microbatch. This means a microbatch topology has exactly-once semantics – it may need to reprocess if there's a failure (like a node dying), but since it always starts from the same state the results are as if there were no failures at all.
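As a toy way to picture that (plain Clojure, not the Rama implementation): if a microbatch is effectively a pure function of the PStates at the last commit plus the fixed batch of new depot records, then a retry after a failure re-runs from the same inputs and commits the same result.

```clojure
;; Toy model: a microbatch computes new account balances from the committed
;; PState plus a fixed batch of deposit events. Nothing from a failed attempt
;; is visible, so a retry starts from the same inputs and produces the same
;; result.
(defn run-microbatch [pstate batch]
  (reduce (fn [p {:keys [account amount]}]
            (update p account (fnil + 0) amount))
          pstate
          batch))

(let [committed {"alice" 100}
      batch     [{:account "alice" :amount 25}
                 {:account "bob"   :amount 10}]]
  ;; first attempt fails partway; the retry starts from `committed` again
  (= (run-microbatch committed batch)
     (run-microbatch committed batch)))   ; => true
```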
Finally, all events on a partition execute in sequence. So when the code checks if the user has the required amount of funds for the transfer, there's no possibility of a concurrent deduction that would create a race condition that would invalidate the check.
So in this code, it first checks if the user has the required amount of funds. If so, it deducts that amount. This is safe because it's synchronous with the check. The code then changes to the partition storing the funds for the target user and adds that amount to their account. If they're receiving multiple transfers, those will be added one at a time because only one event runs at a time on a partition.
To summarize:
- Colocating computation and storage eliminates race conditions
- Microbatch topologies have exactly-once semantics, since computation starts from the exact same state every time regardless of failures or how far the last attempt progressed
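Here's a very rough model of the colocation point in plain Clojure rather than the actual Rama API (the partition layout and `transfer!` helper are made up, and the microbatch/exactly-once machinery isn't modeled): each partition is an agent, so its events run one at a time, and the funds check and the deduction happen within the same event before hopping to the target user's partition.

```clojure
;; Each partition is an agent; sends to an agent run one at a time, which
;; stands in for "only one event runs at a time on a partition".
(def partitions {:p0 (agent {"alice" 100})
                 :p1 (agent {"bob" 0})})

(defn partition-for [user]                       ; hypothetical partitioner
  (if (= user "alice") :p0 :p1))

(defn transfer! [from to amount]
  (send (partitions (partition-for from))
        (fn [funds]
          (if (>= (get funds from 0) amount)     ; check...
            (do (send (partitions (partition-for to))
                      update to (fnil + 0) amount) ; ...then hop partitions to credit
                (update funds from - amount))      ; ...and deduct, all in one event
            funds))))                            ; insufficient funds: leave state unchanged

(transfer! "alice" "bob" 40)
;; after the sends settle: alice -> 60 on :p0, bob -> 40 on :p1
```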
Actually, we've eliminated a massive amount of the complexity of backend development. This is most pronounced at large scale, but it's true at small scale as well.
Our Twitter-scale Mastodon example is literally 100x less code than Twitter wrote to build the equivalent (just the consumer product), and it's 40% less code than the official Mastodon implementation (which isn't scalable). We're seeing similar code reduction from private beta users who have rewritten their applications on top of Rama.
Lines of code is a flawed metric, of course, but when it drops by such large amounts, that says something. Being able to use the optimal data model for every one of your use cases, use your domain data directly, express fault-tolerant distributed computation with ease, and not have to engineer custom deployment routines has a massive effect on reducing complexity and code.
The original post makes so much more sense in this context! One of the "holy grails" in my mind is making CQRS and dataflow programming as easy to learn and maintain as existing imperative programming languages - and easy to weave into real-time UX.
There are so many backend endpoints in the wild that do a bunch of things in a loop, many of which will require I/O or calls to slow external endpoints, transform the results with arbitrary code, and need to return the result to the original requestor. How do you do that in a minimal number of readable lines? Right now, the easiest answer is to give up on trying to do this in dataflow, define a function in an imperative programming language, maybe have it do some things locally in parallel with green threads (Node.js does this inherently, and Python+gevent makes this quite fluent as well), and by the end of that function you have the context of the original request as well as the results of your queries.
But there's a duality between "request my feed" and "materialize/cache the most complex/common feeds" that's not taken into account here. The fact that the request was made is a thing that should kick off a set of updates to views, not necessarily on the same machine, that can then be re-correlated with the request. And to do that, you need a way of declaring a pipeline and tracking context through that pipeline.
https://materialize.com is a really interesting approach here, letting you describe all of this in SQL as a pipeline of materialized views that update in real time, and compiling that into dataflow. But most programmers don't naturally describe this kind of business logic in SQL.
Rama's CPS assignment syntax is really cool in this context. I do wish we could go beyond "this unlocks an entire paradigm to people who know Clojure" towards "this unlocks an entire paradigm to people who only know JavaScript/Python" - but it's a massive step in the right direction!