I haven’t used it, but from looking at Marimo’s examples and docs, I’m not convinced by some of its design choices. The idea that you can run notebook cells out of order is supposed to be a strength, but I actually see it as a weakness. The order of cells is what makes a notebook readable and self-documenting. The discipline of keeping cells in order may be painful, but it’s what makes the flow of analysis understandable to others.
Also, I find the way Marimo uses decorators and functions for defining cells pretty awkward (although it’s nicely abstracted away in the UI). It looks like normal Python, but the functions don’t behave like real functions, and decorators are a fairly advanced feature that most beginners don’t use.
For me, Quarto notebooks strike a better balance when it comes to generating shareable documents, prototypes, and reports. They’re git-friendly, use simple markup for defining cells, and still keep a clear, linear structure.
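For context, a Quarto cell is just a fenced block in a plain-text .qmd file, roughly like this (titles and file names here are made up):

````
---
title: "Analysis report"
format: html
---

## Results

```{python}
import pandas as pd

df = pd.read_csv("data.csv")  # example input, not a real file
df.describe()
```
````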
However, Marimo might be the best tool for replacing Streamlit apps and “production notebooks” (although I’d also argue that notebooks should not be in production).
marimo has a Quarto extension and a markdown file format [1] (marimo check works on this too!). The Python file format was chosen so that "notebooks" are still valid Python, but yes, the format itself is almost an implementation detail to most "notebook" users. Cells _are_ actually callable and importable functions, though (you can give them a name), but the return signature is a bit different from what's serialized.
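For anyone who hasn't seen it, the serialized Python format looks roughly like this (the cell names and boilerplate here are illustrative, not exact tool output):

```python
import marimo

app = marimo.App()


@app.cell
def load_data():
    # a named cell: importable from the notebook file like a regular function
    import pandas as pd
    df = pd.read_csv("data.csv")
    return (df,)


@app.cell
def summarize(df):
    # dependencies are expressed as parameters, definitions as the return tuple
    summary = df.describe()
    summary
    return (summary,)


if __name__ == "__main__":
    app.run()
```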
> The discipline of keeping cells in order may be painful, but it’s what makes the flow of analysis understandable to others.
We might have to agree to disagree here: you can still choose to keep your notebook in order, and that's something you can be disciplined about. The difference is that a marimo notebook can't become unreproducible the way a Jupyter notebook can, _because_ the order doesn't matter.
Regarding copy-paste, I’ve been thinking the LLM could control a headless Neovim instance instead. It might take some specialized reinforcement learning to get a model that actually uses Vim correctly, but then it could issue precise commands for moving, replacing, or deleting text, instead of rewriting everything.
Even something as simple as renaming a variable is often safer and easier when done through the editor’s language server integration.
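Something like this is what I have in mind; pynvim can drive an embedded instance over RPC (the paths and names below are made up, and the LSP rename only works if a language server is actually attached):

```python
import pynvim

# Spawn an embedded, headless Neovim as a child process and control it via RPC.
nvim = pynvim.attach("child", argv=["nvim", "--embed", "--headless"])

nvim.command("edit src/example.py")            # open a file (path is made up)
nvim.command(r"%s/\<old_name\>/new_name/ge")   # precise, scoped substitution
# If a language server is attached, a semantic rename is safer still:
nvim.command('lua vim.lsp.buf.rename("new_name")')
nvim.command("write")
```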
I’ve tried this approach when working in chat interfaces (as opposed to IDEs), but I often find it tricky to review diffs without the full context of the codebase.
That said, your comment made me realize I could be using “git apply” more effectively to review LLM-generated changes directly in my repo. It’s actually a neat workflow!
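For anyone curious, the workflow I'm picturing is roughly this (the patch file name is just a placeholder):

```sh
# Save the model's proposed change as a unified diff, then review it in-repo.
git apply --stat  llm-change.patch   # summary of what would change
git apply --check llm-change.patch   # dry run: does it apply cleanly?
git apply -3      llm-change.patch   # apply, falling back to a 3-way merge
git diff                             # review the result like any other change
```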
My experience with parallel agents is that the bottleneck is not how fast we can produce code but the speed at which we can review it and context switch. Realistically, I don’t think most people have the mental capacity to supervise more than one simultaneous task of any real complexity.
I think one key reason HUDs haven’t taken off more broadly is a fundamental limitation of our current display medium: computer screens and mobile devices are terrible at providing ambient, peripheral information without being intrusive.

When I launch an AI agent to fix a bug or handle a complex task, there’s this awkward wait time: it takes too long for me to sit there staring at the screen waiting for output, but it’s too short for me to disengage and do something else meaningful. A HUD approach would give me a much shorter feedback loop. I could see what the AI is doing in my peripheral vision and decide moment-to-moment whether to jump in and take over the coding myself, or let the agent continue while I work on something else. Instead of being locked into either “full attention on the agent” or “completely disengaged,” I’d have the ambient awareness to dynamically choose my level of involvement.

This makes me think VR/AR could be the killer application for AI HUDs. Spatial computing gives us a display paradigm where AI assistance can be truly ambient rather than demanding your full visual attention on a 2D screen. I imagine this would be especially helpful for more physical tasks, such as cooking or fixing a bike.
You just described what I do with my ultrawide monitor and laptop screen.
I can be fully immersed in a game or anything and keep Claude in a corner of a tmux window next to a browser on the other monitor and jump in whenever I see it get to the next step or whatever.
It’s a similar idea, but imagine you could fire off a task and go for a run or do the dishes, then be notified when it completes and have the option to review the changes or see a summary of failing tests, without having to be at your workstation.
You can do this today with OpenAI Codex, which is built into ChatGPT (and distinct from their CLI tool, also called codex). It lets you prompt, review, and provide feedback via the app. When you're ready, there's a GitHub PR button that opens a filled-out pull request. It has notifications and everything.
There are a handful of products with a similar proposition (with better agents than OpenAI's, frankly), but Codex is the one I've found that's unique in being available via a consumer app.
The only real-life usage of any kind of HUD I can imagine at the moment is navigation, and I have only ever used that (or other car-related things) as something I selectively look at; it never felt like something I need to have in sight at all times.
That said, the best GUI is the one you don't notice, so uh... I can't actually name anything else; it's probably deeply ingrained in my computer usage.
Very interested in this. I have been contemplating building something similar, but I'm unaware of any existing services that do this. I haven't played with pyannote; how does it compare to whisper?
Also, I thought it might be useful to OCR screenshots and use the text to inform the summarization and transcription, especially for things like code snippets and domain-specific terms.
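Something along these lines is what I had in mind; a rough sketch (the file names are placeholders, and I'm assuming whisper's initial_prompt parameter is the right hook for biasing the vocabulary):

```python
import pytesseract
import whisper
from PIL import Image

# Pull domain-specific terms off a screenshot via OCR.
screen_text = pytesseract.image_to_string(Image.open("screenshot.png"))

# Bias the transcription toward those terms through the initial prompt
# (truncated, since whisper only keeps a limited amount of prompt context).
model = whisper.load_model("large-v3")
result = model.transcribe("meeting.wav", initial_prompt=screen_text[:800])
print(result["text"])
```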
I remember whisper v3 large blowing my mind: it was able to properly transcribe some two-language monstrosity ("przescreenować", the English verb "to screen" a candidate, conjugated according to standard Polish rules). Once I saw that I thought, "It's finally time: truly good transcription has finally arrived."
So I view whisper as state of the art, with excellent accuracy.
Now, for the type of transcription I need, telling speakers apart is much more valuable than word-for-word accuracy: the transcript gets summarized anyway, and summarization tends to gloss over some of the errors.
That said, pyannote has also caught me off guard: it correctly annotated a lazily spoken "DP8" from a speaker with a non-native accent.
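If it helps, here's a minimal sketch of using the two together, whisper for the words and pyannote for who spoke when (the model names and file paths are illustrative, and the pyannote pipeline needs a Hugging Face token):

```python
import whisper
from pyannote.audio import Pipeline

# whisper handles the transcription...
model = whisper.load_model("large-v3")
transcript = model.transcribe("meeting.wav")

# ...pyannote handles speaker diarization ("who spoke when").
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_...",  # placeholder token
)
diarization = pipeline("meeting.wav")

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s to {turn.end:.1f}s")
```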
AlphaFold is a real game-changer in predicting many protein structures, but its precision in dealing with single-residue mutations, particularly in non-standard proteins, isn't a sure bet.
The tool excels because it's been trained on a massive database of known protein structures. It's great at making educated guesses based on that data, but it's not as reliable when it comes to variations that don't have much historical data, like specific mutations at the residue level.
For these finer details, traditional physics-based methods, like molecular dynamics simulations, might offer more insight. They really get into the atomic-level interactions, which can be critical for understanding the subtle effects of amino acid changes.
AlphaFold is likely to identify significant structural changes, but it might not be your go-to for pinpointing smaller, more nuanced shifts.