I saw a good talk, though I don't remember the name, that went over the array-index approach. It correctly pointed out that by then, you're basically recreating your own pointers without any of the guarantees Rust, or even C++ smart pointers, provide.
> It correctly pointed out that by then, you're basically recreating your own pointers without any of the guarantees Rust, or even C++ smart pointers, provide.
I've gone back and forth on this, myself.
I wrote a custom b-tree implementation in Rust for a project I've been working on. I use my own implementation because I need it to be an order-statistic tree, and I need internal run-length encoding. The original version of my b-tree works just like you'd implement it in C: each internal node / leaf is a raw allocation on the heap.
Because leaves need to point back up the tree, there's unsafe everywhere, and a lot of raw pointers. I ended up with separate Cursor and CursorMut structs which held different kinds of references to the tree itself. Trying to avoid duplicating code for those two cursor types added a lot of complex types and trait magic. The implementation works, and it's fast. But it's horrible to work with, and it never passed Miri's strict checks. Also, Rust has really bad syntax for interacting with raw pointers.
Recently I rewrote the b-tree to simply use a vec of internal nodes and a vec of leaves. References became array indexes (integers). The resulting code is completely safe Rust. It's significantly simpler to read and work with - there's way less abstraction going on. I think it's about 40% less code. Benchmarks show it's about 25% faster than the raw pointer version. (I don't know why, but I suspect better cache locality.)
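Roughly, the new shape is something like this (names are illustrative, not my actual code):

    struct Tree<T> {
        nodes: Vec<Node>,     // internal nodes
        leaves: Vec<Leaf<T>>, // leaf nodes
        root: usize,          // index into nodes
    }

    enum Child {
        Internal(usize), // index into tree.nodes
        Leaf(usize),     // index into tree.leaves
    }

    struct Node {
        parent: Option<usize>, // "up" pointers are just integers now
        children: Vec<Child>,
    }

    struct Leaf<T> {
        parent: usize, // index into tree.nodes
        values: Vec<T>,
    }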
I think this is indeed peak Rust.
It doesn't feel like it, but the array-index style still preserves many of Rust's memory safety guarantees, because all array lookups are bounds checked. What it doesn't protect you from is use-after-free bugs.
Interestingly, I think this style would also be significantly more performant in GC languages like JavaScript and C#, because a single array of objects is much simpler for the garbage collector to keep track of than a graph of nodes and leaves which all reference one another. Food for thought!
> Recently I rewrote the b-tree to simply use a vec of internal nodes
Doesn't this also require you to correctly and efficiently implement (equivalents of C's) malloc() and free()? IIUC your requirements are more constrained, in that malloc() will only ever be called with a single block size, meaning you could just maintain a stack of free indices -- though if tree nodes are comparable in size to integers this increases memory usage by a significant fraction.
(I just checked and Rust has unions, but they require unsafe. So, on pain of unsafe, you could implement a "traditional" freelist-based allocator that stores the index of the next free block in-place inside the node.)
It depends on whether you need to allocate/deallocate nodes. If you construct the tree once and don't modify it thereafter, you don't need to. If you do need to modify and alloc/dealloc nodes, you can use a bitmap to track free/occupied slots, which is very fast (find-first-set plus bit manipulation) and has minuscule overhead even for integer-sized elements.
Yeah, or just store all freed nodes in a linked list. E.g., have a pointer / index from the root to the first unused (free) node, and in that node store a pointer to the next one, and so on. This is pretty trivial to implement.
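Something like this in safe Rust (an illustrative sketch - and a Rust enum even spares you the union):

    enum Slot<T> {
        Occupied(T),
        Free { next_free: Option<usize> }, // a freed slot stores the next free index
    }

    struct Pool<T> {
        slots: Vec<Slot<T>>,
        first_free: Option<usize>, // head of the freelist
    }

    impl<T> Pool<T> {
        fn alloc(&mut self, value: T) -> usize {
            if let Some(i) = self.first_free {
                // Pop the head of the freelist and reuse its slot.
                let old = std::mem::replace(&mut self.slots[i], Slot::Occupied(value));
                if let Slot::Free { next_free } = old {
                    self.first_free = next_free;
                }
                i
            } else {
                self.slots.push(Slot::Occupied(value));
                self.slots.len() - 1
            }
        }

        fn free(&mut self, i: usize) {
            // Push the freed slot onto the front of the freelist.
            self.slots[i] = Slot::Free { next_free: self.first_free };
            self.first_free = Some(i);
        }
    }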
In my case, inserts and read operations vastly outnumber deletes. So much so that in all of my testing, I never saw a leaf node which could be freed anyway. (Leaves store ~32 values, and there were no cases where all of a leaf's values actually got deleted.) I decided to just leak nodes if it ever happens in real life.
The algorithm processes data in batches then frees everything. So worst case, it just has slightly higher peak memory usage while processing. A fine trade in this case given it let me remove ~200 lines of code - and any bugs that might have been lurking in them.
Having gone all-in on this approach before, with some good success, it still feels wrong to me today. Contiguous storage may work for reasonable numbers of elements, but it potentially blocks a huge contiguous chunk of address space, especially for large numbers of elements.
I probably say this because I still have to maintain 32-bit binaries (only 2 GB of address space), but it can potentially be problematic even on 64-bit machines (typically 256 TB of address space), especially if the data structure should be a reusable container with an unknown number of instances. If you don't know a reasonable upper bound on the element count beforehand, you have to reallocate later, or drastically over-reserve from the start. The former removes the pointer-stability guarantee; the latter is uneconomical - it may even be uneconomical on 64-bit, depending on how many instances of the data structure you plan to have. And having to reallocate when overflowing the preallocated space makes operations less deterministic with regard to execution time.
> Having gone all-in on this approach before, with some good success, it still feels wrong to me today. Contiguous storage may work for reasonable numbers of elements, but it potentially blocks a huge contiguous chunk of address space, especially for large numbers of elements.
That makes sense. If my btree were gigabytes in size, I might rethink the approach for a number of reasons. But in my case, even for quite large input, the data structure never gets more than a few megabytes in size. That's small enough that resizing the vec has a negligible performance impact.
It helps that my btree stores its contents using lossless internal run-length encoding. E.g., a run of adjacent values gets collapsed into a single entry - something like this (an illustrative sketch, not my exact encoding):
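    // Illustrative only - the real encoding differs. Instead of storing
    //   (id: 1000), (id: 1001), (id: 1002)
    // as three separate entries, the tree stores a single run:
    struct Run {
        start_id: u64, // first id in the run
        len: u32,      // the run covers start_id .. start_id + len
    }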
In my use case, this compaction decreases the size of the data structure by about 20x. There's some overhead in joining and splitting values - but it's easily worth it.
Weak is very helpful for avoiding the ownership loops that prevent deallocation.
Weak plus RefCell lets you do back pointers cleanly. You call ".borrow()" to get access to the data protected by a RefCell. The run-time borrow panics if someone else is using the data item. This prevents two mutable pointers to the same data, which Rust requires.
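A minimal sketch of the pattern (illustrative names):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        value: i32,
        parent: RefCell<Weak<Node>>,      // weak back pointer - no ownership loop
        children: RefCell<Vec<Rc<Node>>>, // strong pointers own the children
    }

    fn attach(parent: &Rc<Node>, child: Rc<Node>) {
        // Each borrow_mut() panics at run time if that RefCell is already
        // borrowed - this is the check discussed below.
        *child.parent.borrow_mut() = Rc::downgrade(parent);
        parent.children.borrow_mut().push(child);
    }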
Static analysis could potentially detect those panics at compile time. If that were implemented, the run-time check, and the potential for a panic, would go away. It's not hard to check, provided that all borrows have limited scope: you just have to determine, conservatively, that no two borrow scopes for the same thing overlap.
If you had that check, it would be possible to have something that behaves like RefCell, but is checked entirely at compile time. Then you know you're free of potential double-borrow panics.
I started a discussion on this on a Rust forum. A problem is that you have to perform that check after template expansion, and the Rust compiler is not set up to do global analysis after template expansion. This idea needs further development.
This check belongs to the same set of checks which prevent deadlocking a mutex against itself.
There's been some work on Rust static deadlock analysis, but it's still a research topic.
I hadn't considered that. Looking at how weak references work, it might be viable. It would reduce the need for raw pointers and unsafe code. But in exchange, it would add 16 bytes of overhead to every node in my data structure. That's pure overhead, since the reference count of all nodes should always be exactly 1.
However, I'm not sure what the implications are around mutability. I use a Cursor struct which stores a reference to a specific leaf node in the tree. Cursors can walk forward in the tree (cursor.next_entry()). The tree can also be modified at the cursor location (cursor.insert(item)). Modifying the tree via the cursor also updates some metadata all the way up from the leaf to the root.
If the cursor stored an Rc<Leaf> or Weak<Leaf>, I couldn't mutate the leaf item, because Rc::get_mut() returns None if there are other strong or weak pointers pointing to the node. (And that will always be the case!) Maybe I could use an Rc<RefCell<Leaf>>? But then my pointers down the tree would need the same, and pointers up would be Weak<RefCell<Leaf>>, I guess? I have a headache just thinking about it.
Using Rc + Weak would mean less unsafe code, worse performance, and code that's even harder to read and reason about. I don't have an intuitive sense of what the performance hit would be. And it might not be possible to implement this at all, because of the mutability rules.
Switching to an array improved performance, removed all unsafe code and reduced complexity across the board. Cursors got significantly simpler - because they just store an array index. (And inserting becomes cursor.insert(item, &mut tree) - which is simple and easy to reason about.)
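Roughly how the cursors look now (a hypothetical sketch, not my actual API):

    struct Tree<T> {
        leaves: Vec<Vec<T>>, // simplified: each leaf is just a vec of values
    }

    struct Cursor {
        leaf: usize,   // index into tree.leaves - a plain integer, no borrow held
        offset: usize, // position within that leaf
    }

    impl Cursor {
        // The &mut borrow of the tree lasts only for this call, so cursors
        // can be stored anywhere without fighting the borrow checker.
        fn insert<T>(&mut self, item: T, tree: &mut Tree<T>) {
            tree.leaves[self.leaf].insert(self.offset, item);
            self.offset += 1;
            // (the real version also splits full leaves and updates metadata
            // on the path up to the root via parent indices)
        }
    }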
I really think the Vec<Node> / Vec<Leaf> approach is the best choice here. If I were writing this again, this is how I'd approach it from the start.
> What it doesn't protect you from is use-after-free bugs.
How about using hash maps/hash tables/dictionaries/whatever it's called in Rust (HashMap)? You could generate unique IDs for the elements rather than using vector indices.
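A sketch of the idea (hypothetical names): because IDs are never reused, a stale ID simply misses in the map instead of silently aliasing a recycled slot.

    use std::collections::HashMap;

    struct NodeStore<T> {
        next_id: u64, // monotonically increasing, never reused
        nodes: HashMap<u64, T>,
    }

    impl<T> NodeStore<T> {
        fn insert(&mut self, node: T) -> u64 {
            let id = self.next_id;
            self.next_id += 1;
            self.nodes.insert(id, node);
            id
        }

        fn get(&self, id: u64) -> Option<&T> {
            // A use-after-free becomes a recoverable None, not aliased data.
            self.nodes.get(&id)
        }
    }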
But Unity game objects are the same way: you allocate them when they spawn into the scene, and you deallocate them when they despawn. Accessing them after you destroyed them throws an exception. This is exactly the same as entity IDs! The GC doesn't buy you much, other than memory safety, which you can get in other ways (e.g. generational indices, like Bevy does).
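For reference, a generational index is roughly this shape (a sketch, not Bevy's actual types). The generation is bumped when a slot is freed, so stale handles are detected instead of silently aliasing a reused slot:

    #[derive(Clone, Copy, PartialEq, Eq)]
    struct Entity {
        index: u32,
        generation: u32,
    }

    struct Slot<T> {
        generation: u32,
        value: Option<T>, // None once despawned; the slot can be reused
    }

    struct Arena<T> {
        slots: Vec<Slot<T>>,
    }

    impl<T> Arena<T> {
        fn get(&self, e: Entity) -> Option<&T> {
            let slot = self.slots.get(e.index as usize)?;
            // A stale Entity carries an old generation and is rejected here.
            if slot.generation == e.generation {
                slot.value.as_ref()
            } else {
                None
            }
        }
    }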
But in Rust you have to fight the borrow checker a lot when doing complex referential stuff, and sometimes concede. I say this as someone who writes a good bit of Rust and enjoys doing so.
I just don't, and even less so with game logic, which tends to be rather simple in terms of the data structures needed. In my experience, the ownership and borrowing rules are in no way an impediment to game development. That doesn't invalidate your experience, of course, but it doesn't match mine.
The difference is that I'm writing a metaverse client, not a game. A metaverse client is a rare beast about halfway between an MMO client and a web browser.
It has to do most of the graphical things a 3D MMO client does. But it gets all its assets and gameplay instructions from a server.
From a dev perspective, this means you're not making changes to gameplay by recompiling the client. You make changes to objects in the live world while you're connected to the server. So client compile times (I'm currently at about 1 minute 20 seconds for a recompile in release mode) aren't a big issue.
Most of the level and content building machinery of Bevy or Unity or Unreal Engine is thus irrelevant. The important parts needed for performance are down at the graphics level. Those all exist for Rust, but they're all at the My First Renderer level. They don't utilize the concurrency of Vulkan or multiple CPUs. When you get to a non-trivial world, you need that. Tiny Glade is nice, but it works because it's tiny.
What does matter is high performance and reliability while content is coming in at a high rate and changing. Anything can change at any time, but usually doesn't. So cache type optimizations are important, as is multithreading to handle the content flood.
Content is constantly coming in, being displayed, and then discarded as the user moves around the big world.
All that dynamism requires more complex data structures than a game that loads everything at startup.
Rust's "fearless concurrency" is a huge win for performance. I have about 20 threads running, and many are doing quite different things. That would be a horror to debug in C++. In Rust, it's not hard.
(There's a school of thought that says that fast, general purpose renderers are impossible. Each game should have its own renderer. Or you go all the way to a full game engine and integrate gameplay control and the scene graph with the renderer. Once the scene graph gets big enough that (lights x objects) becomes too large to do by brute force, the renderer level needs to cull based on position and size, which means at least a minimal scene graph with a spatial data structure. So now there's an abstraction layering problem - the rendering level needs to see the scene graph. No one in Rust land has solved this problem efficiently. Thus, none of the four available low-level renderers scale well.
I don't think it's impossible, just moderately difficult. I'm currently looking at how to do this efficiently, with some combination of lambdas which access the scene graph passed into the renderer, and caches. I really wish someone else had solved this generic problem, though. I'm a user of renderers, not a rendering expert.)
Meta blew $40 billion on this problem and produced a dud virtual world, but some nice headsets. Improbable blew upwards of $400 million and produced a limited, expensive-to-run system. Metaverses are hard, but not that hard. If you blow some of the basic architectural decisions, though, you never recover.
The dependency injection framework provided by Bevy also sidesteps a lot of the problems with borrow checking that users might run into, and encourages writing data-oriented code that is generally favorable to borrow checking anyway.
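A sketch of the pattern (the component is made up, and details vary across Bevy versions): each system declares exactly the data it needs as parameters, Bevy injects it, and the scheduler hands out disjoint borrows, so you rarely meet the borrow checker head-on.

    use bevy::prelude::*;

    #[derive(Component)]
    struct Velocity(Vec3);

    // Time and the query are injected by the scheduler; no long-lived
    // references are stored anywhere, which keeps borrows short and disjoint.
    fn movement(time: Res<Time>, mut query: Query<(&Velocity, &mut Transform)>) {
        for (velocity, mut transform) in query.iter_mut() {
            transform.translation += velocity.0 * time.delta_seconds();
        }
    }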
This is a valid point. I've played a little with Bevy and liked it. I have also not written a triple-A game in Rust, with any engine, but I'm extrapolating the mess that might show up once you have to start using lots of other libraries; Bevy isn't really a batteries-included engine so this probably becomes necessary. Doubly so if e.g. you generate bindings to the C++ physics library you've already licensed and work with.
These are all solvable problems, but in reality, it's very hard to write a good business case for being the one to solve them. Most of the cost accrues to you and most of the benefit to the commons. Unless a corporate actor decides to write a major new engine in Rust or use Bevy as the base for the same, or unless a whole lot of indie devs and part-time hackers arduously work all this out, it's not worth the trouble if you're approaching it from the perspective of a studio with severe limitations on both funding and time.
Thankfully my studio has given me time to be able to submit a lot of upstream code to Bevy. I do agree that there's a bootstrapping problem here and I'm glad that I'm in a situation where I can help out. I'm not the only one; there are a handful of startups and small studios that are doing the same.
Given my experience with Bevy this doesn't happen very often, if ever.
The only challenge is not having an ecosystem with ready made everything like you do in "batteries included" frameworks.
You are basically building a game engine and a game at the same time.
We need a commercial engine in Rust or a decade of OSS work. But what features will be considered standard in Unreal Engine 2035?
Long bet: people are going to write much more code in 2035 than today. It's just going to be very different.
(For the record, software development has nothing to do now with how it looked when I started in 2003; plenty of things have revolutionized the way we write code (especially GitHub) and made us at least an order of magnitude more productive. Yet the number of developers has skyrocketed. I don't expect this trend to stop; AI is yet another productivity boost in an industry that has already seen plenty of them recently.)
I see this and I am reminded of when I had to fight 0-indexing while cutting my teeth on C for a class.
I wonder why no one complains about 0-indexing anymore. Isn't it weird how you have to go from 0 to length - 1, and implement algorithms differently than in a math book?
And others, like the Pascal lineage (Pascal, Object Pascal, Extended Pascal, Modula-2, Ada, Oberon, ...), have flexible bounds: they can be whatever numeric subranges we feel like using, or enumeration values.
Maths books aren't being weird. They are counting the way most people learn to count: one apple, two apples, three apples. You don't start with the zeroth apple, one apple, two apples, and then say the set of apples contains three.
But computers are not actually counting array elements; it's more accurate to compare array indexing with distance measurement. The pointer (memory address) puts you at the start of the array, so the first element is right there under your feet (i.e. index 0). The other elements are found by measuring how far away from the start they are:
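In Rust terms, the same idea looks like this (an illustrative sketch):

    fn main() {
        let arr = [10, 20, 30];
        let p = arr.as_ptr();             // the address of the start of the array
        let first = unsafe { *p.add(0) }; // 0 elements from the start -> arr[0]
        let third = unsafe { *p.add(2) }; // 2 elements from the start -> arr[2]
        assert_eq!((first, third), (arr[0], arr[2]));
    }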
I find indices starting from zero much easier, especially when index/pointer arithmetic is involved, like converting between pixel or voxel indices and coordinates, or indexing into ring buffers. 1-based indexing is one of the reasons I eventually abandoned Mathematica - it got way too cumbersome.
So the reason why you don't see many people fighting 0-indexing is because they actually prefer it.
I started out with BASIC and Fortran, which use 1-based indices. Going to C was a small bump in the road getting used to that, and then it was Fortran that became the oddball.
Most oldschool BASIC dialects (including the original Dartmouth IIRC) use 0-based indices, though. It's the worst of both worlds, where something like:
DIM a(10)
actually declares an array of 11 elements, with indices from 0 to 10 inclusive.
I believe it was QBASIC that first borrowed the ability to define ranges explicitly from Pascal, so that we could do:
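DIM a(1 TO 10)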
I don't think so. One-based numbering is, barring a few particular (spoken) languages, the default. You have to change your counting strategies when going from the regular world to 0-based indices.
Maybe you had the luck of learning a 0-based language first. Then most of them were a smooth ride.
My point is you forgot how hard it is because it's now muscle memory (if you need a recap of the difficulty, learn a language with arbitrary array indexing and set your first array index to something exciting like 5 or -6). It also means that if you are "fighting the borrow checker" you are still at the pre-"muscle memory" stage of learning Rust.
> Maybe you had the luck of learning a 0-based language first. Then most of them were a smooth ride.
Given most languages since at least C have 0-based indexing... I would think most engineers picked it up early? I recall reading The C Programming Language 20 years ago, reading the reason and just following what it says. I don't think it's as complex as the descriptions people put forward of "fighting the borrow checker." One is "mentally add/subtract 1" and another is "gain a deep understanding of how memory management works in Rust." I know which one I'm going to find more challenging when I get round to trying to learn Rust...
> Given most languages since at least C have 0-based indexing.
As I mentioned, I started with BASIC on the C64, and the school curriculum was in Pascal. I didn't learn about C until I got to college.
> One is "mentally add/subtract 1" and another is "gain a deep understanding of how memory management works in Rust."
In practice they're learned the same way: you start writing code. At first you trip over your feet and read stuff carefully, then try again until you succeed.
Then one day you wake up and realize you know 0-indices and/or the borrow checker. You don't know how you know; you just know you don't make those mistakes anymore.
I was Basic (on C64) -> assembly -> Pascal -> C, more or less. 0-based indexing wasn't too bad for me, except when it came to for loops.
for (i=0; i<length; i++)
I eventually just memorized that pattern, but stumbled every time any part of it changed. I had to rethink the whole logic every time to figure out < vs <= and length vs length - 1, and usually ended up with an answer that was both confident and wrong.
The borrow checker feels similar, but different. It feels like it has more "levels" to it. Initially it came naturally to me, and I wondered what all the fuss was about. I was fighting iterators and into, not the borrow checker. I just had to mentally keep track of what owned my data, and it all felt pretty obvious.
Then I started working on things that didn't fit into my naive mental model, and it became a constant fight.
So overall, a similar experience to 0-based indexing, yes. (Except I still don't "just know" the trickier bits of the borrow checker yet, so I don't know what comes next.)
I literally just described my process, so I don’t get how you got to “you don’t know how you know” because… well… I just told you.
Also, there’s a huge difference between beginners not understanding 0-based indexing and experienced C++ engineers describing the challenges understanding Rust’s unique features. I mean, Jesus Christ, we’re commenting on a thread here of experienced engineers commenting on how challenging it can be! I really don’t know what else to say.
> Also, there’s a huge difference between beginners not understanding 0-based indexing and experienced C++ engineers describing the challenges understanding Rust’s unique features
Keep in mind I wasn't new to programming at that point. I had been programming in C64 BASIC for 3 years, and in Pascal for 3 as well - as a hobby, of course, not full-time.
Zero indices aren't simple - they are intertwined everywhere - but they are easy, as in familiar.
Experienced C++ devs in Rust are not much different from experienced Pascal devs in C: lost, and having to retread semi-familiar ground.
---
And I described my own to you. I don't fight the borrow checker. I just intuitively know how it works and what to avoid.
If you think just subtracting or adding one is enough, there should be an easy enough way to test whether it is. In a Veritasium video they mention that wearing glasses that turn your vision upside down causes confusion at first, but you get used to them quickly.
What you could do is take a language that has an arbitrary starting index and set it to something weird, like 42 or -5. Then rewrite your programs. See how many off-by-41 errors you make. Then, once you no longer make mistakes with it, go back to 0-indexing.
> What you could do is take a language that has an arbitrary starting index and set it to something weird, like 42 or -5. Then rewrite your programs. See how many off-by-41 errors you make. Then, once you no longer make mistakes with it, go back to 0-indexing.
Do you need me to comment on the difference between 0 and 42?
But you need to see it with the eyes of a newbie. I mean, what is the problem here? You did say it's just adding or subtracting a number, whether it's 0, 1, -1, or 42. Should be trivial, right?
My guess is that while the change of indices is simple (altering ranges by a constant), it's going to be hard (requiring constant mental effort until it's internalized).
> I mean, what is the problem here? You did say it's just adding or subtracting a number, whether it's 0, 1, -1, or 42. Should be trivial, right?
Apparently I do need to comment, but I don't think any human on earth has the words to convince you that, as a human, adding/subtracting 1 is a billion times easier to understand than adding/subtracting 42.
For languages with 0-based array element numbering, say what the numbers are: they're offsets. 0-based arrays have offsets, 1-based arrays have indices.
I sometimes work on creating my own programming language (because there aren't enough of those already), and one of the things I want to do in it is 1-based indexing. Just so I can write a[1] to get the first element.
You can't do possibly-erroneous pointer math on a C# object reference. You don't need to deal with the game life cycle AND the memory life cycle with a GC. In Unity they free the native memory when a game object calls Destroy() but the C# data is handled by the GC. Same with any plain C# objects.
To say it's the same as using array indices is just not true.
> You can't do possibly-erroneous pointer math on a C# object reference.
Bevy entity IDs are opaque and you have to try really hard to do arithmetic on them. You can technically do math on instance IDs in Unity too; you might say "well, nobody does that", which is my point exactly.
> You don't need to deal with the game life cycle AND the memory life cycle with a GC.
I don't know what this means. The memory for a `GameObject` is freed once you call `Destroy`, which is also how you despawn an object. That's managing the memory lifecycle.
> In Unity they free the native memory when a game object calls Destroy() but the C# data is handled by the GC. Same with any plain C# objects.
Is there a use for storing data on a dead `GameObject`? I've never had any reason to do so. In any case, if you really wanted to do that in Bevy you could always use an `EntityHashMap`.
More than the trying-to-find-another-object kind of math, I was mostly thinking about address aliasing, i.e. cleared handles pointing to re-used space that now holds live but different objects. You could just say "don't screw up your handle/alloc code", but it's something you don't have to worry about when you don't roll your own.
The live-C#-but-dead-Unity-object trick is mostly only useful for dangling handles and IDs and such. The point is more that memory won't be ripped out from under you for non-Unity data, and the usual GC rules apply.
And again the difference between using the GC and rolling your own implementation is pretty big. In your hash map example you still have to solve the issue of how long you keep entries in that map. The GC answers that question.
While we don't need to, we can - that is the beauty of languages like C#, which offer the productivity of automatic memory management and the tools to go low-level if desired or needed.
At least in terms of doing math on indices, I have to imagine you could just wrap the type to make indices opaque. The other concerns seem valid, though.
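Something like this, sketched in Rust (hypothetical names):

    #[derive(Clone, Copy, PartialEq, Eq)]
    struct NodeIdx(u32); // opaque: no arithmetic impls to misuse

    struct Nodes<T>(Vec<T>);

    impl<T> Nodes<T> {
        fn get(&self, idx: NodeIdx) -> Option<&T> {
            self.0.get(idx.0 as usize)
        }
    }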
Yes, but regarding use of uninitialized/freed memory, neither GC nor memory safety really helps. Both "only" help with totally incidental, unintentional, small-scale violations.