matklad's comments (Hacker News)

It is the other way around --- it is _relatively_ easy to reuse the storage engine but plug in your custom state machine (implemented in Zig). We have two state machines, an accounting one and a simple echo one, here: https://github.com/tigerbeetle/tigerbeetle/blob/main/src/tes....

I am not aware of any "serious" state machine other than the accounting one, though.


Oh cool, thanks for providing the correct way around :)

Do you know if there is some type of guide around the state machines?


It's not that ironic, though --- the number of bugs that were squashed by fuzzers & asserts but would have dodged the borrow checker is much, much larger.

This is what makes the TigerBeetle context somewhat special --- in many scenarios, the security provided by memory safety is good enough, and any residual correctness bugs/panics are not a big deal. For us, we need to go the extra N miles to catch the rest of the bugs as well, and DST is a much finer net for those fish (given the static allocation & single-threaded design).


I don't think needing to go "the extra N miles" is that special. Even if security is the only correctness concern - and in lots of cases it isn't, and (some) bugs are a very big deal - memory safety covers only a small portion of the top weaknesses [1].

Mathematically speaking, any simple (i.e. non-dependent) type system catches 0% of possible bugs :) That's not to say it can't be very useful, but it doesn't save you from a lot of testing/other assurance methods.

[1]: https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html Also, spatial safety is more important for security than temporal safety. As far as language guarantees go, Zig and Rust only differ on #8 on the list.


You need to learn both. Both borrow checker (Rust) and comptime (Zig) are huge ideas worth getting your hands dirty with.

If you don't have time to learn both, then learn Zig, as it requires much smaller time investment (though do try to find time for both).


For what it’s worth, there’s a comptime crate for Rust. Not as elegant as a language-level feature, but it probably accomplishes much the same thing.

https://docs.rs/comptime/latest/comptime/


There are plans to experiment with language level comptime support too: https://github.com/rust-lang/rust/pull/148820


This is a very interesting feature, especially if comptime can become a true language superset, with features such as seamless type-level programming leveraging `const fn` code throughout. This would unlock the potential for true dependent types in (the comptime portion of) Rust programs, and for writing arbitrary end-to-end compiler-verified proofs about code that will ultimately run in the final program.


Oh my gosh, that would be incredible! In one of my Rust projects, I used enum dispatch so simple functions could be inlined, but it relied on a slightly hacky macro that couldn't do static dispatch. One of those things that comptime matches very well.


Love this. Borrow all the good ideas! Would also enjoy compile-time trig functions, exponents, etc., so we are not invoking lazy_static/once_cell/LazyCell etc.
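For what it's worth, some of this already works with today's `const fn` --- though std's float trig/exp functions aren't `const` yet, so the sketch below hand-rolls a Taylor series (`exp_approx` is an illustrative name, not a std API):

```rust
// Hedged sketch: compile-time exp via const fn. Const floating-point
// arithmetic is stable as of Rust 1.82; std's f64::exp is NOT const,
// hence the hand-rolled truncated Taylor series (illustration only).
const fn exp_approx(x: f64) -> f64 {
    // 1 + x + x^2/2! + ... + x^20/20!; accurate for small |x|.
    let mut term = 1.0;
    let mut sum = 1.0;
    let mut i = 1;
    while i <= 20 {
        term = term * x / (i as f64);
        sum += term;
        i += 1;
    }
    sum
}

// Evaluated entirely at compile time; no lazy_static/once_cell needed.
const E: f64 = exp_approx(1.0);

fn main() {
    assert!((E - std::f64::consts::E).abs() < 1e-12);
}
```

A comptime-style superset would presumably let arbitrary (non-`const`) library code run in this position, rather than only the `const`-blessed subset.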


It is much better than this. You can _directly_ enumerate all the objects, without any probabilities involved. There's nothing about probabilities in the interface of a PRNG, it's just non-determinism!

You could _implement_ non-determinism via probabilistic sampling, but you could also implement the same interface as exhaustive search.


Well, yes. But the point is that random sampling lets you do it without thinking. Even better, it can sample over multiple spaces at the same time, and over spaces we haven't even yet formalized. "Civilization advances by extending the number of important operations which we can perform without thinking of them." (Whitehead)

An example is something like "pairwise testing" of arguments to a function. Just randomly generating values will hit all possible pairs of argument values, again with only a logarithmic penalty.


The point is that you can exhaustively explore the space without the logarithmic overhead. There are no benefits to doing it with random sampling, and it doesn't even save thought.


I already explained what the benefit is. What is it with this focus on offloading work from computers to people? Let people do things more easily without thinking, even if it burns more increasingly cheap cycles.


You haven't explained what the benefit is. There aren't "spaces we haven't formalized" because of the pigeonhole principle. There are M bits. You can generate every one of those 2^M values with any max cycle permutation.

What work is being offloaded from computers to people? It's exactly the same thing with more determinism and no logarithmic overhead.


> There aren't any "spaces we haven't formalized"

Suppose a space of N points is partitioned into M relevant subsets, for now assumed to be of the same size. Then random sampling hits each of those subsets within O(M log M) samples, even if we don't know what they are.
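To make the coupon-collector claim concrete, here is a hedged sketch (the LCG constants and M = 16 are my illustrative choices, not from the thread): sampling uniformly over M unknown buckets hits all of them in roughly M ln M draws.

```rust
// Coupon-collector sketch: how many random samples until all M
// "relevant subsets" have been hit at least once?
fn coupon_collector_samples() -> u32 {
    const M: usize = 16; // number of relevant subsets
    let mut state: u32 = 12345; // fixed seed, deterministic run
    let mut seen = [false; M];
    let mut seen_count = 0;
    let mut samples: u32 = 0;
    while seen_count < M {
        // Numerical-Recipes-style LCG standing in for "random sampling";
        // use the higher bits, since LCG low bits are weak.
        state = state.wrapping_mul(1664525).wrapping_add(1013904223);
        let bucket = (state >> 16) as usize % M; // which subset this sample hits
        samples += 1;
        if !seen[bucket] {
            seen[bucket] = true;
            seen_count += 1;
        }
    }
    samples
}

fn main() {
    let samples = coupon_collector_samples();
    // Expected ~ M * H(M) = 16 * 3.38 ~ 54 samples, i.e. O(M log M).
    println!("hit all 16 subsets after {samples} samples");
    assert!(samples >= 16 && samples < 10_000);
}
```

The point of the sketch is that nothing in the loop knows what the buckets are; the logarithmic factor is the entire price of that ignorance.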

This sort of partitioning has long been discussed in the testing literature, with the idea that you should do it manually.

> what work is being offloaded

The need to write that program for explicitly enumerating the space.


Just to avoid potential confusion, the claim is that this is a function that generates a random permutation:

    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;

        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }
And this is a function that enumerates all permutations, in order, exactly once:

    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;

        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }
Yes, they are exactly the same function. What matters is Gen. If it looks like this

https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

then you get a random permutation. If it rather looks like this

https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

you enumerate all permutations.
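A minimal Rust transliteration of that idea may help (all names here --- `Gen`, `Exhaustive`, `next_run` --- are mine, not TigerBeetle's): the same Fisher-Yates `shuffle` is driven either by a PRNG or by an "odometer" that replays recorded choices and increments them between runs.

```rust
use std::collections::BTreeSet;

/// The only interface `shuffle` sees: "give me a value in [lo, hi]".
trait Gen {
    fn range_inclusive(&mut self, lo: usize, hi: usize) -> usize;
}

/// Exhaustive implementation: records every choice point and, between
/// runs, increments the trail of choices like a mixed-radix odometer.
struct Exhaustive {
    digits: Vec<(usize, usize)>, // (current value offset, number of options)
    pos: usize,
    started: bool,
}

impl Exhaustive {
    fn new() -> Self {
        Exhaustive { digits: Vec::new(), pos: 0, started: false }
    }

    /// Prepare the next run; returns false once the space is exhausted.
    fn next_run(&mut self) -> bool {
        if !self.started {
            self.started = true;
            return true;
        }
        while let Some(&(v, n)) = self.digits.last() {
            if v + 1 < n {
                let last = self.digits.len() - 1;
                self.digits[last].0 = v + 1;
                self.pos = 0;
                return true;
            }
            self.digits.pop(); // this digit overflowed; carry leftwards
        }
        false
    }
}

impl Gen for Exhaustive {
    fn range_inclusive(&mut self, lo: usize, hi: usize) -> usize {
        if self.pos == self.digits.len() {
            self.digits.push((0, hi - lo + 1)); // fresh choice point
        }
        let v = self.digits[self.pos].0;
        self.pos += 1;
        lo + v
    }
}

/// Same Fisher-Yates loop as in the Zig snippets above.
fn shuffle<T>(g: &mut dyn Gen, slice: &mut [T]) {
    if slice.len() <= 1 {
        return;
    }
    for i in 0..slice.len() - 1 {
        let j = g.range_inclusive(i, slice.len() - 1);
        slice.swap(i, j);
    }
}

/// Drive `shuffle` over the whole choice space, counting distinct outputs.
fn count_permutations(n: usize) -> usize {
    let mut g = Exhaustive::new();
    let mut perms = BTreeSet::new();
    while g.next_run() {
        let mut v: Vec<usize> = (0..n).collect();
        shuffle(&mut g, &mut v);
        perms.insert(v);
    }
    perms.len()
}

fn main() {
    // n! runs, each producing a distinct permutation, none repeated.
    assert_eq!(count_permutations(3), 6);
    assert_eq!(count_permutations(4), 24);
}
```

Swapping in a PRNG-backed `Gen` gives back the familiar random shuffle, with `shuffle` itself untouched --- which is exactly the point of the two identical snippets above.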


What's being suggested also has the M log M partition behavior in the limit where N >> M. It might be easier to see why these are actually the same thing, just in slightly different limits: imagine a huge N enumerated by an LFSR. We'll call our enumeration function rand() for tradition's sake. Now we're back to sampling.
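For concreteness, a maximal-length 16-bit Galois LFSR (the textbook 0xB400 tap mask) visits every nonzero 16-bit state exactly once before repeating, so "calling rand()" and "enumerating the space" really are the same loop:

```rust
/// Period of the textbook 16-bit Galois LFSR (taps 16,14,13,11 => 0xB400).
fn lfsr_period() -> u32 {
    let mut lfsr: u16 = 1; // any nonzero seed works
    let mut count: u32 = 0;
    loop {
        let lsb = lfsr & 1;
        lfsr >>= 1;
        if lsb != 0 {
            lfsr ^= 0xB400;
        }
        count += 1;
        if lfsr == 1 {
            return count; // back at the seed: the whole cycle was enumerated
        }
    }
}

fn main() {
    // Every nonzero 16-bit state is visited exactly once: period 2^16 - 1.
    assert_eq!(lfsr_period(), 65535);
}
```

Treat the successive states as rand() outputs and you are sampling; run the loop to completion and you have exhaustively enumerated the space, with no value seen twice.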


I personally learned Zig by reading https://ziglang.org/documentation/master/ and stdlib source code once I joined TigerBeetle. enums.zig and meta.zig are good places to learn, in addition to usual suspects like array_list.zig.

(though, to be fair, my Rust experience was a great help for learning Zig, just as my C++ knowledge was instrumental in grokking Rust)


Well said! Having a mental borrow checker running in the background certainly helps a lot when coding in Zig. What also helps is that Zig's safety checks generally catch lifetime bugs at runtime nicely. E.g., undefined memory is filled with `0xAA` bytes, which, if interpreted as a pointer, are guaranteed to form an invalid address and to fail loudly on dereference.


I strongly agree with your statement overall, but not in details.

Regarding string manipulation, Zig has slices and a comptime-checked `printf`, which makes string handling _massively_ more ergonomic than in C. But, yeah, managing the lifetime of the printf-ed string is a pain in the neck, and Rust is _massively_ more ergonomic there, if managing individual string allocations is what you want to do. (Though one cool thing that Zig makes easy is comptime-asserting that a stack-allocated buffer has the right size for the given format string.)

But I do find Zig surprisingly ergonomic overall! For example, I have two Rust crates for writing glue-code utilities, xflags (argument parsing) and xshell (shelling out). For TigerBeetle, I wrote equivalents:

* https://github.com/tigerbeetle/tigerbeetle/blob/main/src/std...

* https://github.com/tigerbeetle/tigerbeetle/blob/main/src/she...

What I found is that the Zig versions are significantly easier to implement and to use, for me. In Rust, those domains require macros, but in Zig it's all comptime. And memory management is relatively easy --- flags live until the end of the program, and the shell just uses a shell-scoped arena. I am contemplating rewriting my personal scripts at https://github.com/matklad/config/tree/master/tools/gg in Zig for that reason. Though, Zig not being 1.0 _is_ a problem in that context --- I touch my scripts only rarely, and I don't want to be on the hook for upgrades.


There's a change in the tradeoffs in the above scenario:

- you still get extra benefit from Rust, but the magnitude of the benefit is reduced (e.g., no UAF without F).

- you still get extra drawbacks from Rust, but the magnitude of drawbacks is increased, as Rust generally punishes you for not allocating (boxing is a common escape hatch to avoid complex lifetimes).

Just how much the tradeoff is shifted is hard to quantify unambiguously, but, from my PoV (Rust since 2015, TB/Zig since late 2022), Zig was and is the right choice in this context.


I mainly use Rust in embedded now. I don’t always rely on encoding all of the correctness in the Rust type system. To a degree, all the old ways of enforcing correctness are still in play; I am just choosing when to use idiomatic Rust or to escape-hatch out via a shim to C-style Rust. It reminds me quite a bit of how C and C++ shops require another layer of macros or templates to be used for containers, resources, etc.

The build time of Zig seems like the most desirable piece worth deciding over. Developer time is money, but it isn’t unusual to have multi-hour build times in a mature project in C, C++, or Rust. The correctness suite is a bigger time sink than the build, though. When building a database, you could drive the build time to 0 and still have hours in CI.


Yeah, there's actually trickiness here. Another curveball is that, with simulation testing, you generally want more assertions to catch more bugs, but slow assertions can _reduce_ testing efficiency by requiring more CPU time per iteration! But then, if you know, at comptime, that the list is going to be short, you might as well do O(N^2) verification!
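As an illustration of the "short list, so O(N^2) is fine" point, a hedged Rust sketch (the function name and threshold are mine; TigerBeetle's actual assertions are in the linked doc): a duplicate check that runs its quadratic scan only when the size is known at compile time to be small.

```rust
/// O(N^2) duplicate check, gated on the compile-time size N: cheap
/// insurance for short lists, skipped where the quadratic scan would
/// eat the simulator's per-iteration CPU budget.
fn assert_no_duplicates<T: PartialEq, const N: usize>(items: &[T; N]) {
    const SMALL: usize = 64; // illustrative threshold
    if N <= SMALL {
        for i in 0..N {
            for j in i + 1..N {
                assert!(items[i] != items[j], "duplicate at {i} and {j}");
            }
        }
    }
}

fn main() {
    assert_no_duplicates(&[1, 2, 3, 4]); // all distinct: passes silently
    assert_no_duplicates(&["a", "b", "c"]);
}
```

Since `N` is a const generic, the `N <= SMALL` branch is resolved at compile time and the check compiles to nothing for large arrays.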

We captured these considerations in the internal docs here: https://github.com/tigerbeetle/tigerbeetle/blob/0.16.60/src/...


I used to think that way, but I've learned that there's one good reason why the API is designed that way: priority inheritance. Priorities are bound to threads, and, when a high-priority thread wants to lock a currently held mutex, we want to bump the priority of the current holder. And the POSIX requirement makes that easy --- the thread that locked the mutex must be the one to unlock it.

Related: https://man7.org/linux/man-pages/man2/futex.2.html#:~:text=%...

