It sounds like you’re disagreeing yet no case is made that throughout and latency isn’t worse.
For example, the best frameworks on TechEmpower are all Rust, C and C++ with the best Java coming in at 25% slower on that microbenchmark. My point stands - it is generally true that well written rust/c/c++ outperforms well written Java and .Net and not just with lower memory usage. The “engineering effort per performance” maybe skews to Java but that’s different than absolute performance. With rust to me it’s also less clear if that is actually even true.
Honestly, if these languages are only winning by 25% in microbenchmarks, where I’d expect the difference to be biggest, that’s a strong boost for Java for me. I didn’t realise it was so close, and I hate async programming so I’m definitely not doing it for an, at most, 25% boost.
It’s not about the languages only, but also about runtimes and libraries. The vert.x vertices are reactive. Java devrel folks push everyone from reactive to virtual threads now. You won’t see it perform in that ballpark. If you look at the bottom of the benchmark results table, you’ll find Spring Boot (servlets and a bit higher Reactor), together with Django (Python). So “Java” in practice is different from niche Java. And if you look inside at the codebase, you’ll see the JVM options. In addition, they don’t directly publish CPU and memory utilization. You can extract it from the raw results, but it’s inconclusive.
This stops short of actually validating the benchmark payloads and hardware against your specific scenario.
> So “Java” in practice is different from niche Java.
This is an odd take, especially when in the discussion of Rust. In practice when talking about projects using Rust as an http server backend is non-existent in comparison. Does that mean we just get to write off the Rust benchmarks?
I don’t understand what you’re saying. Typical Java is Spring Boot. Typical Rust is Axum and Actix. I don’t see why it would make sense to push the argument ad absurdum. Vert.x is not typical Java, its not easy to get it right. But Java the ecosystem profits from Netty in terms of performance, which does the best it can to avoid the JVM, the runtime system. And it’s not always about “HTTP servers” though that’s what that TechEmpower benchmark subject matter is about - frameworks, not just languages.
Your last sentence reads like an expression of faith. I’ll only remark that performance is relative to one’s project specs.
In some of those benchmarks, Quarkus (which is very much "typical Java") beats Axum, and there's far more software being written in "niche Java" than in "typical Rust". As for Netty, it's "avoiding the JVM" (standard library, really) less now, and to the extent that it still does, it might not be working in its favour. E.g. we've been able to get better results with plain blocking code and virtual threads than with Netty, except in situations where Netty's codecs have optimisations done over many years, and could have been equally applied to ordinary Java blocking code (as I'm sure they will be in due time).
Hey Ron, I’ve got deep respect for what you do and appreciate what you’re sharing, that’s definitely good to know. And I understand that many people take any benchmark as a validation for their beliefs. There are so many parameters that are glossed over at best. More interesting to me is the total cost of bringing that performance to production. If it’s some gibberish that takes a team of five a month to formulate and then costs extra CPU and RAM to execute, and then becomes another Perlesque incantation that no one can maintain, it’s not really a “typical” thing worth consideration, except where it’s necessary, scoped to a dedicated library, and the budget permits.
I don’t touch Quarkus anymore for a variety of issues. Yes, sometimes it’s Quarkus ahead, sometimes it’s Vert.x, from what I remember it’s usually bare Vert.x. It boils down to the benchmark iteration and runtime environment. In a gRPC benchmark, Akka took the crown in a multicore scenario - at a cost of two orders of magnitude more RAM and more CPU. Those are plausible baselines for a trivial payload.
By Netty avoiding the JVM I referred mostly to its off-heap memory management, not only the JDK APIs you guys deprecated.
I’m deeply ingrained in the Java world, but your internal benchmarks rarely translate well to my day-to-day observations. So I’m quite often a bit perplexed when I read your comments here and elsewhere or watch your talks. Without pretending I comprehended the JVM on a level comparable to yours, in my typical scenarios, I do quite often manage to get close to the throughput of my Rust and C++ implementations, albeit at a much higher CPU and memory cost. Latency & throughput at once is a different story though. I genuinely hope that one day Java will become a platform for more performance-oriented workloads, with less nondeterminism. I really appreciate your efforts toward introducing more consistency into JDK.
I didn’t make the claim that it’s worth it. But when it is absolutely needed Java has no solution.
And remember, we’re talking about a very niche and specific I/O microbenchmark. Start looking at things like SIMD (currently - I know Java is working on it) or in general more compute bound and the gap will widen. Java still doesn’t yet have the tools to write really high performance code.
But it does. Java already gives you direct access sto SIMD, and the last major hurdle to 100% of hardware performance with idiomatic code, flattened structs, will be closed very soon. The gap has been closing steadily, and there's no sign of change in the trend. Actually, it's getting harder and harder to find cases where a gap exists at all.
First, in all benchmarks but two, Java performs just as well as C/C++/Rust, and in one of those two, Go performs as well as the low-level languages. Second, I don't know the details of that one benchmark where the low-level languages indeed perform better than high-level ones, but I don't see any reason to believe it has anything to do with virtual threads.
Modern Java GCs typically offer a boost over more manual memory management. And on latency, even if virtual were very inefficient and you'd add a GC pause with Java's new GCs, you'd still be well below 1ms, i.e. not a dominant factor in a networked program.
(Yes, there's still one cause for potential lower throughput in Java, which is the lack of inlined objects in arrays, but that will be addressed soon, and isn't a big factor in most server applications anyway or related to IO)
BTW, writing a program in C++ has always been more or less as easy as writing it in Java/C# etc.; the big cost of C++ is in evolution and refactoring over many years, because in low-level languages local changes to code have a much more global impact, and that has nothing to do with the design of the language but is an essential property of tracking memory management at the code level (unless you use smart pointers, i.e. a refcounting GC for everything, but then things will be really slow, as refcounting does sacrifice performance in its goal of minimising footprint).
A 1-millisecond pause is an eternity. That’s disk access latencies. Unless your computation is completely and unavoidably dominated by slow network, that latency will have a large impact on performance.
Ironically, Java has okay performance for pure computation. Where it shows poorly is I/O intensive applications. Schedule quality, which a GC actively interferes with, has a much bigger impact on performance for I/O intensive applications than operation latency (which can be cheaply hidden).
Who said anything about a 1ms pause? I said that even if virtual thread schedulers had terrible latencies (which they don't) and you added GC pauses, you'd still be well below 1ms, which is not an eternity in the context of network IO, which is what we're talking about here.
It is not "an eternity". A roundtrip of 100-200us - which is closer to the actual GC pause time these days (remember, I said well below 1ms) - is considered quite good and is within the 1ms order of magnitude. Getting a <<1ms pause once every several seconds is not a significant impact to all but a few niche programs, and you may even get better throughput. OS-imposed hiccups (such as page faults or scheduler decisions) are about the same as those caused by today's Java GCs. Programs for which these are "an eternity" don't use regular kernels.
Performance without a goal is wasted effort, sometimes that 1-millisecond matters, most of the time it doesn't, hence why everyone is using Web browsers with applications written in a dynamic language even worse GC pauses.
Any gc pause is unacceptable if your goal is predictable throughput and latency
Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.
As for c++,
std::unique_ptr has no ref counting at all.
shared_ptr does, but that’s why you avoid it at all costs if you need to move things around. you only pay the cost when copying the shared_ptr itself, but you almost never need a shared_ptr and even when you need it, you can always avoid copying in the hot path
> Modern gcs can be pauseless, but either way you’re spending CPU on gc and not servicing requests/customers.
Since memory is finite and all computation uses some, every program spends CPU on memory management regardless of technique. Tracing GCs often spend less CPU on memory management than low-level languages.
> std::unique_ptr has no ref counting at all.
It still needs to do work to free the memory. Tracing GCs don't. The whole point of tracing GCs is that they spend work on keeping objects alive, not on freeing memory. As the size of the working set is pretty much constant for a given program and the frequency of GC is the ratio of allocation rate (also constant) to heap size, you can arbitrarily reduce the amount of CPU spent on memory management by increasing the heap.
I honestly doubt any of the frameworks in that benchmark are using virtual threads yet. The top one is still using vert.x which is an event loop on native platform threads.
For example, the best frameworks on TechEmpower are all Rust, C and C++ with the best Java coming in at 25% slower on that microbenchmark. My point stands - it is generally true that well written rust/c/c++ outperforms well written Java and .Net and not just with lower memory usage. The “engineering effort per performance” maybe skews to Java but that’s different than absolute performance. With rust to me it’s also less clear if that is actually even true.
[1] https://www.techempower.com/benchmarks/#section=data-r23