Author here. WebAssembly has played an important role protecting users on the Internet, and it's cool that it enables us to run LLVM in the browser. I was very surprised when I learned that some people were using it offline, outside of browsers, since doing so requires a JVM-like runtime such as wasmtime. Cosmopolitan proves that it's possible to just fix C instead, which might fix all the stuff that's built on top of C too. We can in fact have first-class portable native binaries, which will only require JIT compilation if they're launched on a non-x86 machine. This is probably how Java would have been designed, if Intel's instruction set hadn't been encumbered by patents at the time. That's changed: the x86-64 patents just expired this year. So I say: why not use it as a canonical software encoding?
If Sun had picked a 'real' CPU, I doubt it would have been x86 (did they even sell any x86 machines at the time?)
Also, picking a real CPU is only worth it for running on that particular CPU. If they had picked x86 in 1996, they wouldn't even have supported MMX.
In a platform-independent design, using n registers in the virtual CPU is only a good idea for n = 0 (i.e. a stack machine, as in the JVM, .NET CLR and WebAssembly) and n = ∞ (as in LLVM).
If you pick any other number, your platform-independent code needs to handle both the case where the number of real registers is smaller (so you need a register allocator) and the case where it is larger (there, you could just ignore the extra registers, but that means giving up a lot of performance, so you have to somehow figure out which data would have been kept in registers if the CPU had more of them).
Why write and optimize both of these complex problems, if you can pick a number of registers at either end of the scale, and spend all your resources on one of them?
And that's even more true for x86-64, which doesn't have a fully orthogonal instruction set, has who knows how many ways to do vector instructions (none of which let you clearly express how long the vectors you're iterating over are, making it hard to map them optimally to other CPUs or newer vector extensions), has 80-bit floats, etc.
Also, the high-level nature of Java bytecode enables/simplifies many optimisations in the JIT.
For example, dynamic dispatch is handled by the JIT, it's not encoded into low-level Java bytecode instructions, so if the JIT can see there's only one class loaded that implements a certain interface, it can generate code that directly invokes methods of that implementing class, without going through a vtable. It can do this even across library boundaries. That wouldn't be possible (or at least, would be greatly complicated) if Java bytecode provided pre-baked machine code.
Modern JVMs also have pretty deep integration of the GC and the JIT, if I understand correctly. The Java bytecode format is high level so the JIT is quite free to implement memory-management however it likes. If the JVM took a truly low-level approach to its IR, we'd presumably be stuck with 90's GC technology.
I imagine it would also have implications for the way the JVM handles concurrency. It seems right that it defines its own memory model with its own safety guarantees, rather than defining its model as whatever x86 does.
It's telling that .Net took the same high-level approach that Java bytecode did.
"Only require JIT compilation if they're launched on a non-x86 machine" is a heck of a caveat.
The x86-64 instruction set is not a good portable IR. Some well-known problems:
* It's much more irregular to decode than other instruction sets/IRs
* There is a truly vast set of instructions, very many of which are almost never used. So in practice you need to define the subset of x86-64 you're using. E.g. does your subset contain AVX-512? AVX2? AVX? BMI? x87? MMX? XGETBV? etc etc etc. These decisions will impact the performance of your code on actual x86 CPUs as well as non-x86 CPUs.
* x86-64 assumes the TSO memory model, which is horribly expensive to support on CPUs whose native model is weaker (e.g. ARM). (That's why Apple added a TSO mode to the M1; no other ARM chips have this.)
Honestly, declaring x86-64 your portable IR and then claiming that as a technological breakthrough sounds like a trick to me. I'd agree it's a breakthrough if you define your x86-64 subset, show your x86-to-ARM compiler, and show that it produces code competitive with your competitors (e.g. WASM compilers).
> * It's much more irregular to decode than other instruction sets/IRs
According to this blog post, it's not just "much more irregular to decode"; instruction decoding is the Achilles heel of x86, which is what allows the M1 to be so fast by comparison.
> doing so requires a JVM-like runtime such as wasmtime. Cosmopolitan proves that it's possible to just fix C instead
This doesn't sound like a fair comparison. WASM isn't just providing portability, it's also providing security and runtime safety. Cosmopolitan doesn't: it presumably requires you to trust the codebase, and doesn't protect you from C's undefined behaviour. Of course, WASM also imposes a considerable performance penalty; I presume the Cosmopolitan approach easily outperforms WASM.
> This is probably how Java would have been designed, if Intel's instruction set hadn't been encumbered by patents at the time.
To mirror the comment by Someone, I sincerely doubt this. Java bytecode is a very high level stack-based IR, nothing like a CPU ISA. They could easily have made it resemble a CPU ISA if they'd wanted to. The lesser-known GNU Lightning JIT engine takes this approach, for instance.
> WASM only provides sandboxing. That is not the same as security
The relevant Wikipedia article is named Sandbox (computer security).
> nor it means runtime safety nor protection from undefined behavior
It puts stronger constraints on what mischief undefined behaviour can lead to, and guarantees that various runtime errors are handled with traps. [0] This isn't the same as a hard guarantee that execution will terminate whenever undefined behaviour is invoked, but it's still a step up.
Speaking as someone who would love to see WebAssembly succeed as a cross-platform, cross-language way for me to write sandboxed plugins and CLI utilities, I do have to point out that, for out-of-browser use, WASM's MVP does regress various things.
> Under WebAssembly's memory model, dereferencing NULL=0 won't lead to a segfault.
Thanks, that's a curious one. It won't always lead to a segfault with a conventional compiler either, undefined behaviour being what it is. [0][1] Fortunately, GCC can be asked to add such checks at runtime; [2] this approach could also be taken with WebAssembly.
I was more referring to how it's standard practice on modern OSes to leave the zero page unmapped in a process's address space so that, if the compiler converts the NULL dereference into a dereference of address 0x0, it'll trigger a segfault.
Are you familiar with IBM i (née AS/400)? There's a similar idea there, where programs are compiled to a virtual instruction set called TIMI, then statically recompiled to the machine's native instruction set on first load on a given machine. As a result, you can just drop a System/36 binary on a modern PowerPC system and not even notice that you've crossed two changes of instruction set.
WORA's been promised for decades. Even if it's true, no one will believe it (including me). And portability is not the only reason people choose Rust or Go over C.
That's before you get into whether you can aesthetically convince everyone that x86-64 is good, actually.
Still very interesting though, especially compared to the usual Show HN crap.
Portability was a long-desired attribute of computer software, up until it turned out that it doesn't matter all that much.
When software was deployed on floppy disks and there were 10k architectures going around, portability was critical to expanding the market reach of any distributed software.
But in the modern age when the software runs "in the cloud" and there are dominant platforms out there, turns out that portability doesn't matter - nobody is changing platforms when they don't need to.
So this makes the achievement a pretty cool demo without much industry impact, since people rarely distribute C binaries anymore.
The idea behind it and the technology developed is pretty cool :)