Author here. WebAssembly has played an important role protecting CI on the Inter...

Someone · on Dec 28, 2020

If Sun would have picked a ‘real’ CPU, I doubt it would have been x86 (did they even sell any of them at the time?)

Also, picking a real CPU is only worth it for running on that CPU, and that particular one. If they had picked x86 in 1996, they wouldn’t even have supported MMX.

In a platform-independent design, using n registers in the virtual CPU is only a good idea for n = 0 (i.e. a stack machine, as in the JVM, .NET CLR and Web Assembly) and n = ∞ (as in LLVM).

If you pick any other number your platform-independent code needs to handle both the case where the number of op real registers is smaller (so you need to have a register allocator) and the case where it is larger (there, you could just ignore the additional registers, but that means giving up a lot of performance, so you have to somehow figure out which data would have been in registers if the CPU had more of them).

Why write and optimize both of these complex problems, if you can pick a number of registers at either end of the scale, and spend all your resources on one of them?

And that’s even more true for x86-64, which doesn’t have a fully orthogonal instruction set, has I don’t know how many ways to do vector instructions, none of which allow you to clearly express how long the vectors you’re iterating over are (making it hard to optimally map them to other CPUs or newer vector instructions), has 80-bit floats, etc.

There’s a reason Google stopped work on https://en.wikipedia.org/wiki/Google_Native_Client in favor of Web Assembly.

MaxBarraclough · on Dec 28, 2020

Also, the high-level nature of Java bytecode enables/simplifies many optimisations in the JIT.

For example, dynamic dispatch is handled by the JIT, it's not encoded into low-level Java bytecode instructions, so if the JIT can see there's only one class loaded that implements a certain interface, it can generate code that directly invokes methods of that implementing class, without going through a vtable. It can do this even across library boundaries. That wouldn't be possible (or at least, would be greatly complicated) if Java bytecode provided pre-baked machine code.

Modern JVMs also have pretty deep integration of the GC and the JIT, if I understand correctly. The Java bytecode format is high level so the JIT is quite free to implement memory-management however it likes. If the JVM took a truly low-level approach to its IR, we'd presumably be stuck with 90's GC technology.

I imagine it would also have implications for the way the JVM handles concurrency. It seems right that it defines its own memory model with its own safety guarantees, rather than defining its model as whatever x86 does.

It's telling that .Net took the same high-level approach that Java bytecode did.

roca · on Dec 28, 2020

"Only require JIT compilation if they're launched on a non-x86 machine" is a heck of a caveat.

The x86-64 instruction set is not a good portable IR. Some well-known problems:

* It's much more irregular to decode than other instruction sets/IRs

* There is a truly vast set of instructions, very many of which are almost never used. So in practice you need to define the subset of x86-64 you're using. E.g. does your subset contain AVX-512? AVX2? AVX? BMI? X87? MMX? XGETBV? etc etc etc. These decisions will impact the performance of your code on actual x86 CPUs as well as non-x86 CPUs.

* x86-64 assumes the TSO memory model which is horribly expensive to support on CPUs whose native model is weaker (e.g. ARM) (which is why Apple added a TSO mode for M1; no other ARM chips have this)

Honestly, declaring x86-64 your portable IR and then claiming that as a technological breakthrough sounds like a trick to me. I'd agree it's a breakthrough if you define your x86-64 subset, show your x86-to-ARM compiler, and show that it produces code competitive with your competitors (e.g. WASM compilers).

ssokolow · on Dec 28, 2020

> * It's much more irregular to decode than other instruction sets/IRs

According to this blog post, it's not just "much more irregular to decode", instruction deciding is the Achilles Heel of x86 which allows M1 to be so fast by comparison.

https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32...

moonchild · on Dec 28, 2020

Because x86 instruction encoding is a mess. Take a look at this[1] flowchart from the amd manual (vol. 3).

That doesn't even tell the whole story; there's a ton of path dependence, and the operand format is quite complicated and has tons of special cases.

1. https://0x0.st/-rce.png

jart · on Dec 28, 2020

I thought x86 was pretty straightforward once I understood the octal encoding. Cosmopolitan provides a 3kb x86 instruction length decoder that supports all past, present, and future ISAs: https://github.com/jart/cosmopolitan/blob/master/third_party... The Cosmopolitan codebase also provides a debugger / emulator for i8086 + x86-64 that's similar to GDB TUI and Bochs. See https://justine.lol/blinkenlights/index.html For example, here's the ALU code: https://github.com/jart/cosmopolitan/blob/master/tool/build/...

Someone · on Dec 28, 2020

I wouldn’t call 3kb “pretty straightforward”, certainly not relative to the competition. The instruction length decoder for quite a few CPUs is

  return INSTRUCTION_SIZE;

I don’t know any other that has more than a few cases for immediate constants and jump addresses.

dnautics · on Dec 28, 2020

xoreaxeaxeax's videos about how to systematically parse the asm space of x86 really hit home to me how bad the encoding is.

https://www.youtube.com/watch?v=KrksBdWcZgQ

MaxBarraclough · on Dec 28, 2020

> doing so requires a JVM-like runtime such as wasmtime. Cosmopolitan proves that it's possible to just fix C instead

This doesn't sound like a fair comparison. WASM isn't just providing portability, it's also providing security and runtime safety. Cosmopolitan doesn't, it presumably requires you to trust the codebase, and doesn't protect you from C's undefined behaviour. Of course, WASM also imposes a considerable performance penalty, I presume the Cosmopolitan approach easily outperforms WASM.

> This is probably how Java would have been designed, if Intel's instruction set hadn't been encumbered by patents at the time.

To mirror the comment by Someone, I sincerely doubt this. Java bytecode is a very high level stack-based IR, nothing like a CPU ISA. They could easily have made it resemble a CPU ISA if they'd wanted to. The lesser-known GNU Lightning JIT engine takes this approach, for instance.

rini17 · on Dec 28, 2020

WASM only provides sandboxing. That is not the same as security nor it means runtime safety nor protection from undefined behavior.

MaxBarraclough · on Dec 28, 2020

> WASM only provides sandboxing. That is not the same as security

The relevant Wikipedia article is named Sandbox (computer security).

> nor it means runtime safety nor protection from undefined behavior

It puts stronger constraints on what mischief undefined behaviour can lead to, and guarantees that various runtime errors are handled with traps. [0] This isn't the same as a hard guarantee that execution will terminate whenever undefined behaviour is invoked, but it's still a step up.

[0] https://webassembly.org/docs/security/

ssokolow · on Dec 28, 2020

Speaking as someone who would love to see WebAssembly succeed as a cross-platform, cross-language way for me to write sandboxed plugins and CLI utilities, I do have to point out that, for out-of-browser use, WASM's MVP does regress various things:

* https://00f.net/2018/11/25/webassembly-doesnt-make-unsafe-la...

* https://www.usenix.org/system/files/sec20-lehmann.pdf

(eg. Under WebAssembly's memory model, dereferencing NULL=0 won't lead to a segfault.)

MaxBarraclough · on Dec 28, 2020

> Under WebAssembly's memory model, dereferencing NULL=0 won't lead to a segfault.

Thanks that's a curious one. It won't always lead to a segfault with a conventional compiler either, undefined behaviour being what it is. [0][1] Fortunately GCC can be asked to add such checks at runtime, [2] this approach could also be taken with WebAssembly.

[0] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

[1] https://blog.regehr.org/archives/213

[2] https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...

ssokolow · on Dec 30, 2020

I was more referring to how it's standard practice on modern OSes to leave the zero page unmapped in a process's address space so that, if the compiler converts the NULL dereference into a dereference of address 0x0, it'll trigger a segfault.

pjmlp · on Dec 28, 2020

> We can in fact have first-class portable native binaries, which will only require JIT compilation if they're launched on a non-x86 machine.

This is nothing new, IBM and Unisys mainframes have been doing it since the 60's.

jart · on Dec 28, 2020

IBM System/360 and NexGen i586 were big inspirations for the design of Cosmopolitan.

thequux · on Dec 28, 2020

Are you familiar with IBM i (née AS/400)? There's a similar idea there, where programs are compiled to a virtual instruction set called TIMI then statically recompiled to whatever the machine's native instruction set is at the first load on a given machine. As a result, you can just drop a System/36 binary on a modern PowerPC system and not even notice that you've just crossed two changes of instruction set.

aledthemathguy · on Dec 28, 2020

excuse my ignorance on this, but this sounds amazing. game changing. yet it's being "served" here on ice. casually. what am I missing?

morelisp · on Dec 28, 2020

WORA's been promised for decades. Even if it's true no one will believe it (including me). And portability is not the only reason people choose Rust or Go over C.

That's before you get into whether you can aesthetically convince everyone that x86-64 is good, actually.

Still very interesting though, especially compared to the usual Show HN crap.

the-dude · on Dec 28, 2020

Your comment would be better without the last sentence.

ddalex · on Dec 29, 2020

Portability was a long desired attribute of computer software up until it turns out that it doesn't matter at all.

When software was deployed on floppy disks and there were 10k architectures going around, portability was critical to expand the market reach of any distributed software.

But in the modern age when the software runs "in the cloud" and there are dominant platforms out there, turns out that portability doesn't matter - nobody is changing platforms when they don't need to.

So this makes this achievement a pretty cool demo without any industry impact, since people don't distribute C binaries anymore.

The idea behind it and the technology developed is pretty cool :)