Sure, but the small compiler that you write in C can't compile rustc. So you write a new Rust compiler in a much simpler subset of Rust that the small C compiler can handle. Then that new Rust compiler can compile rustc.
And since that new Rust compiler might not have much of an optimizer (if it even has one at all), you then recompile rustc with your just-compiled rustc.
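A minimal sketch of that staged chain, with hypothetical binary and source paths (the real process, e.g. mrustc's, involves many more steps):

```rust
// Hypothetical staged bootstrap: every path and binary name here is
// made up for illustration; only the shape of the chain matters.
use std::process::Command;

fn run(compiler: &str, args: &[&str]) {
    let status = Command::new(compiler)
        .args(args)
        .status()
        .expect("failed to spawn compiler");
    assert!(status.success(), "{compiler} exited with an error");
}

fn main() {
    // Stage 0: the small C-written compiler builds the simplified Rust compiler.
    run("./stage0", &["simple-rustc/main.rs", "-o", "rustc-stage1"]);
    // Stage 1: the simplified compiler builds the real rustc (slow, unoptimized output).
    run("./rustc-stage1", &["rustc/main.rs", "-o", "rustc-stage2"]);
    // Stage 2: the freshly built rustc rebuilds itself, now with full optimizations.
    run("./rustc-stage2", &["rustc/main.rs", "-O", "-o", "rustc-final"]);
}
```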
Sometimes people like to do things just to do them, because the idea is cool. Sometimes an idea is cool because it has real-world ramifications just for existing (trusting trust, supply chain attacks), though many people like to argue about whether this particular idea is Just Cool TM or Actually Useful TM.
I don’t think the article made any false pretenses either way; it seemed evident to me that at least some of the motivation was “because how? I want to do the how!” And that’s cool!
Also, I think the article pretty clearly addresses your question. Duh, he could just cross-compile Rust. But the whole article, literally the entirety of it, is dedicated to exploring the chicken-and-egg problem of “if rustc is written in Rust, what compiles rustc?”, and the rabbit hole of similar questions that arise for other languages. The answer to that question is both “rustc, of course” and also “another compiler written in some other language.” The author wants to explore the process of making the compiler in [the/an] ‘other language’ because it’s Cool.
Just because something has already been done does not mean there’s no worth in doing it, nor that the only worth in doing it is educational and/or experiential :)
Cross-compilation is orthogonal to bootstrapping. Bootstrapping is the major motivating factor for having something like what is described earlier in this thread: a small compiler written in C that can compile a subset of Rust, used to compile a larger, feature-complete Rust compiler written in that subset. Compare that to what we have right now: a Rust compiler written in Rust, which requires an existing Rust compiler binary, which means we have an unmitigated supply chain attack vector.
If you change your question to "why does anyone care about bootstrapping?", the answer would revolve around that aforementioned supply chain attack vector.
Perhaps you're comfortable with the lack of assurances that non-bootstrappable builds entail (everyone has a different appetite for risk); some others aren't, though, and so they have an interest in efforts to mitigate the security risks inherent in trusting a supply chain of opaque binaries.
Yes, they are, because they want to target systems, like Debian, which explicitly disallow cross-compilation.
Yes, I think it's silly too but other people disagree and they are free to work on whatever they want.
Do I think it's a mostly pointless waste of time? Obviously, I do. Still, I guess there are worse ones.
Note that the Rust project itself uses cross-compilation for the ports it supports, and considering how often they use features only available in the current version of rustc inside rustc itself, I guess it's safe to assume they share my opinion on the usefulness of keeping Rust bootstrappable.
* Bootstrapping a language on a new architecture using an existing architecture. With modern compilers this is usually done using cross-compilation.
* Bootstrapping a language on an architecture _without_ using another architecture
The latter is mostly done for theoretical purposes, like reproducibility, or concerns in the vein of "Reflections on Trusting Trust".
Paring down a Rust compiler written in Rust to use only a subset of Rust features might not be a big lift. Then you only need to build a Rust compiler (in C) that implements the features used by the pared-down compiler rather than the full language.
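As a purely hypothetical illustration of what such a subset might look like (this particular feature cut is mine, not anything proposed here): code that sticks to concrete types, so the stage-0 compiler has far less language to implement:

```rust
// Hypothetical "pared-down" Rust: concrete types only, no user-defined
// generics or traits, no macros beyond what the stage-0 compiler bakes in.
struct Token {
    kind: u8,
    text: String,
}

fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for word in source.split_whitespace() {
        tokens.push(Token { kind: 0, text: word.to_string() });
    }
    tokens
}

fn main() {
    let tokens = lex("fn main ( ) { }");
    println!("lexed {} tokens", tokens.len());
}
```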
PyPy, for instance, implements RPython, which is a valid subset of Python. The compiler is written in RPython. The compiler code is limited in features, but it only needs to implement what RPython includes.
But that doesn't conform to the "Descent Principle" described in the article.
I haven't really been following Zig, but I still felt slightly disappointed when I learnt that they were just replacing a source-based bootstrapping compiler with a binary blob that someone generated and added to the source tree.
The thing that makes me uncomfortable with that approach is that if a certain kind of bug (or virus! [0]) is found in the compiler, you may have to fix the bug in multiple versions to re-bootstrap, in case the bug (or virus!) manages to persist itself into the compilation output. The Dozer article talks about the eventual goal of removing all generated files from the rustc source tree, i.e., undoing what Zig recently decided to do.
If everything is reliably built from source, you can just fix any bugs by editing the current source files.
I think there is too much mysticism here in believing that the bootstrapping phases will offer any particular guarantees. Without essentially a formal proof that the output of the compiler is what you expect, you will have to manually inspect every aspect of every output phase of any bootstrapping process.
OK, so you decide to use CompCert C. You now have a proof that your object code is what your C code asked for. But do you have a formal proof of your C code? Have you proved that you have not allowed any surprises? If not, what is your Rust compiler? Junk piled on top of junk, from this standpoint.
On the other hand, you could have a verified WASM (or other VM) runner. That verified runner could run the output of a formally verified compiler (which rustc is not). The trusted base is actually quite small if you had a fully specified language with a verified compiler. But you have to start with that trusted base, and something like a compiler written in C is not really enough to get you there.
> Without essentially a formal proof that the output of the compiler is what you expect, you will have to manually inspect every aspect of every output phase of any bootstrapping process.
And why would it be easier to manually inspect (prove correct) the output of every phase than to manually inspect (prove correct) the source code? Compiled code often loses important information about code structure and how abstractions are used, includes the effects of optimisations, etc.
I usually trust my ability to understand source code better than my ability to understand the compiled code.
That's not what I said. I implied that it's hard to trust the output of some unknown compiler (e.g., the "zig1.wasm" blob) and that it's easier to trust source code.
The Dozer article explains, under "The Descent Principle", how rustc will eventually be buildable using only source code [0] (other than a "512-byte binary seed" which implements a trivial hex interpreter). You still need to trust a computer to run everything on, though in theory it should be possible to gain trust by running it on multiple computers and checking that the result is the same (this is why any useful system bootstrapping project should also be reproducible [1]).
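A toy sketch of that cross-machine check, with made-up artifact paths: produce the same stage on two machines, gather the artifacts, and byte-compare them (in practice you'd compare cryptographic hashes published by independent builders):

```rust
// Hypothetical reproducibility check: the two paths below stand in for
// the same build artifact produced independently on two machines.
use std::fs;

fn main() -> std::io::Result<()> {
    let a = fs::read("machine-a/rustc-stage2")?;
    let b = fs::read("machine-b/rustc-stage2")?;
    if a == b {
        println!("bit-for-bit identical: the build reproduces");
    } else {
        println!("artifacts differ: non-reproducible build, or one machine is compromised");
    }
    Ok(())
}
```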
In this case it is portable, because the Zig compiler source tree includes an interpreter for the blob (WASM) in portable C.
It's not objectionable to have non-portable source code anyway. I think it's fine having architecture-specific assembly code, just as long as it's hand-written.
The problems arise when you're storing generated content in the source repository, because it becomes unclear how you're meant to understand and fix the generated content. In this case it seems like the way to fix it is by rerunning the compiler, but if running the compiler involves running this incorrect blob, it's not clear that running the compiler again will produce a correct blob.
I wonder if anyone is monitoring these commits in Zig to ensure that the blobs are actually generated genuinely, since if not it seems like an easy way for someone to inject a KTH (Ken Thompson Hack): https://github.com/ziglang/zig/commits/master/stage1/zig1.wa...