
This really deserves more love.

Who remembers Ken Thompson's "Reflections on Trusting Trust"?

The norm today is auto-updating, pre-built software.

This places a ton of trust in the publisher. Even for open-source, well-vetted software, we all collectively cross our fingers and hope that whoever is building these binaries and running the servers that disseminate them is honest and good at security.

So far this has mostly worked out due to altruism (for open source maintainers) and self interest (companies do not want to attack their own users). But the failure modes are very serious.

I predict that everyone's imagination on this topic will expand once there's a big enough incident in the news. Say some package manager gets compromised, nobody finds out, and 6mo later every computer on earth running `postgres:latest` from docker hub gets ransomwared.

There are only two ways around this:

- Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds.

- Reproducible builds.

Reproducible builds are way more important than is currently widely appreciated.

I'm grateful to the NixOS team for beating a trail through the jungle here. Retrofitting reproducibility onto a big software project that grew without it is hard work.



> I'm grateful to the NixOS team for beating a trail through the jungle here. Retrofitting reproducibility onto a big software project that grew without it is hard work.

Actually, it's the Debian folks who pushed reproducible builds hard in the early days. They upstreamed the necessary changes and also spread the concept itself. This is a two-decade-long community effort.

In turn, NixOS is mostly just wrapping those projects with its own tooling, a cherry on top. NixOS is disproportionately credited here.


I think both efforts have been important and have benefitted each other. Nix has always had purity/reproducibility as tenets, but indeed it was Debian that got serious about it on a bit-for-bit basis, with changes to the compilers, tools like diffoscope, etc. The broader awareness and feasibility of reproducible builds then made it possible for Nix to finally realise the original design goal of a content-addressed rather than input-addressed store, where you don't need to actually sign your binary cache, but rather just sign a mapping between input hashes and content hashes.


> where you don't need to actually sign your binary cache, but rather just sign a mapping between input hashes and content hashes.

Though you can and should sign the mapping!


Of course, yes— that was what I was saying. But the theory with content-addressability is that unlike a conventional distro where the binaries must all be built and then archived and distributed centrally, Nix could do things like age-out the cache and only archive the hashes, and a third party could later offer a rebuild-on-demand service where the binaries that come out of it are known to be identical to those which were originally signed. A similar guarantee is super useful when it comes to things like debug symbols.


By the way, here are the stats on Debian's herculean share of the efforts: https://wiki.debian.org/ReproducibleBuilds


The ratio of reproducible to non-reproducible packages doesn't seem to have changed that much in the last 5 years.


They have new challenges with new packages. In the last 5 years a lot of Rust packages entered the archive, for example, which means a new compiler to tackle reproducibility with (and it's not trivial, even if upstream has worked on it a lot).


In my experience, rustc builds are reproducible if you build on the same path. They come out byte for byte identical.


Yeah, I remember there was some drama regarding the build machine path leaking into the release binaries.


Aha.. don't all compilers behave the same way, with debug info?

I mean it's worthwhile to fix, but that behaviour seems so standard.


No, rust leaks the path to the source code on the build machine. This path likely does not even exist on the execution machine, so there's absolutely no good reason for this leakage. It is very nonstandard.

It is really, really annoying that the Rust team is not taking this problem seriously.


I don't think this is correct. Most compilers include the path to the source code on the build machine in the debug info, and it's a common problem for reproducible builds. This is not a rust-specific issue.

Obviously the binary can't contain paths from the execution machine because it doesn't know what the execution machine will be at compile time, and the source code isn't stored on the execution machine anyway. The point of including the source path in the debug info is for the developer to locate the code responsible if there's a crash.

See: https://reproducible-builds.org/docs/build-path/
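
For what it's worth, the usual mitigation is to remap the build path at compile time; a rough sketch, with the paths here just placeholders:

  # GCC/Clang: rewrite the build directory recorded in debug info to a fixed string
  gcc -g -fdebug-prefix-map=/home/builder/src=/usr/src/pkg -o hello hello.c

  # rustc: the same idea for paths embedded in the binary
  rustc -O --remap-path-prefix /home/builder/src=/build hello.rs

Building in a fixed, canonical path (as mentioned upthread) sidesteps the same problem without any flags.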


But is it only on debug builds? Or are release builds affected? Because if it’s the latter, that’s a big issue. But for the former, does it really matter?


At least in openSUSE, we always build with gcc -g and then later strip debug symbols into separate debuginfo files. This leaves a unique hash in the original file and that makes them vary if the build path changes.


> This is a two-decade long community effort.

So is Nix/NixOS, which has had reproducibility in mind from the start.

The earliest example I can find is "Nix: A Safe and Policy-Free System for Software Deployment" from 2004 ( https://www.usenix.org/legacy/event/lisa04/tech/full_papers/... ):

> Build farms are also important for release management - the production of software releases - which must be an automatic process to ensure reproducibility of releases, which is in turn important for software maintenance and support.

Eelco's thesis (from 2006) also has this as the first bullet-point in its conclusion:

> The purely functional deployment model implemented in Nix and the cryptographic hashing scheme of the Nix store in particular give us important features that are lacking in most deployment systems, such as complete dependencies, complete deployment, side-by-side deployment, atomic upgrades and rollbacks, transparent source/binary deployment and reproducibility (see Section 1.5).


I don't think NixOS is getting too much credit. This is an accomplishment, even if it was built on the shoulders of giants.


That's somewhat uncharitable. patchelf, for example, is one tool developed by NixOS which is widely used for reproducible build efforts. (although I don't know concretely if Debian uses it today)


patchelf is not really widely used for solving reproducible-builds issues. It's made for rewriting RPATHs, which is essential for NixOS, but not something you would see in other distributions except when someone needs to work around poor upstream decisions.


Has a full linux image--something you can actually boot--existed as a reproducible build before today?


I have to imagine it's been done, at least with some stripped-down kernel+busybox situation. Not sure, though.


Forgive my ignorance but isn’t that Slackware?


No, if I build Slackware on my computer and you build Slackware on yours, the binaries we end up with will not be bit-for-bit identical.


tl;dr: Debian's work is very important here, but NixOS' reproducibility aims are more general than Debian's and began more than 8 years earlier

Despite the fact that Debian (as a project) has shouldered far more of the work with upstream projects to make bit-identical reproducibility possible at build time, Debian (as a distro) doesn't have a design that makes this kind of reproducibility as feasible, practical, or robust at the level of a whole system or disk image in the way that NixOS has achieved here. To quote the Debian project itself[0]:

> Reproducible builds of Debian as a whole is still not a reality, though individual reproducible builds of packages are possible and being done. So while we are making very good progress, it is a stretch to say that Debian is reproducible.

Beyond the fact that some packages still have issues upstream and the basic technical problem of versioning (i.e., apt fetching binaries from online archives in a stateful way) Debian additionally struggles with an extremely heterogeneous and manual process of acquiring and uploading source packages[1]. Debian doesn't even have the resources to construct a disk image where the version of every package is pinned, short of archiving all the binaries (which is how they do ‘reproducible’ ISO production now[2]). But pulling down all of the pre-built binaries for your distro isn't really ‘reproduction’ in the same sense as ‘reproduction’ in Debian's (package-level) reproducibility project.

Some points of comparison

  • NixOS always fixes the whole dependency tree
  • Debian requires a ‘snapshot’ repository to fix a dependency tree
  • most NixOS packages are updated through automatic tools and all the build recipes are stored under version control in one place
  • Debian packages can be updated any way that suits their maintainers, and the build recipes/rules can be stored anywhere (it's the maintainer's job to keep them in version control if they want, then upload them to Debian repositories as source packages)
  • Nix (transparently!) caches both build outputs and package sources, which means
    ◦ if the original source tarballs (e.g., on GitHub or SourceForge) are unavailable, Nix won't even notice that if it can pull them from the ‘binary’ cache
    ◦ if there is no cache of the build outputs, Nix will automatically fall back to fetching and unpacking the sources from the upstream mirror
  • Debian's technical and community relationships to upstream source code are both less robust
    ◦ Debian requires manual management (creating and uploading) of complete source code archives in their own format[1]
    ◦ sometimes Debian infrastructure can't even reproduce upstream source code from their own archives[3]
    ◦ if Debian's source archives are unavailable for a package, there is just no way to build it (since source package archives also contain the build instructions, dependency metadata, etc.)
Actually reproducing a NixOS image is less manual and can be done without relying on any online Nix/NixOS-specific infrastructure, and this is a real advancement over what's possible with binary distros like Debian. (Some other binary distros, like openSUSE, also have centralized version control for package definitions.)
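
To make 'less manual' concrete, here's a rough sketch of rebuilding an installer image from a pinned nixpkgs checkout, roughly following the procedure the NixOS manual documents for building ISOs (the revision is a placeholder):

  git clone https://github.com/NixOS/nixpkgs.git
  cd nixpkgs && git checkout <pinned-rev>   # placeholder: the exact revision you want to reproduce
  cd nixos
  nix-build -A config.system.build.isoImage \
    -I nixos-config=modules/installer/cd-dvd/installation-cd-minimal.nix default.nix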

One way to conceptualize the qualitative differences in reproducibility outlined above is by examining the ways that Nix strengthens Debian's definition of reproducibility[4], which reads:

> A build is reproducible if given the same source code, build environment and build instructions,

For Nix, the build instructions can simply encode all of what Debian calls the ‘relevant attributes of the build environment’:

> Relevant attributes of the build environment would usually include dependencies and their versions, build configuration flags and environment variables as far as they are used by the build system (eg. the locale). It is preferable to reduce this set of attributes.

And similarly, for NixOS, the acquisition of source code is folded into the build instructions and the ‘build environment’ (i.e., caches being available or GitHub not being down). So every Nix package that is reproducible at all is reproducible in a more general way than a reproducible Debian package.

And NixOS/Nix have had to do real work to make their systems reproducible in ways that Debian is not. Unlike much of Debian's work, its benefits can't really be shared with distros of a different design— but the converse is sometimes true as well. For example, Debian's work on rooting out non-determinism in package post-install hooks[5] is useless (and unnecessary) for NixOS, Guix, and Distri, since their packages don't have post-install hooks.

There are also lots of little ways that issues Debian has worked on either reflect the relative weakness of this notion of reproducibility (e.g., ‘All relevant information about the build environment should either be defined as part of the development process or recorded during the build process.’[6] is a way of saying ‘the build environment should be reproducible or merely documented’) or overcoming challenges that systems designed with reproducibility in mind from the start simply don't face.

At the same time, the Reproducible Builds website refers to publications[7] by former Nix developers who directly cite the original Nix paper from 2004, whereas Debian's effort didn't begin in earnest until 2013.[8]

Compared to the Nix community, Debian is huge. And they've leveraged their collective expertise and considerable volunteer force to do a ton of work toward reproducible builds which has benefited reproducibility for everyone, including NixOS. Doubtless every remotely attentive member of the Nix community is grateful for that work, which a small community like Nix's could hardly have taken up on its own. But Nix has been attacking reproducibility issues at a different level (reproducing build environments, source code, and whole systems (in terms of behavior, if not bits)) in a meaningful way since long before Debian's reproducible builds effort got going. And some of those efforts have informed the wider reproducible builds effort, just like some of Debian's efforts have not been applicable to every project in the F/OSS community which is interested in reproducible builds.

So: let's praise Debian loudly and often for their work here and be clear that NixOS' reproducibility couldn't be where it is today without that work... but let's also be clear that Nix/NixOS absolutely has blazed some trails in the territory of reproducibility— a terrain that both communities are still mapping out together. :)

1: https://michael.stapelberg.ch/posts/2019-03-10-debian-windin...

2: https://wiki.debian.org/ReproducibleInstalls/LiveImages

3: https://www.preining.info/blog/2014/06/debian-pristine-tar-p...

4: https://reproducible-builds.org/docs/definition/

5: https://reproducible-builds.org/docs/system-images/

6: https://reproducible-builds.org/docs/recording/

7: https://reproducible-builds.org/docs/publications/

8: https://wiki.debian.org/ReproducibleBuilds/History#Kick-off


Reproducibility is necessary, but unfortunately not sufficient, to stop a "Trusting Trust" attack. Nixpkgs still relies on a bootstrap tarball containing e.g. gcc and binutils, so theoretically such an attack could trace its lineage back to the original bootstrap tarball, if it was built with a compromised toolchain.


Diverse double compilation should allow a demonstration that the toolchain is trustworthy.


Indeed, and with the work done by Guix and the Reproducible Builds project we do have a real-world example of diverse double compilation which is not just a toy example utilizing the GNU Mes C compiler.

https://dwheeler.com/trusting-trust/#real-world


Projects like GNU Mes are part of the Bootstrappable Builds effort[0]. Another great achievement in that area is the live-bootstrap project, which has automated a build pipeline that goes from a minimal binary seed up to tinycc then gcc 4 and beyond.[1]

[0] https://www.bootstrappable.org/

[1] https://github.com/fosslinux/live-bootstrap/blob/master/part...


I feel the need to point out that the "Bootstrappable Builds" project is a working group from the Reproducible Builds project which was interested in the next step beyond reproducing binaries. Obviously this project has seen most effort from Guix :)

The GNU Mes C experiment mentioned above was also conducted during the 2019 Reproducible Builds summit in Marrakesh.

https://reproducible-builds.org/events/Marrakesh2019/


In principle, diverse double-compiling merely increases the number of compilers the adversary needs to subvert. There are obvious practical concerns, of course, but frankly this raises the bar less than maintaining the backdoor across future versions of the same compiler did in the first place, since at least backdooring multiple contemporary compilers doesn't rely on guessing, well ahead of time, what change future people are going to make.

Critically, it shouldn't be taken as a demonstration that the toolchain is trustworthy unless you trust whoever's picking the compilers! This kind of ruins approaches based on having any particular outside organization certify certain compilers as "trusted".


It's an uphill effort to actually do this. While theoretically a very informed adversary might get it right the first time, human adversaries are unlikely to, and their resources are large but far from infinite.

Your entire effort is potentially brought down by someone making a change in a way you didn't expect and someone goes "huh, that's funny..."


Quite frankly, I'm surprised that it hasn't come up multiple times in the course of getting to NixOS, etc. The attacks are easy to hide and hard to attribute.


Really? How does that accomplish more than proving the build is a fixed point? An attacker may well be aware of the fixed point combinator after all.

Edit: I think that tone may have come off as snarky, but I meant it as an honest question. If any expert can answer I'd really appreciate it.


Fixed points don't come in here at all, unless you specifically want to talk about compiling compilers.

Diverse double compilation is useful for run-of-the mill programs, too.


Programs built by different compilers aren't generally binary comparable, e.g. we shouldn't expect empty output from `diff <(gcc run-of-the-mill.c) <(clang run-of-the-mill.c)`

However, the behaviour of programs built by different compilers should be the same. Run-of-the-mill programs could use this as part of a test suite, for example; but diverse double compilation goes a step further:

We build compiler A using several different compilers X, Y, Z; then use those binaries A-built-with-X, A-built-with-Y, A-built-with-Z to compile A. The binaries A-built-with-(A-built-with-X), A-built-with-(A-built-with-Y), A-built-with-(A-built-with-Z) should all be identical. Hence for 'fully countering trusting trust through diverse double-compiling', we must compile compilers: https://dwheeler.com/trusting-trust/
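
A schematic of that check, with x-cc, y-cc, z-cc standing in for whatever unrelated trusted compilers you pick:

  # Stage 1: build compiler A's source with three unrelated compilers; these binaries will differ.
  x-cc A-source -o A_x
  y-cc A-source -o A_y
  z-cc A-source -o A_z
  # Stage 2: each stage-1 binary compiles A's source again; if A's build is deterministic,
  # the three outputs should be bit-for-bit identical.
  ./A_x A-source -o A_xx
  ./A_y A-source -o A_yy
  ./A_z A-source -o A_zz
  sha256sum A_xx A_yy A_zz   # any mismatch here is what you go investigate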


And how about that hardware and firmware microcode?


And also shipped firmware or binary blobs.


Actually, being able to build projects from GitHub much more easily is the sole reason why I'm currently using Arch as my main OS.

Building a project is just a shell script with a couple of defined functions. Quite literally.

I really admire NixOS's philosophy of pushing the boundaries as a distro where everything, including configurations and modifications, can be done in a reproducible manner. They're basically trying to automate the review process down the line, which is absurdly complex as a challenge.

And given that stability and desktop integrations improve over time, I really think that Nix has the potential to be the base for easily forkable distributions. Building a live/bootable distro will be so much easier, as everything is just a set of configuration files anyway.


This is a slightly different thing. Nix and NixOS are trying to solve multiple things, and that's why it might be a bit confusing.

Many people don't realize it, but if you grab, for example, the mentioned project from GitHub and I do too, and we each compile it on our own machines, we get different files (they'll work the same but they won't be exactly the same).

Even if we use the same dependencies, we will still get different files, because maybe you used a slightly different version of the compiler, or maybe those dependencies were compiled with different dependencies or compilers. Maybe the project inserts a date while building, or pulls some file. There are a million ways we could end up with different files.

The goal here is to get bit-for-bit identical files, which is something of a Holy Grail in this area. NixOS appears to have just achieved that, and all packages that come with the system are now fully reproducible.
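
In concrete terms, the check is as simple as this (the paths are just examples):

  # You build it, I build it, and we compare hashes of the outputs:
  sha256sum your-build/bin/hello my-build/bin/hello
  # When they differ, diffoscope pinpoints where (timestamps, embedded paths, ordering, ...):
  diffoscope your-build/bin/hello my-build/bin/hello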


A rich source of non-reproducibility is non-determinism introduced by parallel building.

Preserving parallel execution, but arriving at deterministic outputs, is an interesting and ongoing challenge. With a rich mathematical structure, too.


> and 6mo later every computer ... gets ransomwared.

I'm really surprised such an attack hasn't happened already. It seems so trivial for a determined attacker to take over an opensource project (plenty of very popular projects have just a single jaded maintainer).

The malicious compiler could inject an extra timed event into the main loop for the time the attack is scheduled to begin, but only if it's >3 hours away, which simply retrieves a URL and executes whatever is received.

Detecting this by chance is highly unlikely - because to find it, someone would have to have their clock set months ahead, be running the software for many days, and be monitoring the network.

That code is probably only a few hundred bytes, so it probably won't be noticed in any disassembly, and is only executed once, so probably won't show up in debugging sessions or cpu profiling.

It just baffles me that this hasn't been done already!


How do you know it hasn't been done already? (with a more silent payload than ransomware) /s


What does the /s mean in this context?


/s is internet parlance to show that the message should be read in a sarcastic tone.


Yes, but what confused me is that as far as I can tell we really don’t know that it hasn’t been done before.


Not GP, but I think it indicates sarcasm?


> I'm really surprised such an attack hasn't happened already.

If you count npm packages this has happened quite a few times already. People (who don't understand security very well) seem to be migrating to Python now.


Unless you are going to be the equivalent of a full-time maintainer doing code review for every piece of software you use, you need to trust other software maintainers, reproducible builds or not. Considering this is Linux, and not even Linus can deeply review every change in just the kernel anymore, that philosophy can't apply to meaningfully large software like NixOS.


That's too black-and-white. Being able to reproduce stuff makes some kinds of attacks entirely uninteresting because malicious changes can be traced back, which is what many types of attackers do not want. Debian, or the Linux kernel, for example, are not fool-proof, but both are in practice quite safe to work with.


Who are you going to trace it back to if not the maintainer anyway? If the delivery method, then why is delivery of the source from the maintainer inherently any safer?


No, it is not always the maintainer. Imagine you download a binary software package via HTTPS. In theory, the integrity of the download is protected by the server certificate. However, it is possible that certificates get hacked, get stolen, or that nation states force CAs to give out back doors. In that case, your download could have been changed on the fly with arbitrary alterations. Reproducible builds make it possible to detect such changes.


Same as when you download the source instead of the binary and see it reproducibly builds the backdoored binary. And at this point we're back to "Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds." anyways.

It's not that reproducible builds provide zero value, it's that they don't truly solve the trust problem as initially stated. They also have non-security value to boot, which is often understated compared to the security value, IMO.


I guess reproducible builds solve some of the problems in the same way TLS/SSL solves some of the problems.

Most of the world is happy enough with the soft guarantee of: “This is _probably_ your bank’s real website. Unless a nation state is misusing their control over state owned certificate authorities, or GlobalSign or LetsEncrypt or whoever has been p0wned.”

Expecting binary black and white solutions to trust problems isn’t all that useful, in my opinion. Often providing 50% more “value” in trust compared to the status quo is extremely valuable in the bigger picture.


Reproducible builds solve many security problems for sure, but the problems they solve in no way help you if the maintainer is not altruistic or is bad at security, as originally stated. They help tell you whether the maintainer's toolchain was compromised, and they do it AFTER the payload is delivered, and only once you've built your own payload, which isn't the one made by the maintainer anyway. They don't even tell you the transport/hosting wasn't compromised unless you can somehow get a copy of the source used to compile from somewhere other than the maintainer directly, as the transport/hosting for the source they maintain could be compromised as well.

Solving that singular attack vector in the delivery chain does nothing for solving the need to trust the altruism and self-interest of maintainers. A good thing™? Absolutely, along with the other non-security benefits, but it has nothing to do with needing to trust maintainers or to be in the niche that reviews source code when automatic updates come along, as originally sold.


There are other solutions to the problem of trusting maintainers; namely incremental distributed code review. The Rust folks are working on that:

https://github.com/crev-dev/


> but the problems it solves in no way help you if the maintainer is not alturistic or bad at security as originally stated.

That same edge case applies to your bank too. Pinned TLS certs or pre shared keys might help against "BadGuys(tm)", but you're still screwed if your bank decides to keep your money. (s/bank/online crypto wallet/ for real world examples there...)


The question isn't whether they're perfect, nor is it whether they prevent anything. But it does help a person who suspects something is up rule certain things in and out, which increases the chances that the weak link can be found and eliminated.

If you have a fair suspicion that something is up and you discover that when you compile reproduceable-package you get a different output than when you download a prebuilt reproduceable-package, you've now got something to work with.

Your observation that they don't truly solve the trust problem is true. But it's somehow not relevant. It is better to be better off.


Reproducible builds still help a lot with security. For example, they let you shift build latency around.

Eg suppose you have a software package X, available both as a binary and in source.

With reproducible builds, you can start distributing the binary to your fleet of computers, while at the same time you are kicking off the build process yourself.

If the result of your own build is the same as the binary you got, you can give the command to start using it. (Otherwise, you quarantine the downloaded binary, and ring some alarm bells.)
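
Sketched as a deploy gate (the fleet commands are made up, just standing in for whatever orchestration you use):

  # Only activate the downloaded binary once our own rebuild from source
  # produces the exact same bits.
  theirs=$(sha256sum downloaded/X | cut -d' ' -f1)
  ours=$(sha256sum our-rebuild/X | cut -d' ' -f1)
  if [ "$theirs" = "$ours" ]; then
    fleet-activate X            # hypothetical: tell the fleet to start using it
  else
    quarantine X && ring-alarm  # hypothetical: quarantine and alert
  fi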

Similarly, you can re-build some random sample of packages locally, just to double-check, and report.

If most Debian users were to do something like that, any tampering with the Debian repositories would be quickly detected.

(Having a few malicious users wouldn't hurt this strategy much, they can only insert noise in the system, but not give you false answers that you trust.)


> and inaccessible except to nerds.

So was most every part of computer hardware and software initially - this is just another milestone in that journey.


Even if the original attack happened upstream, if the upstreamed piece of software was pinned via git, then it'd be trivial to bisect the upstream project to find the culprit.


This is great if you are looking at attributing blame. Not so great if you are trying to prevent all the world's computers getting owned....

I'd imagine that if I were looking at causing worldwide chaos, I'd love nothing better than getting into the toolchain in a way that I could later utilise on a widespread basis.

At that point I would have achieved my aims and if that means I've burnt a few people along the way, so be it, I'm a bad guy, the damage has been done, the objective met.


You can't solve this problem without having a full history of code to inspect (unless you are decompiling); reproducibility is the first step and bootstrappability is the second step. Then we refine the toolchains and review processes to ensure high-impact code is properly scrutinized.

What we can't do is throw our hands up and say anyone who compromises the toolchain deep enough is just allowed to win. It will happen at some point if we don't put the right barriers in place.

It's the first step of a long journey, but it is a step we should be taking.


https://github.com/fosslinux/live-bootstrap is another approach, bootstrapping from a tiny binary seed that you could hand-assemble and type in as hex. But it doesn't address the dependency on the underlying OS being trustworthy.


There is stage0 by Jeremiah Orians that is designed to be able to bootstrap on hardware that can be built from transistors. Currently it mostly runs in a small VM process that is somewhat harder to subvert.


Reproducibility is what allows you to rely on other maintainers' reviews. Without reproducibility, you can't be certain that what you're running has been audited at all.

It's true that no single person can audit their entire dependency tree. But many eyes make all bugs shallow.


No. I can review 0.1% of the code and verify that it compiles correctly and then let another 999 people review their own portion. It only takes one person to find a bit of malicious code, we don’t all need to review every single line.


> It only takes one person to find a bit of malicious code, we don’t all need to review every single line.

This is just objectively wrong. I have worked on projects at FAANG where entire teams did not spot critical security issues during review.

You are very unlikely to spot an issue with just one pair of eyes. You need many if you want any hope of catching bugdoors.


You are misunderstanding what I am saying. I am saying that it only takes one person who finds a vulnerability to disclose it, to a first approximation. Realistically it’s probably closer to 2-3 since the first might be working for the NSA, the CCP, etc. I am making no arguments about what amount of effort it takes to find a vulnerability, just talking about how not every single user of a piece of code needs to verify it.


That only works if you coordinate. With even more people, you can pick randomly and be relatively sure you've read it all, but I posit that 1) you don't pick randomly, you pick a part that is accessible or interesting to you (and therefore probably others) and 2) reading code locally is not sufficient to find bugs or backdoors in the whole.


The crev folks are working on a co-ordination system for incremental distributed code review:

https://github.com/crev-dev/


I actually wonder if it’s possible to write code at such a macro level as to obfuscate, say, a keylogger in a huge codebase such that reviewing just a single module/unit would not reveal that something bad is going on.


Depends on how complicated the project itself is. A simple structure with the bare minimum of side-effects (think, functional programming) would make this effort harder.

For something like C, all bets are off: http://www.underhanded-c.org/ or https://en.wikipedia.org/wiki/Underhanded_C_Contest


Crev is a great idea; unfortunately, it is only really available for Rust right now.


I noticed there is a git-crev project; might that be useful for other languages? Also, there is pip-crev for Python.


> Who remembers Ken Thompson's "Reflections on Trusting Trust"?

> The norm today is auto-updating, pre-built software.

This is a little bit misleading. The actual paper[1] explains that you can't even trust source available code.

[1] https://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-tho...


Supply chain attacks are definitely important to deal with, but defense-in-depth saves us in the end. Even if a postgres container is backdoored, if the admins put postgres by itself in a network with no ingress or egress except the webserver querying it, an attack on the database itself would be very difficult. If on the other hand, the database is run on untrusted networks, and sensitive data kept on it... yeah, they're boned.


In the case of a supply chain attack, you don't even need ingress or egress.

Say the postgres binary or image is set to encrypt the data on a certain date. Then it asks you to pay X ZEC to a shielded address to get your decryption key. This would work even if the actual database was airgapped.


That's true, I didn't think of that! D:


> I predict that everyone's imagination on this topic will expand once there's a big enough incident in the news.

How does the SolarWinds incident, with just about every large software vendor being silently compromised for years, not qualify?

Because it does not; people's imagination is as closed as it always was.


SolarWinds is closed source, so building from source is not really an option.


They could have distributed the code to a few select parties for the purposes of doing a build and nothing more.


Specifically Microsoft did distribute the code to several parties for the purposes of auditing. But they didn't allow building it.


Building from source doesn't have to be inaccessible, if the build tooling around it is strong. Modern compiled languages like Go (or modern toolchains on legacy languages like vcpkg) have a convention of building everything possible from source.

So at least for software libraries, building from source is definitely viable. For end-user applications it's another story, though; I doubt we will ever be at a point where building your own browser from source makes sense...


Building from source also doesn’t buy you very much, if you haven’t inspected/audited the source.

The upthread hypothetical of a compromised package manager equally applies to a compromised source repo.

_Maybe_ you always check the hashes? _Maybe_ you always get the hashes from a different place to the code? _Maybe_ the hypothetical attacker couldn't replace both the code you download and the hash you use to check it?

(And as Ken pointed out decades ago, maybe the attacker didn’t fuck with your compiler so you had lost before you even started.)


Binary reproducible builds are still pretty inaccessible though.


>The norm today is auto-updating, pre-built software.

Only if you define "norm" as what's prevalent in consumer electronics and phones. Certainly, if you go by numbers, it's more common than anything else.

That's not due to choice, though, it's because of the desires of corporations for ever more extensive control of their revenue streams.


>There are only two ways around this:

>- Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds.

if you trust the compiler :)


Why does building from source help? It’s not like people are reading every line of the source before building it anyway 99.99% of the time.


If the package maintainer's build pipeline is compromised (eg. Solarwinds), you are unlikely to be affected if you build from reviewed source yourself.


Except hardly anyone reviews a single line of code.


So? We are trying to protect against a malicious interloper damaging the machine of a trusted and trustworthy partner.

You are bringing up red herrings about trusted partners being malicious and untrustworthy.

Do you genuinely believe we should only solve a problem if it leads to a perfect outcome?


I genuinely believe in spending resources on issues where the ROI is positive.

So far, exploits on FOSS kind of prove the point: not everyone is using Gentoo, reading every line of code in their emerged packages, let alone similar computing models.

Now if we are speaking about driving the whole industry to where security bugs, caused by using languages like C where code reviews cannot save us unless done by ISO C language lawyers and compiler experts in UB optimizations, are heavily punished like construction companies are for a fallen bridge, then that would be interesting.


> I genuinely believe to spend resources on issues where ROI is positive.

How are you measuring the ROI of security efforts inside an OSS distro like debian or nixos? The effort in such orgs is freely given, so nobody knows how much it costs. And how would you calculate the return on attacks that have been prevented? Even if an attack wasn't prevented you don't know how much it cost, and you might not even know if it happened (or if it happened due to a lapse in debian.)

>So far, exploits on FOSS kind of prove the point: not everyone is using Gentoo, reading every line of code in their emerged packages, let alone similar computing models.

Reproducible builds is attempting to mitigate a very specific type of attack, not all attacks in general. That is, it focuses on a specific threat model and countering that, nothing else. It's not a cure for cancer either.

>Now if we are speaking about driving the whole industry to where security bugs, caused by using languages like C where code reviews cannot save us unless done by ISO C language lawyers and compiler experts in UB optimizations, are heavily punished like construction companies are for a fallen bridge, then that would be interesting.

This is just a word salad of red herrings. Different people can work on different stuff at the same time.


>Who remembers Ken Thompson's "Reflections on Trusting Trust"?

That was his Turing Award lecture ;) not Unix, as one might assume.


> Reproducible builds are way more important than is currently widely appreciated.

Why? How will this help with the problems you're talking about?

I can't come up with a single benefit to security from reproducible builds. It seems nice for operational reasons and performance reasons though.


> I can't come up with a single benefit to security from reproducible builds.

It is a means to detect a compromised supply chain. If people rebuilding a distro cannot get the same hash as what the distributor is shipping, then the distributor's infrastructure has likely been compromised.


How does this work in practice? The distro is owned, so where are you getting the hash from? I mean, specifically, what does the attacker have control of, and how does a repeatable build help me stop them?


The idea is that multiple independent builders build the same distro. You expect all of them to have the same final hash.

This doesn't help against the sources being owned, but it helps if the build machines are owned.

Accountability for source integrity is in theory provided by the source control system. Accountability for the build machine integrity can be provided by reproducible builds.

To answer your specific questions: The attacker has access to the distro's build servers and is packaging and shipping altered binaries that do not correspond to the sources but instead contain added malware.

Reproducible builds allow third parties to also build binaries from the same sources and once multiple third parties achieve consensus about the build output, it becomes apparent that the distro's build infrastructure could be compromised.
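
With Nix specifically, a third party can do that kind of spot check locally; a minimal sketch, using hello as a stand-in for any cached package:

  # Fetch the prebuilt output from the binary cache, then rebuild it locally;
  # --check makes nix-build fail if the local rebuild doesn't match what's in the store.
  nix-build '<nixpkgs>' -A hello
  nix-build '<nixpkgs>' -A hello --check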


OK so a build machine is owned and we have a sort of consensus for trusted builders, and if there's a consensus mismatch we know something's up.

I suppose that's reasonable. Sounds like reproducible builds is a big step towards that, though clearly this requires quite a lot of infrastructure support beyond just that.


This is great! The one fly in the ointment, pardon, is that Nix is a bit lax about trusting proprietary and binary-only stuff. It would be great if there were a FLOSS-only core system for NixOS which would be fully transparent.


Nix/Nixpkgs blocks unfree packages by default, so I presume it would be relatively easy to disable packages with the `unFree` attribute.


I totally believe it is possible, it is perhaps more of a cultural thing.


It's the pragmatic thing. I wouldn't use NixOS if I wasn't able to use it on a 16-core modern desktop. I don't think there's a performant and 100% FLOSS-compatible computer that wouldn't make me want to gouge my eyes out with a rusty spoon when building stuff for ARM.


Talos has 44-core/176-thread server options, which can take 2 TB of DDR4, that are FSF-certified. The board firmware is also open and has reproducible builds.


That is way more expensive than a 16-core desktop, though. Workstations are a class above consumer-grade desktops and that's reflected in the price.


Talos has desktop options as low as 8 cores as well; this is just an example of how far you can take FLOSS hardware. Not that I consider a 16-core x86 desktop "consumer-grade" in the first place (speaking as a 5950X owner).

Probably not fit for replacing Grandma's budget PC, but then again Grandma probably isn't worried about the ARM cross-compile performance of her machine running NixOS either.


Okay, now I'm interested :)

(I am worried about the ARM cross compile performance of my machine running NixOS)


Thanks, I was legitimately unaware of this option. That does smash my argument, but I'm not likely to be using a system like that anytime soon due to cost concerns mostly.


And it’s not just hardware, there is a useful limit on purity of licenses. In many cases only proprietary programs can do the work at all, or orders of magnitudes better.


> It would be great if there were a FLOSS-only core system for NixOS

Might be wrong but isn't this part of the premise for Guix/GuixSD?


And it's good that it exists, I guess?

But it can't do any of the things I bought my computer to do, so it's of limited value to me.


>self interest (companies do not want to attack their own users).

Anyone who has bought an Android phone in the past 5 years knows that's not true.


I don't have the resources to audit every component of my system. I favour enterprise distros that audit code which ends up in their repos and avoid pip, npm, etc., but there are some glaring trade-offs in both productivity and scalability.

The problem is unmaintainability; I can't imagine it'd be easier for medium-sized teams where security isn't a priority, either.


"- Build from source. This will always be a deeply niche thing to do. It's slow, inconvenient, and inaccessible except to nerds."

I prefer compiling from source to binary packages. For me it is neither slow, inconvenient nor inaccessible.

Only with larger, more complex programs does compiling from source become a PITA.

The "solution" I take is to prefer smaller, less complex programs over larger, more complex ones.

If I cannot compile a program from source relatively quickly and easily, I do not voluntarily choose it as a program to use daily and depend on.

For compiling OS, I use NetBSD so perhaps I am spoiled because it is relatively easy to compile.

That said, I understand the value of reproducible builds and appreciate the work being done on such projects.


"except to nerds" was conversationally phrased shorthand for "except to people with rarefied technical skills".


You don’t use a browser or an office suite? Because those are a pain in the ass to compile (in terms of time).


Not just time, IME. Also 1. highly resource intensive, e.g., cannot compile on small form factor computers (easier for me to compile a kernel than a "modern" browser) and 2. brittle.


I think using NetBSD might put you in the nerd camp ;-)


Unfortunately, it's easy to break a lot of builds by things such as deciding not to install to /usr/local, or by building on a Mac. Pushing publishers to practices that aid reproducible builds would help both sides.

I'd love to try building NetBSD, btw, I must try that!


Don't take this the wrong way, but I think you qualify as a nerd. :)



