Demonstrably Secure Software Supply Chains with Nix (nixcademy.com)
111 points by todsacerdoti 21 hours ago | 66 comments
beardedwizard 20 hours ago [-]
The bummer about a lot of supply chain work is that it does not address the attacks we see in the wild, like xz, where malicious code was added at the source and attested all the way through.

There are gains to be had from these approaches, like inventory, but nobody has a good approach to stopping malicious code from entering the ecosystem through the front door, and attackers find this much easier than tampering with artifacts after the fact.

kuruczgy 18 hours ago [-]
Actually, this is not quite true: in the xz hack, part of the malicious code was in generated files present only in the release tarball.

When I personally package stuff using Nix, I go out of my way to build everything from source as much as possible. E.g. if some repo contains checked-in generated files, I prefer to delete and regenerate them. It's nice that Nix makes adding extra build steps like this easy. I think most of the time the motivation for having generated files in repos (or release tarballs) is the limitations of various build systems.
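
A minimal sketch of what that can look like for a hypothetical autotools project (all names and the hash below are placeholders, not from the thread): delete the shipped configure output in postPatch and let autoreconfHook regenerate it.

    # Hypothetical package: drop checked-in autotools output and
    # regenerate it from configure.ac / Makefile.am during the build.
    { stdenv, fetchFromGitHub, autoreconfHook }:

    stdenv.mkDerivation {
      pname = "example";
      version = "1.0";

      src = fetchFromGitHub {
        owner = "example";
        repo = "example";
        rev = "v1.0";
        hash = "sha256-..."; # placeholder; pins the exact source tree
      };

      # Delete the generated files shipped in the repo...
      postPatch = ''
        rm -f configure Makefile.in aclocal.m4
      '';

      # ...and regenerate them as an extra build step.
      nativeBuildInputs = [ autoreconfHook ];
    }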

throwawayqqq11 17 hours ago [-]
Your preference to compile your backdoors does not really fix the problem of malicious code supply.

I have this vague idea to fingerprint the relevant AST down to all syscalls and store it in a lock file, to have a better chance of detection. But this isn't a true fix either.

kuruczgy 16 hours ago [-]
Yes, you are right: what I am proposing is not a solution by itself, it's just a way to be reasonably confident that _if you audit the code_, that's going to be the actual logic running on your computer.

(I don't get the value of your AST checksumming idea over just checksumming the source text, which is what almost all distro packages do. I think the number of changes that change the code but not the AST is negligible. If the code (and AST) is changed, you have to audit the diff no matter what.)

The more interesting question that does not have a single good answer is how to do the auditing. In almost all cases right now the only metric you have is "how much you trust upstream", in very few cases is actually reading through all the code and changes viable. I like to look at how upstream does their auditing of changes, e.g. how they do code review and how clean is their VCS history (so that _if_ you discover something fishy in the code, there is a clean audit trail of where that piece of code came from).

vasco 8 hours ago [-]
> it's just a way to be reasonably confident that _if you audit the code_

Why do we so often pretend this is easy in conversations about dependencies? It's as if security bugs in dependencies were calling out to us, like a huge hole in the floor that a house inspector spots at a glance. But it's not like that at all: most people would inspect 99.9% of CVEs, read the vulnerable code, and accept it. As did the reviewers in the open-source project, who know that codebase much better than someone who's adding a dependency because they want to do X faster. And they missed it, or the CVE wouldn't be there; yet somehow a random dev looking at it for the first time will find it?

In fact, if using dependencies meant I had to read, understand, and validate their code, the number of dependencies I'd use would go to zero. And many things I would be locked out of doing, because I'm too dumb to understand them; and if I can't audit the code, I'm definitely too dumb to replicate the library myself.

Asking people to audit the code in hopes of finding a security bug is a big crapshoot. The industry needs better tools.

SOLAR_FIELDS 5 hours ago [-]
This makes perfect sense on a beefy super powered dev laptop with the disk space upgrade on an unsaturated symmetrical gig connection.

I’m only exaggerating a little bit here. Nix purism is for those who can afford the machines to utilize it. Doing the same on old hardware is so slow it’s basically untenable

turboponyy 44 minutes ago [-]
Glossing over some details, the build artifact and build definition are equivalent in Nix. If you know the build definition, you can pull the artifact from the cache and be assured that you have the same result.
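
A toy illustration of that equivalence (the derivation below is a placeholder, not from the thread): the output path is a hash computed over all of a derivation's inputs before anything is built, and the binary cache is keyed on exactly that path.

    # The output path of this derivation, something like
    #   /nix/store/<hash>-greeting
    # is derived from a hash over all inputs (builder, environment,
    # dependencies). A cache that holds that exact path can
    # substitute the artifact instead of building it locally.
    { runCommand }:
    runCommand "greeting" { } ''
      echo hello > $out
    ''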
zeec123 5 hours ago [-]
The nix cache exists for a reason.
XorNot 14 hours ago [-]
The xz attack did hit Nix, though. The problem is that no one is inspecting the source code, which is still true with Nix, because everyone writes auto-bump scripts for their projects.

If anyone was serious about this issue, we'd see way more focus on code signing and trust systems which are transferable: i.e. GitHub has no provision to let anyone sign specific lines of a diff or a file to say "I am staking my reputation that I inspected this with my own eyeballs".

zelphirkalt 10 hours ago [-]
Is it really staking one's reputation? Think about it: if everyone is doing it all the time, an overlooked something is quickly dismissed as a mistake that was bound to happen sooner or later. Person X reviews so much code and usually does such a great job, but now they overlooked this one thing. And they even admitted their mistake. Surely they are not bad.

I think it would quickly fade out. What are we going to do if some organization doing professional code reviews signs off on the code, but after 5 years in the business makes one mistake? Are we no longer going to trust them from that day on?

I think besides signing code, there need to be multiple pairs of eyeballs looking at it independently. And even then nothing is really safe. People get lazy all the time. Someone else surely has already properly reviewed this code. Let's just sign it and move on! Management is breathing down our necks and we gotta hit those KPI improvements ... besides, I gotta pick up the kids a bit earlier today ...

Don't let perfect be the enemy of good. There is surely some benefit, but one can probably never be 100% sure, unless one goes into mathematical proofs and understands them oneself.

0xDEAFBEAD 6 hours ago [-]
It's unlikely that multiple highly-regarded reviewers would all make the same mistake simultaneously (unless all their dev machines got compromised).

Ultimately it's about making the attacker's life difficult. You want to raise the cost of planting these vulnerabilities, so attackers can pull it off once every few decades, instead of once every few years.

jrockway 13 hours ago [-]
Yeah, the more I read through actual package definitions in nixpkgs, the more questions I have about selling this as some security thing. nixpkgs is very convenient, I'll give it that. But a lot of packages have A LOT of unreviewed (by upstream) patches applied to them at Nix build time. This burned Debian once, so I expect it to burn nixpkgs someday too. It's inevitable.

I do think reproducible builds are important. They let people who DO review the source code trust upstream binaries, which is often convenient. I made this work at my last job: if you "bazel build //oci/whatever-image", you end up with a Docker manifest that has the same sha256 as what we pushed to Docker Hub. You can then read all the code and know that at least that's the code you're running in production. It's neat, but it's only one piece of the security puzzle.

p1necone 13 hours ago [-]
(Effectively) nobody will ever be serious about this issue unless it were somehow mandated for everyone. Anyone who was serious about it would take 3x as long to develop anything compared to their competitors, which is not a viable option.
0xDEAFBEAD 6 hours ago [-]
Yeah ultimately it's a public goods problem.

I wonder if a "dominant assurance contract" could solve this: https://foresight.org/summary/dominant-assurance-contracts-a...

transpute 14 hours ago [-]
> provision to let anyone sign specific lines of a diff

Good idea that should be implemented by git itself, for use by any software forge like github, gitlab, codeberg, etc.

https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work

spookie 8 hours ago [-]
This is why distros with long release cycles are better: there's usually more time for eyeballs to parse things.

Take Debian, for example: the malicious change never made it to stable.

0xDEAFBEAD 6 hours ago [-]
>When I personally package stuff using Nix, I go out of my way to build everything from source as much as possible. E.g. if some repo contains checked-in generated files, I prefer to delete and regenerate them. It's nice that Nix makes adding extra build steps like this easy. I think most of the time the motivation for having generated files in repos (or release tarballs) is the limitations of various build systems.

You know what would be really sweet?

Imagine if, every time a user opted to build from source themselves, a build report were generated by default and sent to a server alongside the resulting hashes etc., and a diff report got printed to your console.

So not only are builds reproducible, they're continuously being reproduced and monitored around the world, in the background.

Even absent reproducibility, this could be a useful way to collect distribution data on various hashes, especially in combination with system config info, to make targeted attacks more difficult.

yencabulator 20 hours ago [-]
I think a big part of the push is just being able to easily & conclusively answer "are we vulnerable or not" when a new attack is discovered. Exhaustive inventory already is huge.
tough 18 hours ago [-]
I read somewhere that Go has a great package for this that statically checks usage of the specific vulnerable functions, not whole package deps
yencabulator 18 hours ago [-]
https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck
tough 17 hours ago [-]
ty ty exactly what I was thinking

Does something like this exist for other languages, like Rust, Python, or JS?

yencabulator 16 hours ago [-]
I don't think the Rust ecosystem has that at this time. They're annotating the vulnerabilities with affected functions, but as far as I know nobody's written the static analysis side of it.

https://github.com/rustsec/rustsec/issues/21

Python and JS might be so dynamic that such static analysis just isn't as useful.

dwattttt 16 hours ago [-]
For Rust, the advisory database cargo-audit uses (https://github.com/RustSec/advisory-db/) does track which functions are affected by a CVE (if provided). I'm not sure if the tool uses them, though.
XiZhao 19 hours ago [-]
I run a software supply chain company (fossa.com) -- agree that there are a lot of low-hanging gains, like inventory, still around. There is a shocking amount of very basic but invisible surface area that leads to downstream attack vectors.

From a company's PoV -- I think you'd have to just assume all 3rd-party code is popped and install some kind of control step given that assumption. I like the idea of reviewing all 3rd-party code as if it's your own, which is now possible with some scalable code review tools.

nyrikki 17 hours ago [-]
Those projects seem to devolve into boil-the-ocean style efforts and tend to be viewed as intractable, and thus ignorable.

In the days when everything was HTTP, I used to set a proxy variable and have the proxy save all downloaded assets to compare later; today I would probably blacklist the public CAs and do an intercept, just for the data on what is grabbing what.

FedRAMP was defunded and is moving forward with a GOA-style agile model. If you have the resources, I would highly encourage you to participate in the conversations.

The timelines are tight and they are trying to move fast, so look into their GitHub discussions and see if you can move it forward.

There is a chance to make real changes but they need feedback now.

https://github.com/FedRAMP

TZubiri 12 hours ago [-]
When you have so many dependencies that you need to create complex systems to manage and "secure" them, the real problem is that you have too many dependencies: you are relying on too much volunteer work, and you are demanding too many features while paying too little.

The professional solution is to PAY for your operating system and rely on the vendor to secure it, whether that's Microsoft or Red Hat. You KNOW it's the right thing to do; this is overintellectualizing the desire to have a gratis operating system while charging non-gratis prices to your own clients.

lmm 5 hours ago [-]
How does that solve the problem? Both Microsoft and IBM/Red Hat have shipped backdoored code in the past and will no doubt do so again. At most you might be able to sue them for a refund of what you paid them, at which point you're no better off than if you'd used a free system from the start.
ngangaga 13 hours ago [-]
I wish we would use terms like "verifiable" or "reproducible" rather than "secure", which is quite difficult to evaluate out of context of usage.
sollewitt 19 hours ago [-]
Valuably, you also get demonstrable _insecure_ status: half the pain of log4j for our org was figuring out where it was in our stacks, and at which versions. This kind of accounting is really valuable when you're trying to figure out if and where you are affected.
niam 17 hours ago [-]
> it offers integrity and reproducibility like no other tool (btw. guix also exists)

This rubs me the wrong way. They acknowledge that alternative tools exist, but willfully use the wrong-er statement in pursuit of a vacuous marketing idiom.

Zambyte 13 hours ago [-]
To be fair, Guix was originally a Nix fork, so dismissing it as "just" a Nix fork doesn't seem that out there. I believe at this point all original Nix code has been replaced and Guix has completely diverged and is also awesome software, but I can see why someone would be inclined to say something like that if they were missing the full picture.
transpute 8 hours ago [-]
What's the best place to read about the history of Guix forking from Nix?
abhisek 9 hours ago [-]
This solves the problem of provenance and possibly build integrity. Given an artifact, it allows identifying the exact source from which each component was built.

But it still implicitly assumes that the source is secure and trusted. This is where a lot of the problems happen: when the source itself is compromised and malicious code is added.

seeknotfind 3 hours ago [-]
Nothing is demonstrably secure, only not demonstrably insecure. This is "hey, our builds come with a bunch of resources you can use to try to prove they're insecure, but you probably can't" -- but it's an advertisement.
kristel100 3 hours ago [-]
If you can get over the learning curve, Nix is game-changing for supply chain reproducibility. The hard part isn’t the tooling—it’s convincing teams to shift mental models.
huimang 11 hours ago [-]
Is the header image AI-generated? For shame. No point in reading any further.
Tractor8626 3 hours ago [-]
Classic NIH.

"It's easier to do ThingA with Nix because you don't have to do ThingB!" (proceed to explain how to do ThingB but with Nix)

> You don't need to maintain your own forks and patchsets

... but you need to maintain your own Nix packages and build scripts, which is basically the same amount of work

miloignis 1 hour ago [-]
Presumably, most to all of your dependencies will be in nixpkgs already, so you don't have to maintain them.
cyrnel 14 hours ago [-]
This seems to only address a few of the nine threats to the software supply chain, mainly "(D) External build parameters" and maybe the content-addressable storage addresses some of the distribution phase threats: https://slsa.dev/spec/v1.1/threats

There are still many other ways that a dependency can be exploited before or after the build phase.

jchw 14 hours ago [-]
Nix doesn't, can't, and will obviously never be able to audit your dependencies, but what it can do is give you a way in which you can audit everything byte-for-byte and end-to-end from input to output. In most architectures it is pretty hard to even get to this point because there is no rigorous tracking of dependencies and side-effects; e.g. if your builds are not sandboxed from the network, how can you be sure that the inputs you audited really account for all of the inputs to a build? Nix has a (complete) answer for that, among other things.
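
(For what it's worth, the sandboxing mentioned here is an ordinary setting, on by default on Linux; a one-line NixOS-module sketch:)

    # Builds get no network and no host filesystem access.
    { nix.settings.sandbox = true; }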
transpute 14 hours ago [-]
Debian reproducible builds, Guix, StageX and Yocto/OpenEmbedded have also worked in this area.
jchw 12 hours ago [-]
Reproducible builds are adjacent, but ultimately orthogonal, to what Nix does.

Reproducible builds provide strong evidence that a given set of inputs were used to produce a specific output. A lot of the work to be done here by each of these individual projects is beneficial to the entire ecosystem, including Nix. A lot of it is just fixing bugs and removing accidental non-determinism from builds. The main value this provides is that it allows a third-party to verify with relatively good certainty that binaries provided by some entity match the expected source code.

Nix provides a hermetic environment to build software with every single input from the bootstrap seed to the final build fully accounted for. Builds can't access the Internet or the host filesystem; they can only access files from their inputs. Impurities must go through fixed-output derivations (FODs), and the outputs of FODs have to match a prerecorded cryptographic hash, so they must be fully bit-reproducible or the build fails. A reproducible Nix derivation is a strict superset of a reproducible package virtually anywhere else, because you have very high assurance that you know, and can individually audit, every input to the derivation. This is useful for both auditing and reproducible builds.
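
For readers who haven't seen the acronym before: a fixed-output derivation is the one kind of build step allowed to touch the network, precisely because its result must match a hash declared up front. A minimal sketch (URL and hash are placeholders):

    # Fixed-output derivation: network access is permitted, but the
    # build fails unless the download matches the declared hash.
    { fetchurl }:
    fetchurl {
      url = "https://example.org/foo-1.0.tar.gz";
      hash = "sha256-..."; # placeholder; recorded ahead of time
    }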

Reproducible builds are important, but reproducible builds alone are not a panacea. They obviously don't tell you if your source code is free of defects, accidental or otherwise, and neither does Nix. Still, Nix does something basically nothing else does, by making the entire build process fully hermetic.

(Guix, being inspired by Nix, is, as far as I know, roughly the same, although Guix has put more effort into the bootstrap and package reproducibility. Still, Guix and Nix stand in a class of their own as far as usefulness to supply chain security go, even if they probably won't fit neatly into the compliance theater version of supply chain security.)

pveierland 11 hours ago [-]
This usage of `orthogonal` feels misleading. Nix is built to be a purely functional software distribution model where the outputs can be computed from the specified set of inputs. Although it is possible for derivations to not be reproducible, it seems incorrect to say that Nix is orthogonal to reproducible builds, as the point of the functional model is to nominally be able to get the same outputs given a set of inputs. Sure, Nix doesn't guarantee reproducibility of builds, but it certainly is designed to facilitate them (i.e. clearly correlated with reproducible builds and not being orthogonal to them).
jchw 10 hours ago [-]
Reproducible builds really are orthogonal to Nix, even if it seems like they're not. Nix was not built with reproducible builds in mind, and the problems it solves are ultimately independent even though they are related in some ways. The Nix PhD thesis by Eelco Dolstra, the founder of Nix, lays out the motivations for Nix very plainly. I decided I'd probably be better off quoting it directly than summarizing it poorly, so here goes:

From The Purely Functional Software Deployment Model, "1.3. Motivation"[1]:

    From the previous discussion of existing deployment systems it should be clear that they
    lack important features to support safe and efficient deployment. In particular, they have
    some or all of the following problems:

    • Dependency specifications are not validated, leading to incomplete deployment.
    • Dependency specifications are inexact (e.g., nominal).
    • It is not possible to deploy multiple versions or variants of a component side-by-side.
    • Components can interfere with each other.
    • It is not possible to roll back to previous configurations.
    • Upgrade actions are not atomic.
    • Applications must be monolithic, i.e., they must statically contain all their dependencies.
    • Deployment actions can only be performed by administrators, not by unprivileged
    users.
    • There is no link between binaries and the sources and build processes that built them.
    • The system supports either source deployment or binary deployment, but not both;
    or it supports both but in a non-unified way.
    • It is difficult to adapt components.
    • Component composition is manual.
    • The component framework is narrowly restricted to components written in a specific
    programming language or framework.
    • The system depends on non-portable techniques.

    The objective of the research described in this thesis is to develop a deployment system
    that does not have these problems.
Of course, part of the reason why reproducible builds were not considered for this is probably because it was simply not a hot topic at the time (had the term 'reproducible builds' even been coined yet?) and there were much bigger fish to fry with package management than reproducibility at that time, considering how relatively poorly packages were specified. Since then, "traditional" package managers and package repositories have put substantial work into cleaning up their package manifests and ensuring that they have accurate dependencies and other specifications such that today highly reproducible systems can be and have been built on top of them, limitations from the lack of hermetic guarantees notwithstanding.

Despite this, because Nix does guarantee bit-exact external inputs to a derivation, it does indeed make an excellent starting point for reproducible builds, but Nix itself as a tool is definitely orthogonal to reproducible builds, as it solves an entirely different problem that just happens to be related. You don't need the purity guarantees Nix gives you to get reproducible builds, and having those purity guarantees don't automatically give you reproducible builds (though as seen by Nixpkgs, it isn't uncommon for a build to coincidentally be reproducible just as a result of packaging it into a Nix derivation... just, not really specifically guaranteed by anything.)

[1]: https://edolstra.github.io/pubs/phd-thesis.pdf

pveierland 10 hours ago [-]
The entire point of purity within Nix is to have reproducibility by its nature: `In this model a binary component is uniquely defined by the declared inputs used to build the component` (section 1.5). It is not clear to me how this is an entirely different problem than that of reproducible builds. Yes, there are more problems to solve regarding achieving bit-for-bit reproducible builds that Nix does not fully solve alone, but reproducible software builds and deployments clearly are goals of Nix as a technology.
jchw 10 hours ago [-]
> The entire point of purity within Nix is to have reproducibility by its nature: `In this model a binary component is uniquely defined by the declared inputs used to build the component` (section 1.5). It is not clear to me how this is an entirely different problem than that of reproducible builds. Yes, there are more problems to solve regarding achieving bit-for-bit reproducible builds that Nix does not fully solve alone, but reproducible software builds and deployments clearly are goals of Nix as a technology.

Sorry, but you pretty much quoted what I would've quoted to refute your claim: it's exactly right that binary components are uniquely defined by their declared inputs, but note the subtle consequence that has: they are not defined at all by their outputs, only their declared inputs. (Making them be defined by their outputs is entirely possible FWIW, this is pretty much what content-addressed Nix paths are for.) This also applies recursively: the bit-exactness of external inputs are guaranteed by cryptographic hashes, but if you have any inputs that are, themselves, derivations, you can extremely trivially add impurities, because the system is just simply not designed to stop you from doing this. An example:

    with import <nixpkgs> { }; # needs a package set to evaluate
    stdenv.mkDerivation {
      name = "trivialimpurity";
      unpackPhase = "date > $out"; # output is the current time: impure
      dontBuild = true; dontInstall = true; # skip the Makefile-driven phases
    }
It would be quite possible to make a system that is specifically designed to be resistant to this; consider that Nix goes to great lengths to accomplish what it does, literally ripping apart ELF binaries to force dynamic symbols to be resolved in a static manner. There's no reason why you couldn't go further and force deeper reproducibility in a similar vein; it's just that Nix simply doesn't.

I think it's actually OK that Nix doesn't attempt to resolve this problem, because it is a pretty hairy one. Obviously you can't practically make the build environment 100% completely reproducible as it would be painfully slow and weird, so the best you could really probably do is intercept syscalls and try to make threading behavior deterministic (which would also make builds slow as balls, but less slow than running every build in unaccelerated qemu or something like that.) What Nix does do is solve the living shit out of the problem it was designed to solve, something you can feel very viscerally when you compare the state of Nixpkgs to the state of the AUR (a package repo I consider to be very good, but definitely can give some perspective about the problems Nix is designed to solve IMO, based on some of the problems I've run into with it.)

pveierland 9 hours ago [-]
Yes, Nix does not provide you a guarantee that there are no impurities in builds, which the extensional model specifically caters to, and where the intensional model improves upon this by providing a system based on content hashes instead of input hashes.

However, Nix specifically attempts to facilitate a pure software distribution model, which is why it does everything it does, e.g. forcing all input files to be copied to the store and providing mechanisms such as pure evaluation mode, which restricts access to the current system and time information. Yes, there are other ways to introduce impurities, but Nix tries in many ways to systematically remove these sources to increase the purity of the deployment process.

If the entire process of building an artifact is pure, then the artifact is entirely reproducible, given access to the same inputs. Yes, there are many ways to introduce impurity; however, claiming that Nix, a purely functional software distribution model whose central point is to achieve purity, is fully orthogonal to reproducible builds seems incorrect.

xyzzy_plugh 9 hours ago [-]
Reproducibility is a second-order effect. It's not a direct goal (at least originally) and, as already discussed in this thread more or less to death, not something that is required. It is very easy to write a derivation that cannot be reproduced. Sometimes it's even useful. Purity isn't really required at all. I wouldn't even say reproducibility is encouraged. It just sort of falls out of the system.

If Nix was worried about impurities that impact reproducible builds, they certainly wouldn't make live timestamps readily available in the pure environment, would they?

jchw 9 hours ago [-]
> Yes, Nix does not provide you a guarantee that there are no impurities in builds, which the extensional model specifically caters to, and where the intensional model improves upon this by providing a system based on content hashes instead of input hashes.

Actually, the intensional model doesn't improve matters at all here, I was only pointing it out to demonstrate that the fact that binary components are addressed by their inputs doesn't really have anything to do with reproducibility. Of course the intensional model would mean that if you made the same build twice and it wasn't reproducible, then you'd get a different hash; however, that's not really an improvement over the current approach, which is to just build it twice and compare the output results. If anything, it just makes things more convoluted for reproducibility, due to the fact that you have to factor out self-references to check the hash.

The main advantage of the intensional model, as far as I know, is that it simplifies the trust model a bit. In the extensional model, you have to trust the substitutor, otherwise it could poison your Nix store. In the intensional model, derivations are addressed by their contents, so it's impossible to really poison the Nix store per-se, since you can definitely validate that the store path is correct for a content-addressed path.

Really though, it doesn't have a lot to do with reproducibility, and even in the work done in recent years I've not seen it mentioned at all in relation to reproducible builds, though I fully admit that it's very possible it's somehow useful and I just missed it.

> If the entire process of building an artifact is pure, then the artifact would be entirely reproducible, given that you have access to the same inputs.

That is true. Nix, though, explicitly only makes certain parts of the process pure, and the parts that it makes pure are specifically driven by the motivations outlined above. It is true that if you made the entire process completely pure, the build would be reproducible, and it is also true that Nix very intentionally does not try to do this, because it just simply wasn't in the list of problems Nix was solving at the time.

Likewise, though, you can still make a build reproducible without functional purity, which is exactly what has been done by various other reproducible build projects. They just happen to avoid impurities that would impact the result without any specific guarantees, which happens to be exactly what you have to do to make a reproducible build in Nix.

> Yes, there are many ways to introduce impurity, however claiming that Nix as a purely functional software distribution model, where the central point is to achieve purity, is fully orthogonal to reproducible builds, seems incorrect.

I don't know what "fully orthogonal" means relative to just "orthogonal". I am using "orthogonal" to mean "independent", i.e. what Nix solves is fully independent of reproducible builds. This follows because:

- It is possible to do reproducible builds without "purity" guarantees or sandboxing.

- It is possible to have builds that are not reproducible provided the purity guarantees of Nix.

The closest that Nix's purity comes to being related to reproducible builds is that it does, in fact, prevent some of the causes of unreproducible builds by enforcing the exact inputs, but that's why I consider it to be adjacent but ultimately orthogonal. If it's the word orthogonal that in particular is too strong, then maybe a better word would be "independent".

pveierland 9 hours ago [-]
Yes, I agree that reproducible builds is something that can be solved fully independently of Nix, and that Nix alone does not solve it, and that as such it is independent of Nix.

However, as the point of Nix is to achieve reproducible software deployments through a pure and functional description of the deployment, and as it also provides mechanisms that systematically improve build reproducibility, I feel that "orthogonal" is misleading (as was my original disagreement): reproducible builds correlate with Nix, and achieving reproducible software deployment is a clear original goal of Nix, e.g. by seeking to remove implicit dependencies through mechanisms such as sandboxing of builds.

transpute 11 hours ago [-]
Thanks for the detailed explanation. Yocto has offline builds, but is missing host filesystem isolation.

Are you familiar with StageX, https://codeberg.org/stagex/stagex/#comparison? There's a comparison chart on that page which claims that Nix(OS?) is not fully reproducible. It would be useful to know which subset, if any, is not reproducible.

jchw 11 hours ago [-]
It is correct that NixOS is not fully reproducible.

Regarding Nix vs NixOS vs Nixpkgs:

- Nix is the name of the programming language (that Nix derivations are written in) and the package manager/build tool. (Though at times the term 'Nix' is just used to refer to the broader ecosystem including NixOS and Nixpkgs.)

- NixOS is an operating system that is built on top of Nix derivations.

- Nixpkgs is a giant repository of Nix derivations, including NixOS (NixOS and Nixpkgs used to be separate repositories, but they were merged at some point in their history.)

The reason why NixOS+Nixpkgs are not fully reproducible, despite all of the guarantees Nix gives you, is simply because there are derivations with non-determinism during the build process. An example of how this might play out is that you could wind up with the order in which operations complete in a parallel build process somehow getting encoded into the final package.

Unfortunately, I don't think there's funding or a ton of interest going into improving the reproducibility of NixOS at the moment, so progress towards squashing reproducibility issues has been slow for a while. You can see relatively up-to-date progress on getting the install media fully reproducible here:

https://reproducible.nixos.org/nixos-iso-minimal-runtime/

StageX also correctly points out that the Nix bootstrap is inferior to some of the more extreme reproducibility projects. The Nix bootstrap is fairly large, unfortunately. The Guix team has put a substantial amount of effort into minimizing the bootstrap seed and reproducibility of packages. The Nixpkgs bootstrap seed (for my machine, anyway) is currently 27 MiB. The GuixSD bootstrap seed is, I believe, 357 bytes, which is a stunning accomplishment.

StageX considers NixOS trust to be centralized and GuixSD trust to be distributed; this is likely because of the Hydra binary cache which Nix is typically configured to trust by default. You can turn off the Hydra cache to remove this centralized entity, at the cost of obviously needing to build almost everything from scratch. I'm not sure what "distributed" trust actually means here, versus "decentralized".
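
For reference, opting out of the cache is a small configuration change; a NixOS-module sketch (the same can be set as `substituters =` in nix.conf):

    # Trust no binary cache: every derivation is built locally.
    {
      nix.settings.substituters = [ ];
      nix.settings.trusted-public-keys = [ ];
    }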

StageX uses OCI image building as a base. It also doesn't seem to talk about sandboxing anywhere, so it is presumed that StageX is using Dockerfile OCI builds as their only sandboxing, which still allows Internet access. Having Internet access during builds is convenient, but it makes it pretty hard to guarantee that all inputs are accounted for. Their Rust example is pretty interesting:

> RUN ["cargo", "add", "regex"]

There's nothing inherently wrong about this, but despite all of the effort to make the base StageX OCI images reproducible, if you were to build this exact same OCI image months apart, you would presumably be liable to get different results here: you could, for example, get an entirely different version of the regex crate. With Nix, if you make a derivation to build a Rust package, you have to account for the Cargo dependencies in the build, as Nix builds aren't allowed to access the Internet, with the exception of fixed-output derivations. While this doesn't result in Nix derivations being bit-for-bit reproducible, it does ensure that every (external) input is bit-for-bit identical across builds to the same exact derivation, something you can't really easily achieve without a custom build tool like Nix or Guix. If it were possible to sneak an impurity into a Nix build, it is likely a CVE. (There are some exceptions on macOS due to limitations in Darwin sandboxing, but on Linux I believe this holds true. None of the exceptions would make it possible to easily accidentally introduce impurities on macOS, though; you'd pretty much need to do it on purpose.)
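
By way of contrast with the `cargo add` line above, this is roughly what pinning Cargo dependencies looks like with nixpkgs' rustPlatform (a sketch; names and hashes are placeholders): the crate set is vendored through a fixed-output step whose hash is recorded in the derivation, so it cannot drift between builds.

    # Cargo dependencies are vendored via a fixed-output step
    # (cargoHash), pinning the exact crate versions; the compile
    # itself then runs with no network access at all.
    { rustPlatform, fetchFromGitHub }:
    rustPlatform.buildRustPackage {
      pname = "example";
      version = "0.1.0";
      src = fetchFromGitHub {
        owner = "example";
        repo = "example";
        rev = "v0.1.0";
        hash = "sha256-..."; # placeholder
      };
      cargoHash = "sha256-..."; # placeholder: hash of vendored crates
    }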

Even that aside, StageX uses the same impressive 357-byte bootstrap seed as GuixSD, so it is pretty cool for what it does. It's just a bit smaller in scope than Nixpkgs and GuixSD. Nixpkgs is probably the largest single software repository ever built, with over 100,000 packages, all of which have to follow this scheme of hermetic builds.

distrustryan 9 hours ago [-]
That Rust example is gonna bite us in the ass until the day I die; I need to remove it.

The Keyfork project is probably the best example of how an _actual_ Rust project is developed and shipped with stagex (disclaimer, I'm a maintainer of both). Actual Rust programs are built using the following steps:

1. Before building, the stagex tooling downloads and verifies a hash-locked version of the source package.
   * Additionally, all dependencies for the package (compilers _and_ linked libraries) are verified to have been built.
2. Packages and pallets (collections of packages) are unpacked `FROM scratch` into a bare container.
3. `cargo fetch --locked` is then invoked to fetch all the dependencies.
4. `RUN --network=none` is used when compiling the actual binary to ensure no network access happens _after_ the `cargo fetch` stage. Admittedly, it is not ideal to allow turning network access on and off throughout a build, but `--network=none` has helped us identify some odd cases where network access _does_ happen.
5. Once the binary is built, it is added on top of the base "filesystem" package and is considered "done".

Unless some source file gets completely yoinked off the internet (which has happened, and we've had to "rebuild the world" because of it), every stagex package should be 100% bit-for-bit reproducible even if run several years down the line.

There may be some cases where we miss a datestamp or something similar, but hopefully as time goes on, we get the infrastructure to mock system times and throw other wrenches to test how reproducible these packages really are.

jchw 8 hours ago [-]
I probably should've mentioned that I don't actually have any familiarity with StageX; I did write that at some point but must've accidentally removed it from my reply while still working on it. Even so, I had a feeling the example wasn't a good example of how to actually use it properly, and I feel a little bad, because I didn't mean to critique StageX over that particular issue; I just thought it was a good example of how Nix differs (Nix enforces purity, Dockerfile builds don't). It seems like with StageX the goal is to ensure that the build is bit-for-bit reproducible, as this is a relatively good assurance that the inputs are also reproducible. On the other hand, it might be relatively hard to debug what went wrong in the subtler cases where the inputs are not reproducible, since presumably the main symptom will be the output differing unexpectedly.

I'm definitely biased as a person who works on Nix stuff, but I am not an absolutist about any of these things; based on what I'm reading, I'd happily rely on StageX if I wanted reproducible OCI builds (and didn't feel like using Nix to do it, which has plenty of complexities of its own, as nice as it can be).

distrustryan 8 hours ago [-]
Oh and of course thanks for your opinions and feedback.
carlhjerpe 12 hours ago [-]
Yocto is the worst thing I've ever worked with. It's just a pile of dirt on top of a pile of dirt.
xyzzy_plugh 9 hours ago [-]
Yocto is significantly better than what preceded it. I could fill a book with ways it has pushed the envelope for embedded development.

Is Yocto a steaming pile? Yes absolutely. But it remains a means to an end.

I'd struggle to accomplish with Nix what can be accomplished out of the box with Yocto. Now that being said I'd certainly try and use Nix over Yocto any day of the week.

I remain optimistic that the two will converge and it will more or less not matter, though I'd much prefer Nix to consume embedded development than for Yocto to become Nix-pilled.

transpute 12 hours ago [-]
> pile of dirt on top of a pile of dirt

Infinite flexibility for reproducible commercial packaging of dirt permutations.

Packaging system pain exists at the border of chaos and simulated order.

There are useful concepts in Yocto but they were never formalized in academic papers, unlike some build systems for Haskell. There are packaging nuances encoded in bitbake recipes that will likely die there because they work "enough", instead of being further studied for long term lessons.

Given that shellcheck is written in Haskell, it might be an interesting academic exercise to write a Haskell replacement for bitbake, which converts bitbake recipes (shell+python) into something more maintainable.

gitroom 17 hours ago [-]
Hard agree on the pain of tracking all this - been there. Respect for the grind to actually lock this stuff down.
XorNot 15 hours ago [-]
This still doesn't fix the "trusting trust" attack, which Guix actually can, and which can bootstrap sideways to build other distros.

It also doesn't do anything that regular packaging systems don't (Nix does have some interesting qualities; security ain't one of them). I.e. that big list of dependencies isn't automatic in any way: someone had to write it, which makes it exactly the same as any other packaging system's build-deps.

christophilus 12 hours ago [-]
How does guix fix the trusting trust attack?

Aside: I wonder if AI code inspection and review could be put in place to detect xz-like malicious changes to the supply chain for major distros.

CBLT 11 hours ago [-]
https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

Guix bootstraps (in 2023, no clue about now) from a 357-byte program. You audit the bytecode.

tucnak 17 hours ago [-]
The laborious lengths to which people will go simply to avoid using Guix.
Zambyte 15 hours ago [-]
I also use Guix. Quickly skimming the article, I don't see anything that jumps out as something Guix does all that differently. What are you suggesting?
nicce 13 hours ago [-]
I like the philosophy of Guix, but omitting non-free packages completely makes it too impractical.
Zambyte 11 hours ago [-]
I believe a very high percentage of Guix users (myself included) use nonguix [0].

[0] https://gitlab.com/nonguix/nonguix